Server failures in October and November 2017

The huge downtime at OVH that occurred on November 9th 2017 was quite like an earthquake for the European web. Of course Piwigo.com was impacted. But before that, we went through a server failure on October 7th and another one on October 14th. Let’s describe and explain what happened.

Photo by Johannes Plenio on Unsplash


A) October 7th, the first server failure

On Saturday evening, October 7th 2017, our “reverse-proxy” server, the one through which all web traffic goes, crashed. OVH, our technical host, identified a problem on the motherboard and replaced it. Web traffic was routed to the spare server during the short downtime. It was a server failure without real gravity and without loss of data, but it announced the start of a painful series of technical problems.

B) October 14th, a more serious server failure

A week later, on October 14th, the very same “reverse-proxy” server saw its load climb so high that it was unable to deliver web pages… Web traffic was again switched to the spare server, in read-only mode for accounts hosted on this server. About 10 hours of investigation later, we were still not able to understand the origin of the problem. We had to decide to switch the spare server to write mode. This decision was difficult to make because it meant losing the data produced between the last backup (1am) and the switch to the spare server (about 8am). In other words, for the accounts hosted on this server, the photos added during the night simply “disappeared” from their Piwigo.

This was the first time in the history of Piwigo.com that we switched a spare server to write mode. Unfortunately, another problem occurred, related to the first one. To explain this problem, it is necessary to understand how the Piwigo.com server infrastructure works.

On the Piwigo.com infrastructure, servers work in pairs: a main server and its spare server. There are currently 4 pairs in production. The main server takes care of the “live operations”, while the spare server is synchronized with its main server every night and receives the web traffic in read-only during downtimes.

Normally, spare servers only allow read operations, i.e. you can visit the albums or view the photos, but not enter the administration or add photos.

One of the server pairs is what we call the “reverse-proxy”: all the web traffic of *.piwigo.com goes through this server and, depending on the Piwigo concerned, the traffic goes to one pair or another. Normally the reverse-proxy is configured to point to the main servers, not the spare servers.

When a problem occurs on one of the main servers, we switch the traffic to its spare server. If the reverse-proxy server is concerned, we switch the Fail-Over IP address (IPFO), a mechanism that we manage from our OVH administration panel. For other servers, we change the reverse-proxy configuration.
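
To make the routing model above more concrete, here is a minimal Python sketch. It is only an illustration: the server names, the account names and the `use_spare` flag are hypothetical, and the real reverse-proxy is of course configured differently.

```python
from dataclasses import dataclass


@dataclass
class ServerPair:
    """A main server and its spare (all names here are hypothetical)."""
    main: str
    spare: str
    use_spare: bool = False  # False: traffic goes to the main server

    def target(self) -> str:
        """Return the server that currently receives the web traffic."""
        return self.spare if self.use_spare else self.main


def route(account, account_to_pair, pairs):
    """Return the backend that should serve the given account."""
    return pairs[account_to_pair[account]].target()


pairs = {"pair1": ServerPair("main1", "spare1"),
         "pair2": ServerPair("main2", "spare2")}
account_to_pair = {"alice.piwigo.com": "pair1", "bob.piwigo.com": "pair2"}

normal = route("bob.piwigo.com", account_to_pair, pairs)

# main2 fails: we change the reverse-proxy configuration for its pair.
pairs["pair2"].use_spare = True
failover = route("bob.piwigo.com", account_to_pair, pairs)
```

In this sketch, only the pair whose main server failed is redirected; accounts hosted on other pairs keep using their main server.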

That’s enough for infrastructure details… let’s go back to October 14th: we switched the IPFO to use the spare reverse-proxy server. Unfortunately, we ran into 2 cascading problems:

  1. the spare reverse-proxy server, for one of the server pairs, pointed to the spare server
  2. this very spare server was configured in write mode instead of read-only

Why such an unexpected configuration?

Because we sometimes use the spare infrastructure to do real-life tests. In this case, these were IPv6 tests.

What impact for users?

During the many hours when the web traffic went through the spare reverse-proxy server, accounts hosted on the faulty server returned to the state of the previous night: photos added during the night and morning had apparently disappeared, but users were able to keep adding photos. This state did not trigger any specific alert: the situation seemed “normal” for the users concerned and for our monitoring system. When the problem was detected, we changed the reverse-proxy configuration to point back to the main server. Consequence: all the photos added during the downtime apparently disappeared.

What actions have been taken after October 14th?

1) Checks on reverse-proxy configuration

A new script was pushed to production. It checks very frequently that the reverse-proxy is configured to send web traffic to main servers only.
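
As an illustration only, such a check could look like the following Python sketch. The function name, the server names and the way the routes are represented are assumptions, not the actual production script.

```python
def find_misrouted_pairs(proxy_routes, main_servers):
    """Return the pairs whose traffic is not sent to a main server.

    proxy_routes maps a pair name to the backend currently configured
    in the reverse-proxy (hypothetical representation).
    """
    return sorted(pair for pair, backend in proxy_routes.items()
                  if backend not in main_servers)


main_servers = {"main1", "main2"}
# pair2 was left pointing to its spare server, for example after a test.
proxy_routes = {"pair1": "main1", "pair2": "spare2"}

misrouted = find_misrouted_pairs(proxy_routes, main_servers)
if misrouted:
    print("ALERT: reverse-proxy does not point to a main server for:", misrouted)
```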

2) Checks on write vs read-only mode

Another script was pushed to production. This one checks that main servers are configured in write mode and spare servers are in read-only mode.
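
Again as a hedged sketch rather than the real script, the rule “main servers in write mode, spare servers in read-only mode” can be expressed in a few lines of Python. The server names and the `(role, mode)` representation are hypothetical.

```python
# Expected mode for each server role, as described in the text.
EXPECTED_MODE = {"main": "write", "spare": "read-only"}


def find_mode_violations(servers):
    """Return the servers whose mode does not match their role.

    servers maps a server name to a (role, mode) tuple.
    """
    return sorted(name for name, (role, mode) in servers.items()
                  if EXPECTED_MODE[role] != mode)


servers = {
    "main1": ("main", "write"),
    "spare1": ("spare", "read-only"),
    "main2": ("main", "write"),
    "spare2": ("spare", "write"),  # the unexpected state met on October 14th
}

violations = find_mode_violations(servers)
```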

3) Isolate third-party web applications

The “non-vital” web applications, on which we have less expertise, were moved to a third-party server dedicated to this use: 2 WordPress blogs, wiki, forum and Piwik (analytics for visits). Indeed, one of the possible causes of the server failure is that an application entered the 4th dimension or was under attack. Moving these applications to an “isolated” server helps to limit the impact of any future issue.

4) New backup system

The decision to switch a spare server to write mode, i.e. to turn it into a main server, is a hard one to make. Indeed, it means giving up any hope of returning to the main server. This decision is difficult because it involves accepting a loss of data.

To make this decision simpler, two measures have been taken. First, we defined a time threshold after which we apply the switch: in our case, if the failure lasts more than 2 hours, we will switch. Second, backups must be more frequent than once a day: if the backups had been only 1 or 2 hours old, the decision would have been much easier!
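
The arithmetic behind this decision can be sketched in Python. The 2-hour threshold and the October 14th times (backup at 1am, failure at about 8am) come from this post; the function names and the investigation times are hypothetical.

```python
from datetime import datetime, timedelta

SWITCH_THRESHOLD = timedelta(hours=2)  # threshold stated in this post


def should_switch_to_spare(failure_start, now, threshold=SWITCH_THRESHOLD):
    """True once the failure has lasted more than the threshold."""
    return now - failure_start > threshold


def data_loss_window(last_backup, failure_start):
    """Period whose data is lost if the spare server becomes the main one."""
    return failure_start - last_backup


# October 14th: last backup at 1am, traffic switched to the spare at about 8am.
last_backup = datetime(2017, 10, 14, 1, 0)
failure_start = datetime(2017, 10, 14, 8, 0)

loss = data_loss_window(last_backup, failure_start)
too_early = should_switch_to_spare(failure_start, datetime(2017, 10, 14, 9, 30))
time_to_switch = should_switch_to_spare(failure_start, datetime(2017, 10, 14, 10, 1))
```

With a daily backup the loss window here is 7 hours; the more recent the last backup, the smaller this window and the easier the decision.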

In addition to the daily backup, we have added a new “rolling backups” system: every 15 minutes, a script analyzes each Piwigo on specific criteria (new/modified/deleted photos/users/albums/groups…). If anything has changed since the last backup, the script backs up the Piwigo (files + database) and synchronizes it to the spare server.
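
Here is a minimal sketch of the “back up only what changed” idea, in Python. It assumes that the criteria for each Piwigo can be summarized as a comparable “fingerprint” (here a hypothetical tuple of counters); the real script and its criteria are not shown in this post.

```python
def piwigos_to_back_up(current, last_backed_up):
    """Return the Piwigos whose fingerprint changed since their last backup."""
    return sorted(name for name, fingerprint in current.items()
                  if last_backed_up.get(name) != fingerprint)


def run_cycle(current, last_backed_up, back_up):
    """One cycle (every 15 minutes in the text): back up changed Piwigos only."""
    changed = piwigos_to_back_up(current, last_backed_up)
    for name in changed:
        back_up(name)  # files + database, then sync to the spare server
        last_backed_up[name] = current[name]
    return changed


backed_up = []   # stands in for the real backup action
state = {}       # fingerprints recorded at the last backup

# Hypothetical fingerprints: (number of photos, number of users, number of albums)
first = run_cycle({"alice": (120, 3, 10), "bob": (45, 1, 4)}, state, backed_up.append)
second = run_cycle({"alice": (120, 3, 10), "bob": (45, 1, 4)}, state, backed_up.append)
third = run_cycle({"alice": (121, 3, 10), "bob": (45, 1, 4)}, state, backed_up.append)
```

In this sketch, a Piwigo with no activity costs nothing beyond the comparison, which is what makes a 15-minute cycle affordable.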

C) What about the giant downtime on the OVH network, on November 9th and 10th?

Being hosted at OVH, especially in the Strasbourg datacenter (France, Europe), we saw the downtime greatly impact our own infrastructure. First, because our main reverse-proxy server is in Strasbourg: the datacenter failure put Piwigo.com completely out of order during the morning of November 9th (Central European Time). Second, because we could not switch the IP Fail-Over. Or rather, OVH allowed us to do it, but instead of requiring ~60 seconds, it took ~10 hours! Hours during which the accounts hosted on the reverse-proxy server were in read-only mode.

Unlike the October 14th situation, we could not make the decision to switch the spare server to write mode because an IPFO switch request was in progress, and we had no idea how long it would take OVH to apply the action.

The Piwigo.com infrastructure returned to its normal state on November 10th at 14:46, Paris time (France).

OVH has just provided compensation for these failures. We were waiting for it before publishing this blog post. The compensation is not much compared to the actual damage, but we will pass it on in full to our customers. After very high-level calculations, 3 days of time credit were added to each account. It’s a small commercial gesture, but we think we have to pass it on to you as a symbol!

We are sorry for these inconveniences. As you read in this blog post, we’ve improved our methods to mitigate risk in the future and reduce the impact of an irreversible server failure.

Posted in General | Leave a comment

All Piwigo.com accounts updated to version 2.9

17 days after Piwigo 2.9.0 was released and 4 days after we started to update Piwigo.com, all accounts are now up-to-date.

Piwigo 2.9 and new design on administration pages


As you will learn from the release notes, your history will now be automatically purged to keep “only” the last 1 million lines. Yes, some of you, 176 to be exact, have more than 1 million lines, with a record set at 27 million lines!

Posted in General | Comments Off on All Piwigo.com accounts updated to version 2.9

Maintenance report of April 28th 2017

Piwigo.com clients have already received this message. Many users told us they were happy to receive such details about our technical operations, so let’s make it more “public” with a blog post!

A. The short version

On April 28th 2017, we replaced one of the Piwigo.com main servers. The replacement itself was successful. No downtime. The read-only mode lasted only 7 minutes, from 6:00 to 6:07 UTC.

While sending the notification email to our clients, we encountered difficulties with Gmail users. Solving this Gmail issue made the website unavailable for a few users, for perhaps an hour. Everything was back to normal in a few hours. Of course, no data was lost during this operation.

The new server and Piwigo are now good friends. They both look forward to receiving version 2.9 in the coming days 😉

B. Additional technical details

The notification message had already been sent to the first 390 users when we realized emails sent to Gmail addresses were returned in error. Indeed, Gmail now asks for a “reverse DNS IPv6”. Sorry for this very technical detail. We already had it on the old server, so we added it on the new server. And then the problems started… Unfortunately the new server does not manage IPv6 the same way. A few users, on IPv6, told us they only saw “Apache2 Debian Default Page” instead of their Piwigo. Here is the timeline:

  • 6:00 the upgrade starts, switch to read-only mode
  • 6:07 web traffic redirected to the new server
  • 7:40 the new server is doing fine, we start to notify users
  • 7:52 Gmail errors, we stop the notification
  • 8:31 we add an IPv6 on the new server
  • 10:05 following feedback from a few users, we remove the IPv6

Unfortunately adding or removing an IPv6 is not an immediate action. It relies on the “DNS propagation” which may take a few hours, depending on each user.

We took the rest of the day to figure out how to make Gmail accept our emails and let web visitors see your Piwigo. For emails, instead of “piwigo.com”, we now use a sub-domain of “pigolabs.com” (Pigolabs is the company running the Piwigo.com service) with an IPv6: no impact on web traffic.

We also have a technical solution to handle IPv6 for web traffic. We have decided not to use it because IPv6 lacks an important feature, the FailOver. This feature, only available on IPv4, lets us redirect web traffic from one server to another in a few seconds without worrying about DNS propagation. We use it when a server fails and web traffic goes to a spare server.

In the end, the move did not go so well and we sweated quite a bit this Friday, but everything came back to normal and the “Apache2 Debian Default Page” issue eventually affected only a few people!

Posted in General | Comments Off on Maintenance report of April 28th 2017

Piwigo.com Enterprise plans, now official!

In the shadow of the standard plan for several years and yet already adopted by more than 50 organizations, it is time to officially introduce the Piwigo.com Enterprise plans. They were designed for organizations, private or public, looking for a simple, affordable and yet complete tool to manage their collection of photos.

The main idea behind Piwigo.com Enterprise is to democratize photo library management for organizations of all kinds and sizes. We are not targeting Fortune 500 companies, although some of them are already clients, but Fortune 5,000,000 companies!

Piwigo.com Enterprise plans can replace, at a reasonable cost, inadequate solutions relying on intranet shared folders, where photos are sometimes duplicated, deleted by mistake, without the appropriate permission system.

Introduction to Piwigo.com Enterprise plans


Why officially announce these plans today? Because the current trend clearly shows us that our Enterprise plans are finding their market. Although semi-official, Enterprise plans represented nearly 40% of our revenue in February 2017! It is time to put these plans in the spotlight.

In practice, here is what changes with the Piwigo.com Enterprise plans:

  1. they can be used by organizations, as opposed to the standard plan
  2. additional features, such as support for non-photo files (PDF, videos …)
  3. higher level of service (priority support, customization, presentation session)

Discover Piwigo.com Enterprise

Posted in General | Comments Off on Piwigo.com Enterprise plans, now official!

HTTPS is live on Piwigo.com

Some of you were waiting for it, others don’t know yet what it’s all about!

HTTPS is the way to encrypt communications between your web browser and the website you visit. Your Piwigo for instance. It is mainly useful for the log in form and administration pages. Your password is no longer sent in “plain text” through internet nodes, like your internet provider or Piwigo.com servers.

SSL certificate in action for HTTPS


How to use it?

For now, Piwigo doesn’t automatically use HTTPS. You have to switch manually if you want HTTPS. Just add “s” after “http” in the address bar of your web browser.

In the next few days or weeks, Piwigo will automatically switch to HTTPS on the login form and the pages you open afterwards.

Why wasn’t HTTPS already available?

Piwigo.com was born 6 years ago and HTTPS already existed at that time. Here are the 3 main reasons for the wait:

  1. Piwigo is a photo management software, not a bank. Such a level of security was not considered a priority, compared to other features.
  2. the Piwigo application and its related project, without considering Piwigo.com hosting, needed some code changes to work flawlessly with HTTPS. Today we’re proud to say Piwigo works great with multiple addresses, with or without HTTPS. Piwigo automatically uses the appropriate web address. If you have worked with other web applications, you certainly know how much Piwigo makes your life easy when dealing with URLs.
  3. the multiple servers infrastructure on Piwigo.com, with multiple sub-domains *.piwigo.com, made the whole encryption system a bit complex. Without going into details, and for those of you interested, we use a wildcard SSL certificate from Gandi. The Nginx reverse proxy on the frontend server uses it. So does Nginx on the backend servers. All communication between Piwigo.com servers is encrypted when you use HTTPS.

What about custom domain names?

11.5% of Piwigo.com accounts are using a custom domain name. They have more than a *.piwigo.com web address.

Each SSL certificate, which is the “key” for encryption, is dedicated to a domain name. In this case, our SSL certificate is only “trusted” for *.piwigo.com.
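
For the curious, here is a simplified Python sketch of how a wildcard name such as *.piwigo.com is matched against a web address. It is an illustration only: real web browsers apply more rules, and the hostnames used below are hypothetical examples.

```python
def certificate_covers(hostname, pattern):
    """Simplified check: does a certificate name cover this hostname?

    A leading "*" stands for exactly one label (the part before the
    first dot), so "*.piwigo.com" does not cover "piwigo.com" itself
    nor deeper names such as "a.b.piwigo.com".
    """
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard label must match a non-empty label.
        return host_labels[0] != "" and host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


on_piwigo = certificate_covers("demo.piwigo.com", "*.piwigo.com")
custom_domain = certificate_covers("photos.example.com", "*.piwigo.com")
```

In this sketch, an address under *.piwigo.com is covered while a custom domain name is not, which is why a web browser does not trust our certificate for custom domains.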

You can try to use your domain name with HTTPS, but your web browser will display a huge security warning. If you tell your web browser “it’s OK, I understand the risk”, then you can use our certificate combined with your domain name.

The obvious solution is to use Let’s Encrypt, recently released. It will let us generate custom certificates, perfectly compliant with web browser requirements. We will work on it.

Posted in Feature | Tagged | 2 Comments

Referral program, reloaded

For Piwigo.com, spending money on expensive advertising campaigns to recruit new customers offers an unpredictable return on investment. Because the satisfaction of our existing customer base is our most important selling point, we have decided to spend our advertising budget on rewarding existing customers for introducing new customers: their friends and colleagues.

To this end, a referrer who successfully introduces a new customer, who actually subscribes, will have their reward increased from a free one-month extension of their subscription to a free six-month extension. In other words, just find two new customers for Piwigo.com and you earn a full year for free. The new user also has an incentive: instead of receiving 13 months for the price of 12, they will receive 14 months.

New convenient feature: you can easily copy your referral code or signup link


To start with the referral program, open your Piwigo on page [Administration > My account > Manage > Referrals]. You can also read details on our blog post written in 2011.

Posted in General | Tagged | Comments Off on Referral program, reloaded

Contribute to Demo

We’re starting a new project: an “always fresh” Piwigo demo, with photos coming from Piwigo users, all over the world. Read all details on the announcement on Piwigo.org.

On the demo, you will see the contributor and a link to his Piwigo


The Contribute to Piwigo plugin has been installed on Piwigo.com; you can activate and use it. For now we’re talking about the demo on Piwigo.org. Later we will update the Contribute to Demo plugin to let you select the demo you want to contribute to, including Piwigo.com demos.

We hope you’ll love the idea and decide to become a contributor yourself!

Posted in General | Comments Off on Contribute to Demo

Piwigo 2.8 on all accounts

We’re proud to tell you that 100% of Piwigo.com accounts were updated to Piwigo version 2.8 less than 72 hours after it was officially released. It sets a new record!

Authentication key in emails sent by Piwigo 2.8


You can read all details about Piwigo 2.8 in the release notes.

Posted in General | Tagged | Comments Off on Piwigo 2.8 on all accounts

Piwigo 2.8 is on track and needs your beta-testing!

Happy new year 2016 to all Piwigo.com users (and other readers of this blog!) We will publish another post later in January talking about our 2016 roadmap, but for now I wanted to share the first good news: Piwigo 2.8 is coming!

Piwigo 2.8 has entered the “release candidate” period. It means we have implemented several features and we need help from the Piwigo community to test it. The more tests we have, the more stable Piwigo 2.8 will be.

What’s new in Piwigo 2.8 compared to Piwigo 2.7?

  • automatic authentication from notification emails
  • orphan photos are more obvious and easy to delete
  • ability to notify users on an album
  • upload progress shown in favicon
  • watermark can be repeated on several lines
  • improvements on user manager

During web upload, the favicon gets animated and shows you the current state. This way you can browse the web on another tab and see how your upload is going on!

All changes are described in the Piwigo.org announcement: Piwigo 2.8.0RC1.

We need your help to test Piwigo 2.8 Release Candidates. If you want to participate, we will gladly install a test Piwigo 2.8 for you, just tell us!

Posted in General | Tagged | Comments Off on Piwigo 2.8 is on track and needs your beta-testing!

Customize your thumbnail tooltip

Update: the Thumbnail Tooltip plugin is now compatible with the Stripped theme

Thumbnail Tooltip configuration, here you can choose which photo properties to display

From now on, you can customize your tooltips. With the Thumbnail Tooltip plugin, you can define which properties to use in your tooltip. You can even deactivate the tooltip if you don’t like it.

Thumbnail Tooltip in action, here we display the photo description in the tooltip


Posted in Feature | Tagged | Comments Off on Customize your thumbnail tooltip