
Facebook is recovering from a massive outage that affected all of its services. What caused it?

You can hardly have missed it: last night, Facebook, along with WhatsApp, Instagram, Messenger, and Oculus VR, suffered a major global outage that made all of these services inaccessible for several hours. This is the group’s worst outage since March 2019, when a similar incident rendered the platform inaccessible for nearly 24 hours. While some initially suspected a large-scale hack (probably in reaction to the recent revelations by Frances Haugen, a former Facebook employee), the social network this morning points to purely technical problems.

The blackout began shortly before noon on the East Coast of the United States (around 6 p.m. CEST) and lasted for almost six hours. No messages could be sent on WhatsApp, while Instagram displayed an error message and Facebook reported a malfunction. The Verge reports that Facebook engineers were immediately dispatched to the company’s data centers in the US to try to resolve the issue, a sign that the outage was far from trivial.

A check of ISPs’ DNS servers through DNSchecker.org showed that most of them could again find a route to Facebook.com by 5:30 p.m. ET, and all services were reachable within minutes; some, however, only became fully functional a few hours later. Facebook said in a statement that the failure of its networks and messaging services was caused by a “faulty configuration change” on its servers; officials added that user data had not been compromised.

One of the largest outages ever recorded

Obviously, given the number of services that now depend on this platform, the slightest failure has a snowball effect that does not go unnoticed. Last night’s outage, which affected more than 14 million users, is one of the largest ever recorded by the DownDetector site. According to eMarketer, the digital advertising carried by Facebook represents more than 48 billion dollars a year, so there was an urgent need to remedy the situation. “Our engineering teams have learned that configuration changes to the backbone routers that coordinate network traffic between our data centers have caused problems that disrupted this communication,” the group explains, without further details.


This failure is all the more embarrassing because it comes as the group faces accusations from a former employee, a whistleblower who accuses the company of putting its profits before the safety of its users; in particular, she disclosed confidential documents that expose certain unscrupulous practices. Many immediately suspected a cyberattack aimed at destabilizing the group. But Facebook sticks to a technical explanation: “We want to make it clear at this point that we believe the main cause of this failure was a faulty configuration change. We also have no evidence that user data has been compromised as a result of this downtime.”

Mark Zuckerberg immediately posted an apology on his Twitter account: “Sorry about the interruption today. I know how much you trust our services to stay in touch with the people you care about.” While the situation hurt many users whose activity (and income) depend entirely on these services, it did at least benefit some rival platforms, such as Signal, which on Twitter welcomed the surge in sign-ups during these few hours of blackout. And over the same period, SMS made a comeback…

Routes “wiped” from the map after an update

The blackout also affected all the communication systems and collaborative tools used by Facebook’s own employees. They could no longer receive external emails, and even the credentials they use to access company premises stopped working!

Experts were quick to point to the Domain Name System (DNS) and the Border Gateway Protocol (BGP) to explain the incident. DNS translates a website’s name into an IP address, which is the actual numerical address of that site. BGP is an Internet routing protocol, which means that it tells routers how to move traffic from one IP address to another as efficiently as possible. Without BGP, Internet routers wouldn’t know what to do and the Internet wouldn’t work. In short, DNS servers provide the IP address and BGP provides the most efficient way to reach that address.
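To make this division of labor concrete, here is a minimal Python sketch. It is a toy model, not how real resolvers or routers are implemented: the name `social.example`, the documentation-range addresses, and the peers `peer-A` and `peer-B` are all invented for illustration, and the route choice uses a simple longest-prefix match rather than the full BGP path selection process.

```python
import ipaddress

# Hypothetical DNS records: name -> IP address (documentation-range address).
DNS_RECORDS = {"social.example": "192.0.2.10"}

# Hypothetical routing table, as it might be learned via BGP: prefix -> next hop.
ROUTES = {
    ipaddress.ip_network("192.0.2.0/24"): "peer-A",
    ipaddress.ip_network("192.0.0.0/16"): "peer-B",
}


def resolve(name):
    """DNS step: turn a name into an IP address, or None if unknown."""
    return DNS_RECORDS.get(name)


def next_hop(ip, routes):
    """Routing step: pick the most specific prefix that contains the address."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


def reach(name, routes=ROUTES):
    """Combine both steps: first find the address, then find a way to it."""
    ip = resolve(name)
    if ip is None:
        return "DNS failure: name not resolved"
    hop = next_hop(ip, routes)
    if hop is None:
        return f"no route to {ip}"
    return f"{name} -> {ip} via {hop}"
```

Calling `reach("social.example")` succeeds only when both steps work; passing an empty routing table (`routes={}`) shows what happens when the address is known but no network advertises a path to it.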

Yesterday, Facebook and its affiliated services all went down as if they had suddenly been erased from the Internet, explain specialists at Cloudflare: “Their DNS names stopped resolving and their infrastructure IP addresses were unreachable. It was as if someone had ‘pulled the cables’ from their data centers all at once and disconnected them from the Internet.” Brian Krebs, a cybersecurity journalist, reports that a Facebook source attributed the incident to a “routine BGP update”. This update reportedly erased the DNS routing information that other networks need to reach Facebook’s various services. Cloudflare’s CTO confirms that several unusual BGP changes were detected just before the incident occurred.
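The chain of events described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ToyInternet` class, the name `social.example`, and the two documentation-range prefixes are hypothetical, and the model reduces BGP to a set of announced prefixes. It shows why withdrawing the routes to the DNS servers alone is enough to make a name unresolvable, even before the web servers themselves become unreachable.

```python
import ipaddress


class ToyInternet:
    """Toy model: a set of announced prefixes, plus DNS served from inside one of them."""

    def __init__(self):
        self.announced = set()    # prefixes currently advertised to other networks
        self.nameserver_ip = {}   # name -> IP of its authoritative DNS server
        self.records = {}         # name -> IP the DNS server would answer with

    def announce(self, prefix):
        self.announced.add(ipaddress.ip_network(prefix))

    def withdraw(self, prefix):
        self.announced.discard(ipaddress.ip_network(prefix))

    def reachable(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.announced)

    def resolve(self, name):
        # The answer exists, but it can only be fetched if the DNS server is reachable.
        ns = self.nameserver_ip.get(name)
        if ns is None or not self.reachable(ns):
            return None
        return self.records.get(name)


def build_example():
    net = ToyInternet()
    net.announce("192.0.2.0/24")      # hypothetical prefix holding the web servers
    net.announce("198.51.100.0/24")   # hypothetical prefix holding the DNS servers
    net.nameserver_ip["social.example"] = "198.51.100.53"
    net.records["social.example"] = "192.0.2.10"
    return net
```

With `net = build_example()`, `net.resolve("social.example")` returns an address; after `net.withdraw("198.51.100.0/24")` it returns `None`, although `net.reachable("192.0.2.10")` is still true until that prefix is withdrawn as well.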

Fun fact: while Facebook was briefly wiped off the map, a third party attempted to list its domain name for sale with various industry players, such as DomainTools and GoDaddy, which mistakenly included the ad in their listings! Today everything is back to normal. But following the incident, Facebook’s share price, which had already been falling at the start of yesterday’s session, plunged 5% on Wall Street, costing Mark Zuckerberg almost 6 billion dollars.

Facebook Engineering
