The Day Facebook Disappeared (BGP and DNS for Devs)
· Jerwin Arnado
Archive note: this is a backdated post, written years later while rebuilding this site. It’s dated to the moment it covers, but the hindsight is real.
On October 4, 2021, Facebook didn’t go down. It went missing. For about six hours, Facebook, Instagram, WhatsApp, and Messenger ceased to exist as far as the internet was concerned. In the Philippines, where Facebook effectively is the internet for millions (free data access included), the silence was total. Businesses that run entirely on FB pages went dark. Families’ only messaging channel vanished. GCash even saw traffic spikes as people, deprived of scrolling, apparently remembered errands.
The outage is the best networking lesson of the year, so let’s actually learn it.
What happened, in dev terms
During routine maintenance, a command meant to assess backbone capacity instead took down every backbone connection, and the audit tool that should have blocked it had a bug and let it through. With the data centers cut off, Facebook’s BGP routes were withdrawn, which made its DNS unreachable, which made everything else unfixable.
Translation layer:
- BGP (Border Gateway Protocol) is how networks announce “the IP ranges for these services live here — route traffic to me.” It’s the internet’s address-exchange between providers. When Facebook’s routers withdrew those announcements, every router on Earth deleted its directions to Facebook. The servers were fine. There was simply no path to them.
- DNS translates facebook.com into IP addresses. Facebook runs its own authoritative DNS servers, which sat inside the IP ranges that had just vanished from the routing table. So even the lookup step died. To every device on the planet, facebook.com stopped resolving, full stop.
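If the two layers are easier to see in code, here is a minimal sketch of the cascade. It is a toy model, not how real routers or resolvers work: the RoutingTable class, the social.example domain, and the addresses (taken from the documentation-only ranges 192.0.2.0/24 and 198.51.100.0/24) are all made up for illustration.

```python
import ipaddress


class RoutingTable:
    """Toy stand-in for the routes the rest of the internet learned via BGP."""

    def __init__(self):
        self.routes = {}  # ip_network -> name of the network that announced it

    def announce(self, prefix, origin):
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def next_hop(self, ip):
        """Longest-prefix match; None means there is no route to that address."""
        addr = ipaddress.ip_address(ip)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical data: a made-up domain on documentation-only address ranges.
NAMESERVER_IP = {"social.example": "192.0.2.53"}   # authoritative DNS server
A_RECORD = {"social.example": "198.51.100.10"}     # the site itself


def resolve(name, table):
    """Return the site's IP, or None if its authoritative nameserver is unreachable."""
    if table.next_hop(NAMESERVER_IP[name]) is None:
        return None  # nobody can ask the only server that knows the answer
    return A_RECORD[name]


def demo():
    table = RoutingTable()
    table.announce("192.0.2.0/24", "social-net")
    table.announce("198.51.100.0/24", "social-net")
    before = resolve("social.example", table)

    # The outage: both prefixes are withdrawn. The servers are still running.
    table.withdraw("192.0.2.0/24")
    table.withdraw("198.51.100.0/24")
    after = resolve("social.example", table)
    return before, after
```

Calling demo() returns the site's address before the withdrawal and None after it. Nothing in A_RECORD changed; the data is intact, there is just no path to the server that holds it.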
The brutal cascade: the engineers who could fix it relied on tools that ran on the same network that was down. Reports described staff unable to badge into buildings and server rooms because access control also lived behind the dead network. The fix required physically reaching routers.
The lessons, which apply at every scale
- Don’t host your recovery path inside the failure domain. Facebook’s DNS, internal tools, and door badges all depended on the thing that broke. The Laravel-scale version: your status page on the same server as the app, your deploy pipeline reachable only through the VPN that’s down, the database backup stored on the database server.
- Out-of-band access is not optional. Every serious system needs a way in that doesn’t depend on the system. For a homelab or a client server, that can be as humble as the hosting provider’s console and a printed runbook.
- Automation amplifies whatever you feed it. One bad command, dutifully propagated everywhere at machine speed. Guardrails on destructive operations — confirmation steps, canary rollouts, audit tools that actually audit — are worth their annoyance.
- Single platform = single point of failure, societal edition. The PH economy’s dependence on one company’s app stack went from think-piece topic to lived experience for six hours. For businesses: keep a second channel (even just SMS or email lists). For all of us: maybe don’t build the national commerce layer on one login.
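The first lesson is checkable on paper, or in a few lines of Python. The sketch below assumes you can write down what each piece of your setup depends on; the component names are hypothetical and mirror the small-scale examples above, and every dependency is treated as a hard one (if it is down, you are down).

```python
def affected_by(failed, depends_on):
    """Return every component that is down once `failed` is down."""
    down = {failed}
    changed = True
    while changed:  # keep sweeping until no new casualties appear
        changed = False
        for component, deps in depends_on.items():
            if component not in down and down & deps:
                down.add(component)
                changed = True
    return down


def surviving_recovery_paths(failed, depends_on, recovery_paths):
    """Recovery tools that still work after `failed` breaks."""
    return recovery_paths - affected_by(failed, depends_on)


# Hypothetical small-shop setup, for illustration only.
DEPENDS_ON = {
    "app": {"app-server", "database-server"},
    "status-page": {"app-server"},        # hosted next to the app
    "deploy-pipeline": {"vpn"},           # reachable only through the VPN
    "db-backup": {"database-server"},     # stored on the database server
    "provider-console": set(),            # out-of-band, depends on nothing here
    "vpn": set(),
}
RECOVERY_PATHS = {"status-page", "deploy-pipeline", "db-backup", "provider-console"}

if __name__ == "__main__":
    for failed in ("app-server", "database-server", "vpn"):
        left = surviving_recovery_paths(failed, DEPENDS_ON, RECOVERY_PATHS)
        print(failed, "down, still usable:", sorted(left))
```

If a failure leaves that set empty, or leaves only tools that can’t fix the thing that broke, you have found your own badge-reader problem before it found you.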
The poetic detail: for six hours, the company that knows everything about everyone couldn’t be found by anyone. The internet kept working fine — it just genuinely, structurally, did not know where Facebook was. (Three weeks later the company renamed itself Meta, so the universe does have comedic timing.)
DNS and BGP run everything and nobody thinks about them until a day like this. Now you’ve thought about them. That’s the whole point of post-mortems — especially other people’s.