Cloudflare outage post mortem

homura1650@lemmy.world · 2 months ago

IphtashuFitz@lemmy.world · 2 months ago

You would do well to go read up on the 1990 AT&T long-distance network collapse. A single line of changed code, rolled out months earlier, ultimately triggered what these days you might call a DDoS attack that took down all 114 long-distance telephone switches in their global network. Over 50 million long-distance calls were blocked in the 9 hours it took them to identify the cause and roll out a fix.

AT&T prided itself on the thoroughness of its testing and rollout strategy for any code changes. The bug that took them down was both timing-dependent and load-dependent, which made it extremely difficult to test for, and it required fairly specific real-world conditions to trigger. That’s how it went unnoticed for months before it triggered.

Cloudflare outage on November 18, 2025