- cross-posted to:
- [email protected]
- [email protected]
The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind. Instead, it was triggered by a change to one of our database systems’ permissions which caused the database to output multiple entries into a “feature file” used by our Bot Management system. That feature file, in turn, doubled in size. The larger-than-expected feature file was then propagated to all the machines that make up our network.
The software running on these machines to route traffic across our network reads this feature file to keep our Bot Management system up to date with ever changing threats. The software had a limit on the size of the feature file that was below its doubled size. That caused the software to fail.
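For anyone wondering what "a limit on the size of the feature file" failing actually looks like, here is a minimal sketch of that failure mode. It is not Cloudflare's actual code; the cap of 200 entries, the names, and the fallback behaviour are all assumptions made for illustration. The point is how a hard-coded limit plus a doubled input file turns a routine config refresh into a hard failure, and how handling the error instead lets the process keep serving traffic with the last known-good file.

```rust
// Hypothetical sketch of a feature-file loader with a hard-coded capacity.
// Not Cloudflare's code: MAX_FEATURES, the names, and the fallback are assumptions.

const MAX_FEATURES: usize = 200; // assumed hard-coded limit

#[derive(Debug)]
struct FeatureLimitExceeded(usize);

/// Parse one feature per non-empty line, refusing files past the limit.
fn load_feature_file(contents: &str) -> Result<Vec<String>, FeatureLimitExceeded> {
    let features: Vec<String> = contents
        .lines()
        .filter(|l| !l.trim().is_empty())
        .map(|l| l.to_string())
        .collect();

    if features.len() > MAX_FEATURES {
        // A duplicated input (every row emitted twice) is enough to blow past the cap.
        return Err(FeatureLimitExceeded(features.len()));
    }
    Ok(features)
}

fn main() {
    // Simulate a feature file that doubled in size after an upstream change.
    let doubled: String = (0..2 * MAX_FEATURES)
        .map(|i| format!("feature_{}\n", i % MAX_FEATURES))
        .collect();

    // Calling .unwrap() here would crash the process that every request depends on.
    // Handling the error keeps serving traffic on the previous feature set instead.
    match load_feature_file(&doubled) {
        Ok(features) => println!("loaded {} features", features.len()),
        Err(FeatureLimitExceeded(n)) => eprintln!(
            "feature file has {} entries, limit is {}; keeping previous file",
            n, MAX_FEATURES
        ),
    }
}
```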


How about an hour? Or 10 minutes? Either would have prevented this. I very much doubt their service is so unstable and flimsy that they need to respond to new threats on such short notice. It would be worthless to their customers if that were true.
Restarting and running some automated tests on a server should not take more than 5 minutes.
5 minutes of uninterrupted DDoS traffic from a bot farm would be pretty bad.
5 hours of unintended downtime from an update is even worse.
Edited for those who didn’t get the original point.
It wasn’t an unintentional update though, it was an intentional update with a bug.
Edited. My point still stands.
Significantly better than several hours of most of the internet being down.
Maybe not updating bot mitigation fast enough would cause an even bigger outage. We don’t know from the outside.