Early in my career I broke Facebook Chat (pre-Messenger) for all IE users. I couldn't sleep that night.
However, one of the most interesting issues I caused was when I removed about 300,000 duplicated files in Facebook's monorepo and pointed everything to the remaining copies. I went to sleep and woke up the next day with angry messages on my phone: I had made facebook.com 1.2 seconds slower to load!
How? Facebook's bundler used machine learning to optimize JS bundles. That bundler was relying on paths for JS files, and it didn't track changes like mine. So when I pushed my change, the bundler was under the impression that all those files were new and basically unbundled them and sent them to users as individual files instead of JS bundles!
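To make the failure mode concrete, here is a toy sketch in Python. It is not Facebook's bundler: `plan_bundles`, `content_key`, the file paths, and the "core" bundle name are all made up for illustration, and I'm assuming for simplicity that the surviving copy ends up at a path the bundler's history has never seen. The point is only that a lookup keyed by path treats a moved file as brand new, while a lookup keyed by content does not.

```python
import hashlib


def content_key(source: str) -> str:
    """Hypothetical identity for a file: a hash of its contents."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def plan_bundles(files, history, key_fn):
    """Split files into known bundles and unbundled leftovers.

    files:   dict of path -> source text
    history: dict of key -> bundle name learned from earlier builds
    key_fn:  function (path, source) -> key used to look up history
    """
    bundled, unbundled = {}, []
    for path, source in files.items():
        key = key_fn(path, source)
        if key in history:
            bundled.setdefault(history[key], []).append(path)
        else:
            # No history for this key: the file looks brand new and ships alone.
            unbundled.append(path)
    return bundled, unbundled


SRC = "export const noop = () => {};"

# What the (made-up) bundler learned before the cleanup.
history_by_path = {"chat/util.js": "core", "feed/util.js": "core"}
history_by_content = {content_key(SRC): "core"}

# After the cleanup: one copy, at a path the history has never seen.
after = {"shared/util.js": SRC}

by_path = plan_bundles(after, history_by_path, lambda path, source: path)
by_content = plan_bundles(after, history_by_content, lambda path, source: content_key(source))

print(by_path)     # path-keyed: file is treated as new and left unbundled
print(by_content)  # content-keyed: file stays in its old bundle
```

Keying by content is just one way to survive a move like this. A bundler could also follow renames recorded by version control, which is closer to what "tracking changes like mine" would mean.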
Given the scale Facebook was operating at, and some issues in the Python-based Mercurial client (which has since been replaced), it took about an hour to merge my PR into main. The revert to make facebook.com fast again took a three-person team 12 hours to complete. Afterwards an intern and I redid the initial work, this time also patching the bundler's paths correctly.
I caused other large issues, but this one is my best bad one. It worked while testing, it could not have been noticed in development since the bundling infrastructure there was different, and basically only one person at the company would have known that a change like mine could cause this.
I would love to hear stories from senior+ engineers who pushed fuck-ups to prod, so I know I'm not crazy for feeling some type of way about being told my code quality is poor because of code I pushed to prod and quickly fixed once I found the issues. Just wanna know if this is common. Also, our PRs require 2 approvals.