A few years ago I designed a way to detect bit-flips in Firefox crash reports, and last year we deployed an actual memory tester that runs on user machines after the browser crashes. Today I was looking at the data that comes out of these tests and now I'm 100% positive that the heuristic is sound and a lot of the crashes we see are from users with bad memory or similarly flaky hardware. Here are a few numbers to give you an idea of how large the problem is. 🧵 1/5
Nobody fucking cares, my man. Not important. Nobody in the regular world has ever been affected by not having ECC. You’re inventing edge cases that most people don’t care about. Linus suffers from not understanding normal people.
You can’t claim you don’t have frequent file corruption when you aren’t using tools that detect it. I can guarantee you already have plenty of corrupt stuff on your hard drives. RAM bit flips contribute to that.
You have bugs (broken documents, things failing, freezes, crashes) in the applications you use, and some of them are not due to developer error but to uncorrected memory errors.
If you tried a filesystem like ZFS with checksumming and regular rescans, you’d see detected errors very often. Probably not corrected, because you wouldn’t use mirroring, to save space, dummy.
And if you were using ECC, you’d see messages about corrected memory errors in dmesg often enough.
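To make the "you can't see corruption without tools that detect it" point concrete, here is a rough userspace sketch of checksum-and-rescan in Python. It is not how ZFS works internally (ZFS checksums blocks inside the filesystem); the function names `sha256_of`, `build_manifest` and `scrub` are made up for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every regular file under root (hypothetical helper)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-hash every recorded file and return those that are missing or changed."""
    bad = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(rel)
    return bad
```

Like a pool without mirroring, this can only detect a mismatch, not repair it, and on its own it cannot tell silent corruption from a legitimate edit made after the manifest was built.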
Based on the article, it looks like at least 10% of crashes are caused by not having ECC.
Well, you are demonstrating that you’re an expert people person so I’ll just have to take your word.