10% of Firefox crashes are caused by bitflips

JensSpahnpasta@feddit.org · 3 months ago

10% of Firefox crashes are caused by bitflips

tal@lemmy.today · edit-2 3 months ago

The problem is that ECC is one of the things used to permit price discrimination between server (less price sensitive) and PC (more price sensitive) users. Like, there’s a significant price difference, more than cost-of-manufacture would warrant. There are only a few companies that make motherboard chipsets, like Intel, and they have enough price control over the industry that they can do that. You’re going to be paying a fair bit more to get into the “server” ecosystem, as a result of that.

Also…I’m not sure that ECC is the right fix. I kind of wonder whether the fact is actually that the memory is broken, or that people are manually overclocking and running memory that would be stable at a lower rate at too high of a rate, which will cause that. Or whether BIOSes, which can automatically detect a viable rate by testing memory, are simply being too aggressive in choosing high memory bandwidth rates.

EDIT: If it is actually broken memory and only a region of memory is affected, both Linux and Windows have the ability to map around detected bad regions in memory, if you have the bootloader tell the kernel about them and enough of your memory is working to actually get your kernel up and running during initial boot. So it is viable to run systems that actually do have broken memory, if one can localize the problem.

https://www.gnu.org/software/grub/manual/grub/html_node/badram.html

Something like MemTest86 is a more-effective way to do this, because it can touch all the memory. However, you can even do runtime detection of this with Linux up and running using something like memtester, so hypothetically someone could write a software package to detect this, update GRUB to be aware of the bad memory location, and after a reboot, just work correctly (well, with a small amount less memory available to the system…)

grue@lemmy.world · edit-2 3 months ago

Also…I’m not sure that ECC is the right fix. I kind of wonder whether the fact is actually that the memory is broken, or that people are manually overclocking and running memory that would be stable at a lower rate at too high of a rate, which will cause that.

Some of it is cosmic rays, right? I think ECC is still worth it even at JEDEC speeds.

tal@lemmy.today · 3 months ago

even at JEDEC speeds.

My last Intel motherboard couldn’t handle all four slots filled with 32GB of memory at rated speeds. Any two sticks yes, four no. From reading online, apparently that was a common problem. Motherboard manufacturers (who must have known that there were issues, from their own testing) did not go out of their way to make this clear.

Maybe it’s not an issue with registered/buffered memory, but with plain old unregistered DDR5, I think that manufacturers have really selling product above what they can realistically do.

AA5B@lemmy.world · 3 months ago

I wonder if ai can actually help here. As the industry abandons consumer hardware in favor of datacenter equipment to profit from the ai bubble, perhaps ecc memory will become cheaper

10% of Firefox crashes are caused by bitflips

10% of Firefox crashes are caused by bitflips

Gabriele Svelto (@[email protected])