The rabbit hole of unsafe Rust bugs – notgull – The world's number one source of notgull

snaggen@programming.dev · 11 months ago

The rabbit hole of unsafe Rust bugs – notgull – The world's number one source of notgull

Deebster@programming.dev · edit-2 11 months ago

I thought there was a type of bug called a notgull at first but that’s the author’s handle. The bugs are a use-after-free and an invalid pointer that was wrong due to an unsound calculation in non-unsafe code.

This isn’t meant as a saved you a click summary; the article’s worth the read!

OmnipotentEntity@beehaw.org · edit-2 11 months ago

Not actually a use-after-free bug, just memory corruption due to a non-aligned pointer. (Calculated from the unsound math in safe code you mentioned.)

Non-aligned memory access in CPUs is a “don’t care” condition that can do whatever is most convenient for the processor implementation. Typically, the cause for the memory corruption is due to caching. Cache lines are a certain size, in this case 128 bytes. Let’s say you want to fetch 16 bytes of data, and the processor specifies for simplicity that all data you fetch or this width should have an address divisible by 16. This is so that the data in question is entirely within a single cache line, and the memory controller doesn’t have to handle fetching two cache lines for a single load, which would be slower for no good reason.

If we violate this contract, let’s say we load a pointer with an offset that is 15 mod 16. Then the processor fetchs the cache line where the data starts, slots it into the cache memory, then reads from the cache memory and you get back either:

7/8 times: the entire correct data, because the pointer is not also at the end of the cache line
1/8 times: the first byte of the correct data and 15 bytes of other data from the cache, which data exactly is governed by the caching behavior of the L1 cache and can be difficult to predict.

This affects both loads and stores, so you can overwrite other cache lines. And doing so may mark them dirty depending on implementation. When they get evicted from the cache if they’re marked as dirty (either from this or some other write) they’ll be committed, either to the L2 cache, or if the cache is write through, also to memory. This is the most likely source of the memory corruption, to my understanding.

Deebster@programming.dev · 11 months ago

The article mentions two bugs and the first is a use-after-free. The article spends most of the time discussing the second issue (the one you nicely explained).

OmnipotentEntity@beehaw.org · 11 months ago

oh dang! I am a dummy and completely missed the first one somehow.

KillTheMule@programming.dev · edit-2 11 months ago

This parting shot sounds pretty dire

a bug in safe code can easily cause unsound behavior in your unsafe code if you’re not careful.

That’s definitely not how it should be. Fortunately, I think I disagree with that, since miri points to the “real” buggy code:

unsafe { inner.as_ref() }

As opposed to the article, I’d argue this code is not correct, since it did not account for alignment, which it must (I mean, by standard use of the word unsound this is unsound, since it can be called from safe code introducing UB). Or am I wrong? Is the fundamental value proposition of rust moot?

KiranWells@pawb.social · 11 months ago

I believe you are correct; if the unsafe code can cause undefined behavior if input data is not following a specific contract, then the entire function should be labeled unsafe so the caller knows that.

The other option is to check to make sure the contract is valid, and return an error or panic if it is not. That function would be sound, as no inputs cause undefined behavior.