Speaking as a compiler writer: Any invocation of undefined behavior should be co...

tialaramex · on Jan 30, 2024

Also, even if the compiler does what a "Real programmer" type thinks it "should" do for this case (and I agree with you that you're not entitled to expect that), you aren't guaranteed that there's some particular value since you never initialized it.

Your operating system likely feels entitled to assume that if you never wrote to this page of RAM you don't care what exactly is in it. After all what kind of lunatic reads a bunch of unknown data, says "Yeah, that's coincidentally what I wanted" and just leaves it unmodified? No, almost anybody would write data they want to keep instead. So, if you never wrote to this particular page of RAM and your OS finds it convenient to swap that page for a different one, no harm no foul right? But now the contents of your uninitialized variable changed!

addaon · on Jan 30, 2024

> So, if you never wrote to this particular page of RAM and your OS finds it convenient to swap that page for a different one, no harm no foul right? But now the contents of your uninitialized variable changed!

No sane OS will do this. Any page that's handed to a process that was last written by a different process must be zero'd (or otherwise have every address initialized) by the OS to avoid leaking information across process boundaries. You could, in theory, have a page that was munmap'd by /this/ process be handed back to the same process to fill a request for a different virtual address without zeroing it, but I can't imagine that any OS tracks the last writer to enable this "optimization" in the few cases it would apply.

jcranmer · on Jan 30, 2024

glibc marks freed pages with MADV_FREE, and on Linux, an MADV_FREE page won't be unmapped immediately. It's possible to read a value of an MADV_FREE'd page, have Linux swap the page out, and then read it again and now observe a zero-init'd page. If malloc returns a new allocation to a freed page, it isn't written to by the allocator (which would prevent the page from being subsequently zeroed if it hasn't already done so), so it doesn't guarantee that the contents won't be randomly zeroed.

Whether or not this is a sane OS I leave an exercise to the reader, but it is nonetheless the property of a common OS.

addaon · on Jan 30, 2024

> glibc marks freed pages with MADV_FREE, and on Linux, an MADV_FREE page won't be unmapped immediately. It's possible to read a value of an MADV_FREE'd page, have Linux swap the page out, and then read it again and now observe a zero-init'd page. If malloc returns a new allocation to a freed page, it isn't written to by the allocator (which would prevent the page from being subsequently zeroed if it hasn't already done so), so it doesn't guarantee that the contents won't be randomly zeroed.

Oh.

I feel like I should have more to say than that, but... oh.

o11c · on Jan 31, 2024

Wouldn't that be canceled by the read fault?

(there's also usually a write, unless the allocation-head-metadata is on a different page than the parts of the allocation you're acting spooky with)

Edit: after watching the video elsewhere in thread, it was indeed crossing page boundary, but that behavior has to be a kernel bug since the documentation says "accesses" are sufficient to repopulate it.

jcranmer · on Jan 31, 2024

No, it's specifically a write that causes the page to actually be freed. Per the man page:

> However, subsequent writes to pages in the range will succeed and then kernel cannot free those dirtied pages, so that the caller can always see just written data. If there is no subsequent write, the kernel can free the pages at any time. Once pages in the range have been freed, the caller will see zero-fill-on-demand pages upon subsequent page references.

o11c · on Jan 31, 2024

Ah, I mixed up MADV_DONTNEED (which is older) with MADV_FREE.

amluto · on Jan 30, 2024

Neither Linux nor Windows have this sort of uninitialized memory, and it would be quite the security hole if they did.

In general, the things your compiler thinks are UB are not the same things your OS or CPU thinks are undefined.

SkiFire13 · on Jan 30, 2024

At least Linux does this, and it is not a security hole because it only reuses pages with memory of the current process (as in the worst case you end up reading data from some other part of your program, but not other programs).

And yes, it does happen in practice, most famously it has been mentioned in the "The strange details of std::string at Facebook" talk at CppCon 2016 https://youtu.be/kPR8h4-qZdk?t=1150&si=2R358wniZfxTJLmc

o11c · on Jan 31, 2024

After watching, that has to be a kernel bug, since the documentation says "accesses" will cause repopulation, not "writes".

Edit: nevermind, I mixed up MADV_DONTNEED with MADV_FREE

ynik · on Jan 30, 2024

The problem is that uninitialized memory really is subtle. After initializing all members of a struct, the padding bytes of that struct are still uninitialized. Yet, it is valid to memcpy the whole struct somewhere else; even with a hand-written char*-based copying loop. (and indeed, this is a good reason for reading uninitialized memory!) So reading uninitialized memory is not always undefined behavior; just most of the time.

The C standard text is barely sufficient for the purposes of compilers, but wholly insufficient for what a model checker would need to know.

saagarjha · on Jan 31, 2024

> After initializing all members of a struct, the padding bytes of that struct are still uninitialized.

Depends on how you initialize them. If you do it all in one go, then yes. If you partially initialize it, then no: some of the padding bytes are guaranteed to be zero. (Observing this and maintaining this variant is an exercise for the reader.)

o11c · on Jan 31, 2024

For reference, GCC has `__builtin_clear_padding` which is very useful to avoid problems there.

jacquesm · on Jan 31, 2024

> there is no good reason to read uninitialized memory, for example

That's an assumption on your part.

One very good reason might be to ensure that memory really is there given the optimistic allocation schemes and to force a page to be read into core. You could also write to it to get the same effect. Another good reason to do a read on uninitialized memory is to see what's still there from the previous program that ran. Now arguably that's a nefarious reason but it is still a good one and it may help uncover bugs in the underlying system.

aidenn0 · on Jan 31, 2024

The code doesn't read uninitialized memory. Automatic storage of objects need not be in memory. The C99 standard only uses the word "memory" in normative sections when talking about "malloc"

philzook · on Jan 30, 2024

Yes, I do not trust my understanding of C. Which is why I want mechanized assistance.

philzook · on Jan 30, 2024

I apologize for using the word subtle. What I should have said is I don't understand the topic and reading about it has left me a sense that one should be very careful.

jacquesm · on Jan 31, 2024

You're spot on. The CVE lists are filled with people who thought they understood this stuff better than they really did.