Any invocation of undefined behavior should be considered an assertion failure as far as any model checker should be concerned. Compilers can--and will--treat undefined behavior as license to alter the semantics of your program without any constraint, and most instances of undefined behavior are clearly programmer error (there is no good reason to read uninitialized memory, for example).
Reading an uninitialized variable is not "subtle" undefined behavior. It's one of the most readily accessible examples not only of what undefined behavior can exist, but also the ways compilers will mutilate your code just because you did it. To be honest, if something as simple as the consequences of reading uninitialized memory are shaky understanding for someone trying to prove code correct, that will completely undermine any trust I have in the validity of your proofs.
Also, even if the compiler does what a "Real programmer" type thinks it "should" do for this case (and I agree with you that you're not entitled to expect that), you aren't guaranteed that there's some particular value since you never initialized it.
Your operating system likely feels entitled to assume that if you never wrote to this page of RAM you don't care what exactly is in it. After all what kind of lunatic reads a bunch of unknown data, says "Yeah, that's coincidentally what I wanted" and just leaves it unmodified? No, almost anybody would write data they want to keep instead. So, if you never wrote to this particular page of RAM and your OS finds it convenient to swap that page for a different one, no harm no foul right? But now the contents of your uninitialized variable changed!
> So, if you never wrote to this particular page of RAM and your OS finds it convenient to swap that page for a different one, no harm no foul right? But now the contents of your uninitialized variable changed!
No sane OS will do this. Any page that's handed to a process that was last written by a different process must be zero'd (or otherwise have every address initialized) by the OS to avoid leaking information across process boundaries. You could, in theory, have a page that was munmap'd by /this/ process be handed back to the same process to fill a request for a different virtual address without zeroing it, but I can't imagine that any OS tracks the last writer to enable this "optimization" in the few cases it would apply.
glibc marks freed pages with MADV_FREE, and on Linux, an MADV_FREE page won't be unmapped immediately. It's possible to read a value of an MADV_FREE'd page, have Linux swap the page out, and then read it again and now observe a zero-init'd page. If malloc returns a new allocation to a freed page, it isn't written to by the allocator (which would prevent the page from being subsequently zeroed if it hasn't already done so), so it doesn't guarantee that the contents won't be randomly zeroed.
Whether or not this is a sane OS I leave an exercise to the reader, but it is nonetheless the property of a common OS.
> glibc marks freed pages with MADV_FREE, and on Linux, an MADV_FREE page won't be unmapped immediately. It's possible to read a value of an MADV_FREE'd page, have Linux swap the page out, and then read it again and now observe a zero-init'd page. If malloc returns a new allocation to a freed page, it isn't written to by the allocator (which would prevent the page from being subsequently zeroed if it hasn't already done so), so it doesn't guarantee that the contents won't be randomly zeroed.
Oh.
I feel like I should have more to say than that, but... oh.
(there's also usually a write, unless the allocation-head-metadata is on a different page than the parts of the allocation you're acting spooky with)
Edit: after watching the video elsewhere in thread, it was indeed crossing page boundary, but that behavior has to be a kernel bug since the documentation says "accesses" are sufficient to repopulate it.
No, it's specifically a write that causes the page to actually be freed. Per the man page:
> However, subsequent writes to pages in the range will succeed and then kernel cannot free those dirtied pages, so that the caller can always see just written data. If there is no subsequent write, the kernel can free the pages at any time. Once pages in the range have been freed, the caller will see zero-fill-on-demand pages upon subsequent page references.
At least Linux does this, and it is not a security hole because it only reuses pages with memory of the current process (as in the worst case you end up reading data from some other part of your program, but not other programs).
The problem is that uninitialized memory really is subtle.
After initializing all members of a struct, the padding bytes of that struct are still uninitialized. Yet, it is valid to memcpy the whole struct somewhere else; even with a hand-written char*-based copying loop. (and indeed, this is a good reason for reading uninitialized memory!)
So reading uninitialized memory is not always undefined behavior; just most of the time.
The C standard text is barely sufficient for the purposes of compilers, but wholly insufficient for what a model checker would need to know.
> After initializing all members of a struct, the padding bytes of that struct are still uninitialized.
Depends on how you initialize them. If you do it all in one go, then yes. If you partially initialize it, then no: some of the padding bytes are guaranteed to be zero. (Observing this and maintaining this variant is an exercise for the reader.)
> there is no good reason to read uninitialized memory, for example
That's an assumption on your part.
One very good reason might be to ensure that memory really is there given the optimistic allocation schemes and to force a page to be read into core. You could also write to it to get the same effect. Another good reason to do a read on uninitialized memory is to see what's still there from the previous program that ran. Now arguably that's a nefarious reason but it is still a good one and it may help uncover bugs in the underlying system.
The code doesn't read uninitialized memory. Automatic storage of objects need not be in memory. The C99 standard only uses the word "memory" in normative sections when talking about "malloc"
I apologize for using the word subtle. What I should have said is I don't understand the topic and reading about it has left me a sense that one should be very careful.
Any invocation of undefined behavior should be considered an assertion failure as far as any model checker should be concerned. Compilers can--and will--treat undefined behavior as license to alter the semantics of your program without any constraint, and most instances of undefined behavior are clearly programmer error (there is no good reason to read uninitialized memory, for example).
Reading an uninitialized variable is not "subtle" undefined behavior. It's one of the most readily accessible examples not only of what undefined behavior can exist, but also the ways compilers will mutilate your code just because you did it. To be honest, if something as simple as the consequences of reading uninitialized memory are shaky understanding for someone trying to prove code correct, that will completely undermine any trust I have in the validity of your proofs.