Volatile
If you're dealing with hardware, ISRs, signal handlers, threads, etc., you'll likely run into a nasty optimization scenario when passing -O2 or higher to gcc. Imagine we do something like this:
int whatever = 1;

int main(int argc, char *argv[]) {
    while (whatever) {
        asm("nop");
    }
}
Nothing inside the loop writes to whatever, so the optimizer concludes there is no point in evaluating it repeatedly. Does that still make sense if you consider that whatever could be a pointer to a register? Think AVR's memory-mapped registers! What about if you had a signal handler or an ISR?! Then whatever could change and the emitted instructions wouldn't bother to check.
Let's see the x64 assembly for this:
Dump of assembler code for function main:
0x00000000004003e0 <+0>: mov 0x200c4a(%rip),%eax # 0x601030 <whatever>
0x00000000004003e6 <+6>: test %eax,%eax
0x00000000004003e8 <+8>: je 0x4003f0 <main+16>
0x00000000004003ea <+10>: jmp 0x4003ea <main+10>
0x00000000004003ec <+12>: nopl 0x0(%rax)
0x00000000004003f0 <+16>: xor %eax,%eax
0x00000000004003f2 <+18>: retq
Oh.. so, what happens? At 0x4003ea, it just jumps to itself! The value is checked once on entry and never again. If this while loop were spinning, waiting for a signal handler to pop it out of the loop, it'd just wait forever (assuming no function calls in the loop: a call the compiler can't see into would force it to reload whatever, since the callee might have modified it).
If we simply declare whatever as volatile, we get this:
Dump of assembler code for function main:
0x00000000004003e0 <+0>: mov 0x200c4a(%rip),%eax # 0x601030 <whatever>
0x00000000004003e6 <+6>: test %eax,%eax
0x00000000004003e8 <+8>: jne 0x4003e0 <main>
0x00000000004003ea <+10>: repz retq
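To make the signal-handler case concrete, here is a minimal sketch, assuming a POSIX system. The names keep_spinning, on_alarm and spin_until_alarm are made up for illustration, and the flag uses sig_atomic_t rather than int because that is the integer type the C standard lets a handler write to.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction and setitimer under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

/* Flag shared between the handler and the main flow. volatile forces a
   fresh load on every loop iteration. */
static volatile sig_atomic_t keep_spinning = 1;

/* Hypothetical handler: clears the flag when SIGALRM arrives. */
static void on_alarm(int signo) {
    (void)signo;
    keep_spinning = 0;
}

/* Hypothetical helper: busy-waits until SIGALRM fires after roughly `ms`
   milliseconds. Returns 0 once the loop exits, -1 if setup fails. */
int spin_until_alarm(long ms) {
    if (ms <= 0)
        return -1;              /* a zero timer would never fire */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    keep_spinning = 1;

    struct itimerval timer;
    memset(&timer, 0, sizeof timer);
    timer.it_value.tv_sec = ms / 1000;
    timer.it_value.tv_usec = (ms % 1000) * 1000;
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
        return -1;

    while (keep_spinning) {
        /* nothing to do; the flag is re-read from memory each time around */
    }
    return 0;
}
```

Without the volatile qualifier on keep_spinning, an optimizing build would be free to hoist the load out of the loop, as in the first disassembly above, and spin forever.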
As an additional note, the volatile qualifier does not make an access atomic, and it guarantees nothing about the order in which reads and writes become visible to other threads or CPUs. To get proper read/write ordering, the code must use a memory fence, an atomic instruction or type (C11), or a platform-dependent method. Separately, in microcontroller (and maybe some other) programming, it may make more sense to dedicate a CPU register to the variable instead of hitting RAM on every access via volatile.
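As a rough illustration of the C11 route, the sketch below publishes a value through an atomic flag with release/acquire ordering instead of volatile. The names payload, ready, publish and try_consume are hypothetical, and a compiler that ships <stdatomic.h> is assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                /* plain data, deliberately not volatile */
static atomic_bool ready = false;  /* hypothetical "data is valid" flag */

/* Writer side: store the data, then set the flag with release ordering so
   the payload write cannot be reordered after the flag write. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load that observes `true` guarantees the payload
   written before the matching release store is visible too. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In a real program publish and try_consume would run on different threads. The point is that the ordering guarantee comes from the atomic operations, not from any volatile qualifier.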
Linus Torvalds (of Linux fame) has described the use of the volatile qualifier in kernel programming. Basically, with atomic instructions and proper memory barriers in place, volatile should be limited to situations where the object really is changed externally (think memory-mapped I/O). The reasoning: code that shares data correctly already needs the barriers or atomic instructions, and those already guarantee ordering, so adding volatile on top buys nothing and just costs you optimization inside the critical section.
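For the memory-mapped I/O case, where volatile does belong, a minimal sketch might look like the following. The function wait_ready and its parameters are invented for illustration; on real hardware the status pointer would be a fixed address taken from the device's datasheet.

```c
#include <stdint.h>

/* Polls a (hypothetical) device status register until any bit in `mask`
   is set, giving up after `max_polls` reads. Returns 1 if the bit was
   seen, 0 on timeout. The volatile-qualified pointer makes every
   `*status` a real load, because the hardware can change the value
   behind the compiler's back. */
int wait_ready(volatile uint32_t *status, uint32_t mask, unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & mask)
            return 1;
    }
    return 0;
}
```

Because the qualifier sits on the pointer parameter, an ordinary uint32_t variable can stand in for the register when trying this out on a desktop machine.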