Thursday, March 16, 2017

Volatile qualifier, atomics in C

Volatile

If you're dealing with hardware, ISRs, signal handlers, threads, etc., you'll likely run into a nasty optimization scenario when passing -O2 or higher to gcc.

Imagine we do something like this:

int whatever = 1;

int main(int argc, char *argv[]) {
  while(whatever) {
    asm("nop");
  }
}

From the compiler's point of view, nothing in the loop modifies whatever, so there is no point in re-reading it on every iteration. But does that still make sense if whatever were a memory-mapped register? Think AVR's memory-mapped registers! What if a signal handler or an ISR wrote to it?! Then whatever could change, and the emitted instructions wouldn't bother to check.

Let's see the x64 assembly for this:

Dump of assembler code for function main:
   0x00000000004003e0 <+0>:     mov    0x200c4a(%rip),%eax        # 0x601030 <whatever>
   0x00000000004003e6 <+6>:     test   %eax,%eax
   0x00000000004003e8 <+8>:     je     0x4003f0 <main+16>
   0x00000000004003ea <+10>:    jmp    0x4003ea <main+10>
   0x00000000004003ec <+12>:    nopl   0x0(%rax)
   0x00000000004003f0 <+16>:    xor    %eax,%eax
   0x00000000004003f2 <+18>:    retq   

Oh... so, what happens? The value is loaded and tested exactly once. If it is non-zero, we fall through to 0x4003ea, which just jumps to itself! If this while loop were spinning, waiting for a signal to pop it out of the loop, it would wait forever (assuming no calls in the loop).

If we simply declare whatever as volatile, we get this:

Dump of assembler code for function main:
   0x00000000004003e0 <+0>:     mov    0x200c4a(%rip),%eax        # 0x601030 <whatever>
   0x00000000004003e6 <+6>:     test   %eax,%eax
   0x00000000004003e8 <+8>:     jne    0x4003e0 <main>
   0x00000000004003ea <+10>:    repz retq 
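
To tie this back to the signal handler case, here is a minimal sketch of a flag that a handler clears while a loop spins on it. The names keep_running, on_signal and spin_until_signal are made up for illustration. The sketch uses volatile sig_atomic_t, the type the C standard allows a handler to write, and it delivers the signal to itself with raise() only so that the example terminates on its own.

```c
#include <assert.h>
#include <signal.h>

/* volatile forces a fresh load on every loop iteration;
 * sig_atomic_t is the type a signal handler may safely write. */
static volatile sig_atomic_t keep_running = 1;

/* Hypothetical handler: all it does is clear the flag. */
static void on_signal(int signo) {
    (void)signo;
    keep_running = 0;
}

/* Installs the handler, raises the signal once, then spins on the flag.
 * Returns 0 once the loop has exited, -1 if raise() failed. */
int spin_until_signal(int signo) {
    keep_running = 1;
    void (*previous)(int) = signal(signo, on_signal);
    assert(previous != SIG_ERR);

    if (raise(signo) != 0) {
        return -1;
    }
    while (keep_running) {
        /* empty body: every iteration re-reads keep_running */
    }
    signal(signo, previous); /* put the old handler back */
    return 0;
}
```

Calling spin_until_signal(SIGINT) returns right away here because raise() runs the handler before the loop starts. In a real program the signal would arrive asynchronously, and the volatile qualifier is what keeps the load inside the loop.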


As an additional note, the volatile qualifier does not make accesses atomic, and it says nothing about how they are ordered relative to ordinary (non-volatile) reads and writes; it also emits no hardware memory barrier. To get proper read/write ordering, the code must use a memory fence, an atomic type or operation (C11), or a platform-dependent method. Similarly, in microcontroller (and maybe some other) programming, it may make more sense to dedicate a register instead of hitting RAM via volatile.
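
As a rough illustration of the C11 route, the sketch below publishes a plain int through an atomic flag with release/acquire ordering. The names payload, ready, publish and try_consume are invented for this example, and it assumes a single producer that publishes exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;               /* plain data, published via the flag */
static atomic_bool ready = false; /* the flag carries the ordering */

/* Producer side: write the data first, then set the flag with release
 * ordering so the payload write cannot be reordered after it. */
void publish(int value) {
    /* single-shot: publishing twice would race with a reader */
    assert(!atomic_load_explicit(&ready, memory_order_relaxed));
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: an acquire load that sees the flag set also sees the
 * payload written before the release store. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

One thread would call publish() while another polls try_consume(). No volatile is involved: the atomic load cannot be hoisted out of a polling loop, and the acquire/release pair provides the ordering that volatile does not.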

Linus Torvalds (of Linux fame) has described the use of the volatile qualifier in kernel programming. Basically, with atomic instructions and proper memory barriers in place, the use of volatile should be limited to situations where the object really is changed externally (think memory-mapped I/O). The reasoning: you need the barriers or atomic instructions anyway, and once they are in place ordering is already guaranteed, so adding volatile on top only costs you optimization in the critical section.
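
For the memory-mapped I/O case, a small sketch of volatile register access might look like the following. The helpers reg_set_bits and reg_wait_set are hypothetical; on real hardware the pointer would come from the device's datasheet or vendor header, while here an ordinary variable can stand in for the register.

```c
#include <assert.h>
#include <stdint.h>

/* Set bits in a device register. Because the pointee is volatile, the
 * compiler must emit both the read and the write. */
static inline void reg_set_bits(volatile uint8_t *reg, uint8_t mask) {
    *reg |= mask;
}

/* Poll a register until every bit in mask is set, reading it at most
 * max_polls times. Returns 1 on success, 0 on timeout. */
static inline int reg_wait_set(volatile uint8_t *reg, uint8_t mask,
                               unsigned max_polls) {
    assert(reg != 0);
    while (max_polls--) {
        if ((*reg & mask) == mask) {
            return 1;
        }
    }
    return 0;
}
```

With a local uint8_t fake = 0, calling reg_set_bits(&fake, 0x05) leaves 0x05 in it, and reg_wait_set(&fake, 0x04, 10) succeeds on the first read. On a real device the bits would change underneath the loop, which is exactly the situation volatile exists for.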

Monday, March 13, 2017

MMDVMHost Architecture

I've been spending time on a new implementation of a DMR repeater for MMDVM-speakers. Of course, with the Homebrew protocol, I could just spend time on the backend components, which still leaves quite a bit of work to do. But before doing this, I need to clarify what the host receives from the radio. Sadly, documentation for the MMDVM specification is out of date, so we're left with the MMDVMHost code as the "living specification". Rabbit holes and so on led me to digging deep into the architecture of MMDVMHost.

Conceptually, MMDVMHost is built around the modem & peer setup and the main loop in MMDVMHost.cpp. In the loop, data is polled from the modem and peer. Since I am mostly interested in DMR, I'll look at the DMR functionality.

The loop polls the modem object for data. This is done by calls to readDMRData1() and readDMRData2(). Notice the 1 and 2? This is because DMR data is already separated into timeslots by the modem object instance (from Modem.h/Modem.cpp; as we will see later, the separation is actually done by the modem itself). The data returned by these two methods is collected during the modem clock tick, which occurs as part of this loop via an m_modem.clock() call.

If data is picked up via readDMRData1() or readDMRData2(), it is passed via the writeModemSlot1() or writeModemSlot2() method to the instantiated CDMRControl object. These methods are proxies for CDMRSlot instances (DMRSlot.cpp), which write to the network. Hence a "modem write" here means modem-SOURCED data.

For data received from the Homebrew network, the CDMRControl object is polled via the readModemSlot1() and readModemSlot2() methods. After checking whether there is buffer space for the timeslot on the modem, the modem object's writeDMRData1() or writeDMRData2() method may be called to queue up the data for transmission.

Similar to the modem, the underlying CDMRSlot instance buffers are filled by a call to the CDMRControl clock() method. This polls the socket to get the data.
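
To make the data flow above concrete, here is a toy model of one pass through the loop for a single timeslot. It is plain C with made-up types (mailbox_t, slot_t) and a made-up loop_once() function, not the MMDVMHost C++ code: each real buffer is reduced to a one-frame mailbox, and the clock ticks that fill the receive side are assumed to have already happened.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAME_MAX 64

/* One-frame mailbox standing in for a real buffer; len == 0 means empty. */
typedef struct {
    unsigned char data[FRAME_MAX];
    size_t len;
} mailbox_t;

/* Remove the frame, if any, and return its length (0 if empty). */
static size_t mailbox_take(mailbox_t *m, unsigned char *out) {
    size_t n = m->len;
    memcpy(out, m->data, n);
    m->len = 0;
    return n;
}

/* Store a frame; returns 0 if the mailbox is occupied or n is invalid. */
static int mailbox_put(mailbox_t *m, const unsigned char *in, size_t n) {
    if (m->len != 0 || n == 0 || n > FRAME_MAX) {
        return 0;
    }
    memcpy(m->data, in, n);
    m->len = n;
    return 1;
}

/* Hypothetical per-timeslot state; comments name the method each models. */
typedef struct {
    mailbox_t modem_rx; /* filled by the modem clock tick, read like readDMRData1() */
    mailbox_t net_tx;   /* written like writeModemSlot1(), headed to the network */
    mailbox_t net_rx;   /* filled by the CDMRControl clock tick, read like readModemSlot1() */
    mailbox_t modem_tx; /* written like writeDMRData1(), queued for transmission */
} slot_t;

/* One pass of the main loop for one timeslot. */
void loop_once(slot_t *s) {
    unsigned char frame[FRAME_MAX];
    size_t n;

    assert(s != NULL);

    /* Modem-sourced data goes toward the network
     * (in this toy model it is dropped if net_tx is still occupied). */
    n = mailbox_take(&s->modem_rx, frame);
    if (n > 0) {
        mailbox_put(&s->net_tx, frame, n);
    }

    /* Network-sourced data goes to the modem only if it has room. */
    if (s->modem_tx.len == 0) {
        n = mailbox_take(&s->net_rx, frame);
        if (n > 0) {
            mailbox_put(&s->modem_tx, frame, n);
        }
    }
}
```

The point of the model is only the direction of the two paths: the modem receive side drains toward the network, and the network receive side drains toward the modem after a space check. Everything else (frame contents, sockets, timers, the second timeslot) is left out.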