7 things that saved my life during the development of the Shruti-1

Pivotal Tracker

I first heard about Pivotal Tracker at Google. I would enter my office, look at the screens of my co-workers and see an approximately balanced mix of Gmail, MapReduce status pages, vi, emacs and Pivotal Tracker. I did not really see the point until I started using it. And it’s wonderful. Every time I had some free time at home, I would open it and look for the next task to do, the feature to implement, the components to order, the documents to review. My girlfriend spending more time in the bathroom than expected? Let’s see if there’s a tiny task I can squash in the meantime. This project is worth 497 Pivotal Tracker points.

Not using the Wiring library

Wiring (the hardware interface library that comes with the Arduino development tools) gave me a bad impression… Very early in the Shruti-1 project I had to consider code size and speed, and I was not happy with what I saw from Wiring. In fact, it seems to me that the concept of a function (a runtime abstraction), though very easy for beginners to understand, is not the best tool to build the kind of abstractions we want in a hardware interface library. Why? Because functions lead to inefficient things like having to call into a function and zip through different branches every time we want to write to a GPIO; or compiling division code that will be called only once or twice to set a baud rate. My other gripe with Wiring, and more generally with the libraries that are part of the Arduino toolchain, was that many of them (including Serial) relied on polling I/O status registers and/or made use of delay loops. I wanted the Shruti-1 to be free of wait/busy loops – the processor ought to be doing something useful all the time – be it audio synthesis or responding to user actions. And in the end, reimplementing all of the hardware-level stuff I needed myself forced me to read the holy ATmega328P datasheet, something I would probably never have done if I had kept using Wiring and existing libraries.
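
To make the "no busy loop" idea concrete, here is a minimal sketch of one common way to get there, and not the actual Shruti-1 code: a small ring buffer that a receive interrupt would fill, and that the main loop drains without ever waiting. The names `RingBuffer`, `Write`, `NonBlockingRead` and `serial_input` are made up for this example, and the interrupt handler is assumed to be an ordinary call to `Write` so that the code also compiles on a desktop machine.

```cpp
#include <stdint.h>

// Fixed-size FIFO. The size is a power of two so that wrapping the
// pointers is a cheap bitmask rather than a division.
class RingBuffer {
 public:
  RingBuffer() : read_ptr_(0), write_ptr_(0) {}

  // Meant to be called from the receive interrupt (simulated here).
  // When the buffer is full the byte is dropped instead of waiting.
  void Write(uint8_t value) {
    uint8_t next = (write_ptr_ + 1) & (kSize - 1);
    if (next == read_ptr_) return;
    buffer_[write_ptr_] = value;
    write_ptr_ = next;
  }

  // Meant to be called from the main loop. Returns -1 right away when
  // nothing is pending, so the caller can go back to useful work.
  int16_t NonBlockingRead() {
    if (read_ptr_ == write_ptr_) return -1;
    uint8_t value = buffer_[read_ptr_];
    read_ptr_ = (read_ptr_ + 1) & (kSize - 1);
    return value;
  }

 private:
  static const uint8_t kSize = 8;  // holds up to kSize - 1 bytes
  volatile uint8_t buffer_[kSize];
  volatile uint8_t read_ptr_;
  volatile uint8_t write_ptr_;
};

// Hypothetical global instance shared by the interrupt and the main loop.
RingBuffer serial_input;
```

The main loop would call `serial_input.NonBlockingRead()` once per iteration and move on whenever it gets -1, instead of spinning on a status register.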

C++

This would deserve an entire post by itself… First of all, I’m not a C++ freak – I actually had decided to avoid the language in 2003 and managed to do so for 4 years, before joining Google, where I wrote a lot of their delicious, deceptively “dumbed down” (dixit the C++ gurus I’m now working with) flavor of it. C++ for embedded systems looks pretty silly. Here’s why it is not:

  • As long as you know what you’re doing, C++ is as efficient as C. Most of the Shruti-1 code consists of “static singletons” – classes in which data members and methods are static. This has absolutely no overhead compared to the classical “globals and functions” used in C – but you get, for free, all the benefits of C++ I will detail below.
  • Data access discipline. Access modifiers and const references are perfect tools for keeping the interaction between pieces of code to the strict minimum, and for making sure that no silly shortcuts like “let’s modify this global directly in the middle of nowhere” are taken out of sheer laziness. One pattern in the Shruti-1 code is to expose const Foo& accessor(); and Foo* mutable_accessor(); Not only does this make sure that I won’t accidentally modify some data that was supposed to be read-only, but it also works as a fairly useful annotation of the code, which allows me, with a simple grep, to find out who is interacting with each piece of data.
  • Compile time abstractions. Let’s say you want to write some reusable code that talks to a shift register. You can write a function that will take integers or enum arguments identifying the data/clock/latch pins used in your project, but then, you will need another function that maps each pin number to a memory-mapped register address + bitmask to do the actual write (that’s the Wiring way of doing things). Or you could pass the address of those registers and the bitmasks as arguments (a bit ugly). Or you can copy/paste some existing code and update the names of the registers and bitmasks. Or, if you are adventurous, you can write a C macro. C++ provides a nice solution for that: templates – the compiler does the copy and paste for you and compiles code specialized for exactly the registers/bitmasks you are using in your project, down to the right SBIs and CBIs – instead of a library function that can cover all the possible cases but will be called only for one of them.
  • Compile time arguments. Let’s say you want to write an abstraction that configures the UART to a particular baud rate. This involves computing a ratio of the CPU frequency to the baud rate. If you put that into a classic C function, the compiler will actually generate code for the division, even if your function is called in only one place. Stupid. You want the division to be executed at compile-time and “flattened” in the code. You could do that with C macros, but C++ has a way of doing that which is better integrated with the language:
template<uint16_t baud_rate>
static inline void SetBaudRate() {
	uint16_t prescaler = (F_CPU / 16 + baud_rate / 2) / baud_rate - 1;
	UBRR0H = prescaler >> 8;
	UBRR0L = prescaler;
}
  • Compile time coupling. Let’s say you have written a very generic class which does MIDI message parsing. Now you want to interface it with your synthesis engine. Should the abstract MIDI class have function pointer callbacks to call into the synthesis engine? Maybe. Should the synthesis engine be a subclass of the MIDI parser and should the MIDI parsing code call virtual methods that the synthesis engine would implement? Why not. But there’s an even more radical design: template<typename DeviceRespondingToMidiMessage> class MidiParser What does it do? Nothing much different from having the compiler compile your MIDI message handlers and put them inline right into the MIDI parsing code! And since the compiler will know at compile time which messages the handler parses and which ones it ignores, it’ll automatically remove from the MIDI parser all the parsing code for the messages which are not handled. You get abstraction and code reuse at no cost.
  • Class unrolling. This is a terrible trick… I have observed quite consistently that in most situations code which uses direct addressing (for example, reading a static global variable) is more compact and faster than code which uses indirect addressing (reading a member of a struct passed as an argument to a function, or its C++ equivalent, using a class member variable within a method). This got me thinking… what if, instead of having an Oscillator class and 3 instances of it, one for each oscillator, I had 3 sets of static methods and static variables, one for each “instance”? This would spare the cost of indirect addressing since each copy would have all the addresses of the data it manipulates pre-compiled in it. This is very bad news for code size since there’s a lot of triplication. But this turned out to have a tremendous impact on speed, something like a 25% speed boost, and enough to reach my goal of getting two interpolated, cross-faded oscillators running at 31kHz. Now how do you easily create specialized instances of a block of code? That C++ thing, template<int instance_index> class Foo.
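
The last two patterns in the list can be sketched together in a few lines. This is a simplified illustration and not the Shruti-1 source: only the two template signatures come from the text above, while the bodies, the `Synth` handler, the note-on-only parsing and the placeholder pitch mapping are assumptions made so that the example stays short and compiles on a desktop machine.

```cpp
#include <assert.h>
#include <stdint.h>

// Class unrolling: each instance_index produces a distinct class with its
// own static state, so every variable has a fixed, directly addressable
// location instead of being reached through a this pointer.
template<int instance_index>
class Oscillator {
 public:
  static inline void set_increment(uint16_t increment) {
    increment_ = increment;
  }
  static inline uint8_t Render() {
    phase_ = static_cast<uint16_t>(phase_ + increment_);
    return static_cast<uint8_t>(phase_ >> 8);  // naive 8-bit sawtooth
  }
 private:
  static uint16_t phase_;
  static uint16_t increment_;
};
template<int instance_index> uint16_t Oscillator<instance_index>::phase_ = 0;
template<int instance_index> uint16_t Oscillator<instance_index>::increment_ = 0;

// Compile time coupling: the handler type is a template argument, so the
// call to Device::NoteOn is resolved statically and can be inlined. There
// are no function pointers and no virtual methods.
template<typename Device>
class MidiParser {
 public:
  static inline void PushByte(uint8_t byte) {
    if (byte >= 0xf8) return;  // real-time bytes are ignored in this sketch
    if (byte & 0x80) {         // status byte: a new message starts
      status_ = byte;
      num_data_bytes_ = 0;
    } else if ((status_ & 0xf0) == 0x90) {  // only note on is handled here
      data_[num_data_bytes_++] = byte;
      if (num_data_bytes_ == 2) {
        Device::NoteOn(status_ & 0x0f, data_[0], data_[1]);
        num_data_bytes_ = 0;   // keep status_ so running status works
      }
    }
  }
 private:
  static uint8_t status_;
  static uint8_t data_[2];
  static uint8_t num_data_bytes_;
};
template<typename Device> uint8_t MidiParser<Device>::status_ = 0;
template<typename Device> uint8_t MidiParser<Device>::data_[2];
template<typename Device> uint8_t MidiParser<Device>::num_data_bytes_ = 0;

// Hypothetical synthesis engine: a "static singleton" driving two of the
// unrolled oscillators.
struct Synth {
  static inline void NoteOn(uint8_t /* channel */, uint8_t note, uint8_t velocity) {
    assert(note < 128);  // MIDI data bytes are 7-bit
    // Placeholder pitch mapping, not a real note-to-frequency table.
    Oscillator<0>::set_increment(static_cast<uint16_t>(note << 4));
    Oscillator<1>::set_increment(static_cast<uint16_t>((note << 4) + 3));
    last_velocity = velocity;
  }
  static uint8_t last_velocity;
};
uint8_t Synth::last_velocity = 0;
```

Feeding the bytes 0x90, 60, 100 to `MidiParser<Synth>::PushByte` ends up calling `Synth::NoteOn` directly, and `Oscillator<0>`, `Oscillator<1>` and `Oscillator<2>` each keep their own phase even though the code was written only once.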

The resource builder and make

Some tasks are so boring… Adding a string to a resource table and declaring a STR_MY_STRING constant to represent it in your string loader. Or running a tool to compute waveforms and convert them to a bunch of declarations you’ll copy/paste into a source file… I created a tool for that (the resources compiler), invokable by make resources, and that’s it. In fact, pretty much all activities related to the project (knowing the size of the firmware, uploading the firmware to the board, getting a symbol by symbol report of the code size) have make actions.
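
To give an idea of the boilerplate such a tool takes off your hands, here is a hand-written sketch of the kind of declarations involved. It is not the real output of the resources compiler: apart from STR_MY_STRING, every name (`StringId`, `STR_ANOTHER_STRING`, `STR_COUNT`, `string_table`, `LookupString`) and both strings are invented for the example. On the AVR the table would most likely be placed in flash rather than RAM, which is left out here so that the code compiles anywhere.

```cpp
#include <assert.h>
#include <stdint.h>

// One identifier per string. The order must match string_table below,
// which is exactly the kind of bookkeeping worth generating.
enum StringId {
  STR_MY_STRING,
  STR_ANOTHER_STRING,  // hypothetical second entry
  STR_COUNT            // hypothetical sentinel: number of strings
};

// The string data itself, in the same order as the enum.
static const char* const string_table[] = {
  "my string",
  "another string",
};

// Minimal string loader: maps an identifier to its text.
static inline const char* LookupString(StringId id) {
  assert(id < STR_COUNT);
  return string_table[id];
}
```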

PWM

I would never, never, never have thought that something that good could come out of the PWM output. I bought SPI DACs (those 12-bit ones from Microchip) and convinced myself that they were the way to go, but I never observed a difference in sound quality that would have justified the CPU cost. The overhead of SPI communication (or, for the lazy ones, writing to an 8-bit shift register connected to an R-2R ladder) was enough to create glitches at 31kHz… so the Shruti-1 would have been running at 20kHz (or would have had no sub-oscillator and noise generator in the mixer section).

The Blit paper

In the end, I did not implement the approach in this paper (I used bandlimited wavetables instead), except for one thing – the idea of building a PWM waveform by integrating up/down impulses. The PWM on the Shruti-1 is not perfect because the leaky integrator is too leaky for the lower notes and not leaky enough for the highest notes (and for those notes the DC modulation starts getting into the audio range), but it’s way better than the naive approach.
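
The structure of that idea fits in a few lines. The sketch below is an illustration only and not the Shruti-1 implementation: it uses floating point and plain single-sample impulses, so it shows how the integration builds the pulse shape but not the band-limiting that makes the approach worthwhile. The class name `ImpulsePulseWave` and its parameters are assumptions made for the example.

```cpp
#include <assert.h>
#include <stdint.h>

// Builds a pulse wave by feeding a leaky integrator with an "up" impulse
// at the start of each cycle and a "down" impulse at the pulse-width point.
class ImpulsePulseWave {
 public:
  // period and width are in samples; leak is the integrator feedback
  // (1.0 = perfect integrator, smaller values = leakier).
  ImpulsePulseWave(uint16_t period, uint16_t width, float leak)
      : period_(period), width_(width), counter_(0),
        leak_(leak), state_(0.0f) {
    assert(width > 0 && width < period);
  }

  float Render() {
    float impulse = 0.0f;
    if (counter_ == 0) {
      impulse = 1.0f;    // rising edge
    } else if (counter_ == width_) {
      impulse = -1.0f;   // falling edge
    }
    if (++counter_ == period_) counter_ = 0;
    state_ = state_ * leak_ + impulse;  // leaky integration
    return state_;
  }

 private:
  uint16_t period_;
  uint16_t width_;
  uint16_t counter_;
  float leak_;
  float state_;
};
```

With `leak` set to 1.0 the output is an ideal rectangular pulse. With a smaller value the flat segments droop toward zero between impulses, and the longer the period, the further they droop, which is one way to picture the integrator being too leaky for low notes.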

The Arduino board

Sometimes things are not working as they should and there are too many suspects. Is it my hardware design? Is it my code? When the investigation is hard, it’s good to get back to a very safe and proven playground environment. I often ran a naive Wiring sketch on the Shruti-1 board to make sure that the hardware was running correctly and that my firmware code was faulty (only to find out that there was a genuine problem with the hardware). I often ran my firmware on an Arduino board with the minimal hardware connected to it to reproduce the problem, to make sure that my code was correct and that this was a problem with the hardware (only to find out that the code was faulty). Having a hardened hardware/software development environment, no matter how imperfect it is, is extremely useful for testing.