[Home | Contact | What's New? | Products | Services | Tips | Mike | Living with Schizoaffective Disorder]

Quality Time With Ogg Frog

A day in the life of a Free Software programmer.

Michael David Crawford, Consulting Software Engineer
mdcrawford@gmail.com

April 3, 2007

Copyright © 2007 Michael David Crawford. All Rights Reserved.

So I said in Ogg Frog Or Bust that Ogg Frog needed some real quality time spent rearchitecting it if I was ever to finish it. It suffered from deadlocks - either the music would stop playing, or the user interface would become unresponsive, or both. I knew what I needed to do to fix it, but it meant gutting large portions of my code and rewriting them.

Rippit the Ogg Frog

I'm happy to report that (knock on wood) all the deadlocks are fixed. There is more work to be done to finish up, but I'm able to listen to music with Ogg Frog for hours with nary an interruption.

My first stab at fixing the deadlocks fixed it but good - the deadlocks went from intermittent to occurring with every track played.

Below I will tell you why multithreaded programming is hard, and what I did to fix the deadlocks.


No Winners

Most people who read my diaries aren't programmers (Hi Mom!) so I'll explain deadlocks in a non-technical way:

Suppose I have an apple and you have an orange, and I say to you "Give me that orange" and you reply "Give me that apple first". And you say, "Give me that apple," and I say "Not until you give me that orange". You can see that we're stuck, and we won't be able to proceed until someone accepts a compromise.

A software thread is an independent sequence of execution in a program. There are many advantages to multithreaded software, such as distributing the work load over more than one Central Processing Unit in a multiprocessor computer such as my MacBook Pro's Intel Core Duo.

But even on single-processor computers, multithreaded code can be more responsive and efficient; if one thread is blocked while waiting, say, for some data from a file to be read in, the other threads can continue running. If the code is architected right, then in general multithreaded code makes better use of the available processing power.

What makes multithreaded code difficult is that the threads often need to share data with each other. For example, Ogg Frog's main window decides whether to enable the Pause button based on whether a track is playing, but the window is drawn and the music is played by different threads.

If two threads access some data simultaneously, and one of them alters the data, the other thread might find it in an inconsistent state. This can lead to erroneous behaviour, data loss or even crashes. One way of making accesses to data atomic is to use various kinds of locks to control access to the data.
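Here is a minimal sketch of what guarding shared data with a lock looks like, using standard C++ rather than ZooLib. The names SharedCounter and count_with_two_threads are made up for illustration; they are not from Ogg Frog. The counter stands in for any piece of data that two threads both touch.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state: a counter standing in for any data
// that more than one thread reads and writes.
struct SharedCounter {
    std::mutex mutex;
    long value = 0;

    void increment() {
        // The guard holds the lock until it goes out of scope, so the
        // read-modify-write below cannot interleave with another thread's.
        std::lock_guard<std::mutex> guard(mutex);
        ++value;
    }

    long get() {
        std::lock_guard<std::mutex> guard(mutex);
        return value;
    }
};

// Two threads each increment the same counter per_thread times.
long count_with_two_threads(int per_thread) {
    SharedCounter counter;
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            counter.increment();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter.get();
}
```

With the lock in place, every increment is counted. Take the guard out of increment() and the two threads can overwrite each other's updates, so the total may come up short.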

And here is where my fruit analogy applies: if my thread holds a lock and tries to acquire a lock that your thread is holding, at the same time as your thread tries to acquire my lock, both threads will stop. There are other ways that deadlocks can happen, but two threads contending for two locks is the most common cause of the problem.
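The apple-and-orange standoff can be sketched in standard C++ as follows. This is an illustration only, not Ogg Frog's code: apple_lock, orange_lock, trade and run_trades are hypothetical names. If one thread locked the apple and then the orange while the other locked the orange and then the apple, each could end up holding one lock and waiting forever for the other. The sketch sidesteps that by asking std::lock for both locks at once, which the standard library guarantees will not deadlock whatever order the arguments are given in.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex apple_lock;   // hypothetical: guards "my" fruit
std::mutex orange_lock;  // hypothetical: guards "your" fruit
int trades = 0;          // only touched while both locks are held

void trade(bool apple_first) {
    // Locking the two mutexes one after the other in opposite orders on
    // two threads is the classic deadlock. std::lock acquires both using
    // a deadlock-avoidance algorithm, so the argument order is harmless.
    if (apple_first) {
        std::lock(apple_lock, orange_lock);
    } else {
        std::lock(orange_lock, apple_lock);
    }
    // adopt_lock tells each guard the mutex is already locked; the guards
    // only take care of unlocking when the function returns.
    std::lock_guard<std::mutex> apple(apple_lock, std::adopt_lock);
    std::lock_guard<std::mutex> orange(orange_lock, std::adopt_lock);
    ++trades;
}

// "Me" and "you" each attempt the given number of trades, asking for
// the locks in opposite orders.
int run_trades(int rounds) {
    trades = 0;
    std::thread me([rounds] {
        for (int i = 0; i < rounds; ++i) trade(true);
    });
    std::thread you([rounds] {
        for (int i = 0; i < rounds; ++i) trade(false);
    });
    me.join();
    you.join();
    return trades;
}
```

This works nicely when both locks are needed in the same place. The trouble in a large program is that the two acquisitions are usually far apart, buried in different functions, so there is no single spot where you can ask for both together.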

Careful design can prevent lock contention, but because software is often quite complex, it's harder than you would think. Using locks is conceptually easier than other designs, but in anything really complicated it can be really hard to keep things straight.

My first stab at resolving the deadlocks in Ogg Frog was to create a class called ZAudioPlayer that would manage the whole audio pathway. (I'm going to contribute Ogg Frog's core audio code to ZooLib, which has the convention that all its classes are prefixed with a "Z".)

When my window wanted to know whether to enable the Pause button, it would lock the ZAudioPlayer object, then ask it if a track was playing. ZAudioPlayer had lots of such functions, each of which required that I lock the object first. It seemed a natural solution, but it transformed my bug from one that was rare and hard to reproduce to a bug that happened every single time I tried to play a track.

I discussed this with Andy Green, the chief architect of ZooLib. Andy understands multithreaded code better than anybody: ZooLib's flagship product is an educational multimedia database called Knowledge Forum that can handle thousands of simultaneous users.

He has spent years wrestling with deadlocks. And he told me that, for anything really complex, there is no other solution than to get rid of the locks and to have one's threads communicate via messages instead; ZooLib's facility for this is the ZMessage class.

I knew that Andy was right, and that what I needed to do was use ZMessages, but it required that I turn much of the program inside-out from what it was originally.

The nice thing about using locks (when they do work) is that you can access data synchronously, that is, right where and when you need it. Messages avoid deadlocks, but using them means that when you need some data, you have to send a message to ask for it, then go off and do something else until you get your response. The logical structure of a program architected with messages is very different from one architected with locks.
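To make the difference concrete, here is a minimal sketch of the message style in standard C++. It does not use ZooLib's ZMessage class and makes no claim about how ZMessage works; MessageQueue, Message and ask_player_if_playing are hypothetical names invented for this example. The "playing" flag belongs to the player thread alone, and the window finds out its value by sending a question and later picking up the answer.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// A small thread-safe mailbox. It does use a lock internally, but only
// briefly and never while holding any other lock, so it cannot take part
// in the two-lock standoff described earlier.
template <typename T>
class MessageQueue {
public:
    void send(T msg) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            queue_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    // Blocks until a message is available, then returns it.
    T receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        T msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

// Hypothetical message format: a name plus one boolean payload.
struct Message {
    std::string what;
    bool value;
};

bool ask_player_if_playing() {
    MessageQueue<Message> to_player;
    MessageQueue<Message> to_window;

    std::thread player([&] {
        bool playing = true;  // owned by the player thread; nobody else touches it
        for (;;) {
            Message m = to_player.receive();
            if (m.what == "quit") break;
            if (m.what == "is-playing?") {
                to_window.send({"is-playing", playing});
            }
        }
    });

    // The window asks its question...
    to_player.send({"is-playing?", false});
    // ...and in a real program would go back to handling events here,
    // picking up the reply when it arrives in its own queue.
    Message reply = to_window.receive();

    to_player.send({"quit", false});
    player.join();
    return reply.value;
}
```

Notice that the window never reaches into the player's data. The price is the inside-out structure: the code that asks the question and the code that uses the answer are no longer in the same place.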

Saturday was a good day: I was finally able to give Ogg Frog the quality time that it needed. As I refactored the code to use ZMessages the deadlocks fell one by one. One last deadlock took a long time to figure out though. One clue was that it only happened on Windows, and never on Mac OS X.

Another clue was that Andy told me that when he was considering porting his program NetPhone from Mac to Windows, he found that the Windows waveOut audio driver interacted in some way with the message queue that the Windows operating system uses to communicate with window threads. He thought that this might make it difficult to get a Windows port of NetPhone working reliably in real-time.

He didn't elaborate on this, but I kept it in mind as I investigated. One of the tools at my disposal was to test in different configurations: on Mac OS X or Windows, by playing Ogg Vorbis or MP3 files, by replacing my audio output function with one that just played silence, or one that played a steady sine-wave tone. I could also write diagnostic messages to a log file.

ZooLib, like most application frameworks, has an "application object" called ZApp that is the first object to be created when the program starts, and that exists for the life of the program. My OFApp object inherits from ZApp, and coordinates communication across the whole program.

At first I had my audio player sending messages to OFApp that reported the amount of time the current track had been playing. OFApp would relay the message to my main window. I found that if I sent OFApp messages regularly, the Windows waveOut driver would eventually stop working; the Windows operating system calls one of my functions when it's done with a chunk of audio data. After just a few seconds, my callback function would just stop getting called.

The solution was to not send the messages to OFApp; instead I'm now using ZAudioPlayer to distribute messages such as timestamps to interested clients. And with that, the last deadlock fell.
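The general shape of "distribute messages to interested clients" can be sketched like this. It is only an illustration under my own assumptions, not ZAudioPlayer's actual interface: TimestampBroadcaster, subscribe, publish and demo_broadcast are hypothetical names, and the sketch is single-threaded to keep it short. In a threaded program each client's callback would post the timestamp to that client's own message queue rather than doing work directly.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the part of an audio player that hands
// timestamps straight to whoever asked for them, with no relay through
// the application object.
class TimestampBroadcaster {
public:
    using Client = std::function<void(double)>;

    // A client registers interest by handing over a callback.
    void subscribe(Client client) {
        clients_.push_back(std::move(client));
    }

    // Deliver the elapsed playing time, in seconds, to every client.
    void publish(double seconds) {
        for (auto& client : clients_) {
            client(seconds);
        }
    }

private:
    std::vector<Client> clients_;
};

// One "window" subscribes and collects the timestamps it is sent.
std::vector<double> demo_broadcast() {
    std::vector<double> window_inbox;
    TimestampBroadcaster player;
    player.subscribe([&window_inbox](double t) { window_inbox.push_back(t); });
    player.publish(0.5);
    player.publish(1.0);
    return window_inbox;
}
```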

It was late by this time. Ogg Frog has a Play Folder function; I set it to play - running on Windows - an album by Lemon Jelly, and went to bed, satisfied at having done a good day's work, and at having finally gotten Ogg Frog unstuck.
