There's something I don't know about software development, and I'd like to know it. Everybody believes Test-Driven Development is the right way to build software; it's very easy to develop libraries this way, and still reasonably easy to develop applications, but how does one even test glue code?
The software which got me thinking about this is my iPod-last.fm bridge. It extracts song statistics from an iPod and submits them to last.fm.
It's a bit over 200 lines of code, and it had a few bugs:
- The magic constant for converting timestamps was off by an hour. My iPod had its DST setting wrong, and I had miscalculated the constant so that the result came out right for my iPod (see the sketch after this list).
- It assumed the record size was 16 bytes instead of reading it from the headers; older iPods use 12-byte records.
- It expected an upper-case MD5 in the response from last.fm; last.fm later switched to sending lower-case MD5s.
- The documentation didn't state that Ruby 1.8.3 or newer is required; the script won't work with Ruby 1.8.2.
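Here's roughly what a more defensive version of the first two assumptions could have looked like. This is only a sketch, not the original code: the header layout (magic, header size, entry size, entry count, all little-endian) is the commonly documented one for the iPod's Play Counts file, and 2082844800 is the number of seconds from 1904-01-01 to 1970-01-01, with timezone handling kept out of the constant.

    MAC_EPOCH_OFFSET = 2_082_844_800   # 1904-01-01 to 1970-01-01, no TZ baked in

    def mac_time_to_unix(t)
      Time.at(t - MAC_EPOCH_OFFSET)
    end

    File.open("Play Counts", "rb") do |f|
      magic, header_size, entry_size, entry_count = f.read(16).unpack("a4V3")
      raise "Not a Play Counts file" unless magic == "mhdp"
      f.seek(header_size)
      entry_count.times do
        # Read however many bytes the header declares, not a hardcoded 16;
        # 12-byte entries simply lack the trailing rating field.
        play_count, last_played = f.read(entry_size).unpack("V2")
        puts "#{play_count} plays, last played #{mac_time_to_unix(last_played)}"
      end
    end

The MD5 bug similarly reduces to one character: making the response regexp case-insensitive, /[0-9a-f]{32}/i. Of course, each of these is obvious only in hindsight.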
The script is very simple glue code, and it glued perfectly the exact things it was tested against: Ruby 1.8.4, an iPod with 16-byte records and a broken DST setting, and whichever version of the software last.fm was running when I wrote the script.
I don't see how it would be possible to test such a script. Building a simple mock iPod or a mock last.fm server wouldn't have found any of these bugs, and complex mocks would be far more complex than the script itself and would most likely contain even more bugs.
I didn't think of testing with Ruby 1.8.2: from the perspective of a Linux box with a crontabbed apt-get upgrade that version was ancient, installing multiple minor versions of Ruby on Ubuntu takes real effort, and I didn't expect any problems there. After discovering a few more minor incompatibilities, I now automatically run unit tests for all my Ruby code on 1.8.4, 1.8.5, 1.8.6, and 1.9. I don't think I need to support 1.8.2 or 1.8.3, but I'd like to know what is supported and what isn't.
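The setup is nothing fancy; a runner along these lines does the job (the interpreter paths and the test_all.rb entry point are placeholders for whatever is actually installed and however the suite is launched):

    interpreters = %w[
      /usr/local/ruby-1.8.4/bin/ruby
      /usr/local/ruby-1.8.5/bin/ruby
      /usr/local/ruby-1.8.6/bin/ruby
      /usr/local/ruby-1.9/bin/ruby
    ]

    interpreters.each do |ruby|
      next unless File.executable?(ruby)
      puts "=== #{ruby} ==="
      system(ruby, "test_all.rb")       # run the whole suite under this Ruby
      puts "FAILED under #{ruby}" unless $?.success?
    end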
I somehow don't see code review finding any of these problems either. If I were doing the review, I think I'd be unlikely to question the 2082848400 time-conversion magic number, the 16-byte record size, or the upper-case assumption in the MD5 response regexp, or to ask whether the script would work with very old versions of Ruby.
Not a single one of these bugs could have been found by static typing, no matter how strict, or by avoiding side effects.
Releasing the code found three of the four bugs, and each was fixed in a few minutes at most. Nobody cared enough about the time being off by an hour to report it; I found that one on my own while playing with iPod settings.
I think this is a typical case for glue code: very simple code works with some complex systems and makes assumptions about them, some of which may later prove unwarranted. Testing, code review, static typing, and the other standard recipes for reducing bug counts don't seem to work here; most likely they would simply greatly increase the amount of code and work. Releasing the code into the wild seems to be very effective (though it failed to find the time bug).
Is it really the best solution we know?
6 comments:
I certainly know the feeling. I've written a program to sync the music in an xmms2 collection with my iAudio X5L running Rockbox. The bulk of the code is a parser for Rockbox's tag database, which I have a test set for, but those tests are built around my X5L with big-endian encoding, and the syncer is built around my xmms2 media library, so all the little hacks and workarounds I had to throw in to please FAT32 aren't necessarily going to be enough for someone else.
You are raising an interesting point.
When I am writing glue code, I perform system or integration tests and run them against "the real thing". Unit tests are nice but cannot test the behaviour of multiple components together. In my case this means testing against a database, and that is feasible.
I guess it gets problematic when pieces of hardware come into the process, but you can circumvent the need to buy an instance of each released iPod version.
I looked at your get_play_counts.rb and saw that you have structured this code well, so I would suggest testing your methods with "real world data":
You could record the behaviour of multiple iPods (or rather, multiple versions of iTunesDB files) together with the expected data. You could do this manually, or read the data out with your program once and check its correctness by hand.
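For example, something along these lines, where parse_itunes_db stands in for your actual parsing entry point and the fixture layout is made up:

    require 'test/unit'
    require 'yaml'

    class PlayCountsFixtureTest < Test::Unit::TestCase
      # One test per recorded database sample; the expected results live
      # in a hand-checked YAML file next to each sample. parse_itunes_db
      # is a placeholder for the script's real parsing method.
      Dir["fixtures/*.itunesdb"].each do |db_path|
        expected_path = db_path.sub(/\.itunesdb\z/, ".expected.yaml")
        define_method("test_" + File.basename(db_path).gsub(/\W/, "_")) do
          expected = YAML.load_file(expected_path)
          assert_equal expected, parse_itunes_db(db_path)
        end
      end
    end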
I know this feeling. I write programs that integrate with QuickBooks, and I've always felt kinda guilty about not writing unit tests. But how can I make a unit test for a function like create_invoice()?
This is why mock objects were invented. You mock the API of each of the things you're gluing together and then run your tests against that.
You can tweak what each of the mocked APIs sends out or accepts and see how your code handles it (a minimal sketch follows).
*shrug* seems pretty obvious to me.
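Something like this, say; MockLastFm is hand-rolled, and Submitter with its scrobble method is a placeholder for whatever object in the script actually talks to the server:

    require 'test/unit'

    class MockLastFm
      attr_reader :submissions
      def initialize(response)
        @response, @submissions = response, []
      end
      def submit(song)
        @submissions << song
        @response                 # canned response, tweak per test
      end
    end

    class SubmissionTest < Test::Unit::TestCase
      def test_submission_accepted
        server = MockLastFm.new("OK")
        submitter = Submitter.new(server)   # Submitter is hypothetical
        assert submitter.scrobble(:artist => "a", :title => "b")
        assert_equal 1, server.submissions.size
      end
    end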
This is a tough one. Sometimes the most pragmatic choice is to just test at the system level, but often you can unit test if you look at your glue and try to figure out the "computational core" of your code. Ask what you're trying to do. For instance, if you're writing a mail server, it may be all API calls, but there is this inner loop: receive a message, transform it, and send it.
Once you figure out what your core is, mock out the rest. Remember that you don't have to mock your vendor's API. Write a wrapper that is convenient for YOU. You're not going to use the whole API. Write wrappers that give you a simplified interface to just what you need (see the sketch below).
I have an example of this in Working Effectively with Legacy Code. I forget the chapter title, but it is called My Application is All API Calls.
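In Ruby that might look something like the following; the names are made up, and the point is the narrow interface, not the specifics:

    # A thin wrapper exposing only the slice of the vendor API the glue
    # code actually needs. The client (the real HTTP/protocol code) is
    # injected, so tests can substitute a trivial fake for it.
    class ScrobbleGateway
      def initialize(client)
        @client = client
      end

      # The whole simplified interface: submit one song, say whether it worked.
      def submit(artist, title, played_at)
        response = @client.post_submission(artist, title, played_at)
        # Keep format assumptions (like response case) in exactly one place.
        !!(response =~ /\AOK/i)
      end
    end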
masukomi: A mock last.fm server would be doable. A simple mock iPod would probably be more complicated than the whole script, and a mock Ruby installation doesn't make much sense.
In any case, even if I created mocks for the last.fm server and the iPod, I would almost certainly repeat the same assumptions in them - upper-case MD5s, 16-byte records, and the broken timezone - so they wouldn't have helped fix the code.