The best kittens, technology, and video games blog in the world.

Monday, June 11, 2007

How does the boring stuff get done in Open Source? It doesn't!

Pink buttons by kup,kup from flickr (CC-NC-SA)
Writing code is fun. Debugging used to be painful, especially with all the segfaults, but unit tests made it quite fun too, or at least bearable. Not only are these activities fun, there's also an instant payoff - something that didn't work or didn't work right is now doing its job. There's just one issue:

How the hell do people make themselves package software and write documentation? They simply don't.
It's easy to consider packaging and documenting a non-issue. If you think of the few most popular programs like Apache and Firefox, they have plenty of packagers and documenters. These however are not typical programs. The vast majority of Open Source programs are small, have just a few users, are barely documented, and require manual installation.

SourceForge alone has a whopping 150,442 registered projects, and there's probably an order of magnitude more programs whose authors didn't even bother to publish them on any widely available website. The most popular Linux distribution, Ubuntu, contains 21,428 packages in all repositories. That's not the number of programs, as many programs are split into multiple packages - like mozilla-firefox-locale-all (which isn't even a separate program, just part of Firefox) which generates 47 packages - mozilla-firefox-locale-pl-pl, mozilla-firefox-locale-ga-ie, mozilla-firefox-locale-ca, mozilla-firefox-locale-sl-si, mozilla-firefox-locale-es-ar, mozilla-firefox-locale-nso, mozilla-firefox-locale-es-es, mozilla-firefox-locale-af, mozilla-firefox-locale-ar, mozilla-firefox-locale-fi-fi, mozilla-firefox-locale-gu-in, mozilla-firefox-locale-he-il, mozilla-firefox-locale-pt-pt, mozilla-firefox-locale-da-dk, mozilla-firefox-locale-fr-fr, mozilla-firefox-locale-hu-hu, mozilla-firefox-locale-eu, mozilla-firefox-locale-zh-tw, mozilla-firefox-locale-sk, mozilla-firefox-locale-en-gb, mozilla-firefox-locale-zu, mozilla-firefox-locale-mn, mozilla-firefox-locale-bg-bg, mozilla-firefox-locale-pt-br, mozilla-firefox-locale-cs-cz, mozilla-firefox-locale-sv-se, mozilla-firefox-locale-hr, mozilla-firefox-locale-ja-jp, mozilla-firefox-locale-el, mozilla-firefox-locale-fy-nl, mozilla-firefox-locale-de-de, mozilla-firefox-locale-ku, mozilla-firefox-locale-tr-tr, mozilla-firefox-locale-nb-no, mozilla-firefox-locale-ru-ru, mozilla-firefox-locale-lt, mozilla-firefox-locale-it-it, mozilla-firefox-locale-pa-in, mozilla-firefox-locale-ro-ro, mozilla-firefox-locale-bn-bd, mozilla-firefox-locale-nn-no, mozilla-firefox-locale-mk-mk, mozilla-firefox-locale-zh-cn, mozilla-firefox-locale-nl-nl, mozilla-firefox-locale-ko, mozilla-firefox-locale-ka-ge, and mozilla-firefox-locale-bn-in. Counting distinct source packages instead reduces the count of packaged programs to 12,768.

That's just the most popular Linux distribution. If by "adequate packaging" we meant support for at least the top 5 Linux distros, Windows, and Mac OS X, the number of "adequately packaged" programs would be a really tiny fraction of all Open Source. And documentation? I don't have any hard numbers, but we all know that the majority of software is barely documented, and the existing documentation is usually out of date anyway. The problem of inadequate packaging and documentation affects even some big programs like OpenSolaris and Omniscient Debugger - I'm pretty sure they would be much more popular if only someone bothered to package, document, and advertise them.

Why don't people do it? It's a lot of hard work. It's unbelievably boring. It brings few short-term results - if someone managed to make themselves do it, most likely the next user would use a different distro, the next update would make the documentation obsolete, and they would give up soon. Unlike coding, these activities aren't even considered cool or respected by most people, so why bother?

So what can be done? One thing which alleviates the problem somewhat is Ubuntu and other Debian spin-offs becoming the dominant Linux distributions. If they manage to push everything else out of the mainstream (not very likely, but weirder things have happened) and stay reasonably compatible with each other, maybe .deb will become the one Linux package format. Unfortunately .debs are very painful to create, Linux distributions fork faster than they die, and the previous candidate for the one Linux package format, .rpm, failed to be universally adopted.

Another partial solution is packaging formats layered on top of whatever the distributions are using, like RubyGems. They don't exactly mix well with normal distribution code, but at least they are far easier to build than .debs and work on everything including Mac OS X and Windows.

Or someone might finally make packaging "Not Boring", just like Google managed to do with Internet searching and Web email. It's a well-understood problem, so it really shouldn't be that hard to make a single program which can be used to create reasonable packages in 10 or so different formats in just a few minutes for 99% of programs.
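To make that a bit less hand-wavy, here is a minimal sketch in Python of the core idea: describe the project once and render the metadata for several package formats from that single description. The `Project` class, its field names, and the `render_all` helper are all made up for illustration. The emitted field names are the standard ones from a Debian control file and an RPM spec preamble, but a real package needs much more than this (file lists, dependencies, build steps).

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical single source of truth; the field names are assumptions.
    name: str
    version: str
    summary: str
    maintainer: str
    license: str


def debian_control(p):
    """Render a minimal Debian control stanza (metadata only)."""
    return (
        f"Package: {p.name}\n"
        f"Version: {p.version}\n"
        "Architecture: all\n"
        f"Maintainer: {p.maintainer}\n"
        f"Description: {p.summary}\n"
    )


def rpm_spec_preamble(p):
    """Render the preamble of an RPM spec file (metadata only)."""
    return (
        f"Name: {p.name}\n"
        f"Version: {p.version}\n"
        "Release: 1\n"
        f"Summary: {p.summary}\n"
        f"License: {p.license}\n"
    )


# Supporting another format means adding one more renderer here.
RENDERERS = {"deb": debian_control, "rpm": rpm_spec_preamble}


def render_all(p):
    """Return a dict mapping each format name to its rendered metadata."""
    return {fmt: render(p) for fmt, render in RENDERERS.items()}


project = Project(
    name="hello",
    version="1.0",
    summary="Prints a friendly greeting",
    maintainer="Jane Doe <jane@example.org>",
    license="MIT",
)
for fmt, text in render_all(project).items():
    print(f"--- {fmt} ---")
    print(text)
```

The point of the table of renderers is that the boring part (typing the same name, version, and summary into every format) happens exactly once.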

As for documentation, we should simply accept that it will never be complete and up to date. One solution is striving to make unit tests readable - they have a much greater chance of being complete and up to date than the docs. Unfortunately I've seen programs in which tests made absolutely zero sense unless you knew the codebase very well. An even better solution is not needing the docs. Do 99% of programs really need detailed installation instructions? Why can't the Rakefile / Makefile / gemspec / debian/rules / whatever do it all for us? The only documentation people really desperately need is how to get started, and if getting started is simple and the program is prepackaged, it won't be that much. And as long as you follow the conventions, which many programs unfortunately don't do, a lot less documentation is needed.
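As a rough illustration of what I mean by readable tests, here is a small sketch in Python using the standard unittest module. The `slugify` function is a hypothetical example, not taken from any real program; what matters is that each test name states one rule and each body is a single obvious example of it.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: turn a title into a URL slug."""
    # Every non-alphanumeric character acts as a word separator.
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)


class SlugifyExamples(unittest.TestCase):
    def test_words_are_lowercased_and_joined_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_repeated_spaces_give_a_single_hyphen(self):
        self.assertEqual(slugify("Hello   World"), "hello-world")

    def test_punctuation_acts_as_a_separator(self):
        self.assertEqual(slugify("It doesn't!"), "it-doesn-t")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

Someone who has never seen the code can read the four test names and know what the function promises, and unlike a manual, the tests fail loudly when that promise changes.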

There are also a few solutions for embedding some of the most vital documentation in the program, in hope of making it less likely to go out of date. One that I particularly like is Ruby optparse and similar packages for other languages - output of --help is one of the most important pieces of documentation, and it's a great idea to make it less likely to go out of date.
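For readers who don't use Ruby, the same idea looks roughly like this with Python's standard argparse module. This is only a sketch: the tool name `mkpkg` and both options are invented for the example. The options are declared once, and the --help text is generated from those same declarations, so the two cannot drift apart.

```python
import argparse


def build_parser():
    # Hypothetical tool and options, made up for illustration.
    parser = argparse.ArgumentParser(
        prog="mkpkg",
        description="Build a package from the current directory.",
    )
    parser.add_argument(
        "--format",
        choices=["deb", "rpm", "gem"],
        default="deb",
        help="package format to build (default: %(default)s)",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="print every step while building",
    )
    return parser


parser = build_parser()

# The same declarations drive both parsing and the generated help text.
args = parser.parse_args(["--format", "rpm"])
print(args.format, args.verbose)
print(parser.format_help())
```

Adding a new option means adding one `add_argument` call, and its description shows up in --help automatically.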

Any other ideas?


Monkeyget said...

Your blog post immediately made me remember this entry by Eric Lippert: How many Microsoft employees does it take to change a lightbulb?
The first comments are particularly interesting.

pete said...

It doesn't - you're dead right, especially for Linux desktops. I'd rather pay a fee and get something that works.


Alan said...

Your definition for "adequate" packaging is a bit excessive. There are tons of Unix-only programs with plenty of documentation and packaging for all major Linux distributions, but since they haven't been ported to Windows and OSX, they don't count in your analysis.

dfdeshom said...

"""There are also a few solutions for embedding some of the most vital documentation in the program, in hope of making it less likely to go out of date.

Python has doctests, where code gets embedded in the documentation, a little like this:

def f(a, b):
    """Return a + b. Example:

    >>> f(1, 1)
    2
    """
    return a + b

taw said...

Alan: It is not excessive. Unless the program is really inherently Linux-specific like module-init-tools and makes no sense under Mac OS X and Windows, the fact that it doesn't run there is a packaging deficiency. There are very, very few programs like that.

taw said...

dfdeshom: Yeah, docstring, rdoc, and so on are great for documenting library interfaces and low-level program internals, but nothing more than that. They cannot be used to write a sensible manual, or a tutorial, or even a high-level description of program architecture.

zenspider said...

(ruby) Packages like my hoe or dr. nic's gembuilder go a long way towards relieving the tedium of packaging and deployment. With hoe you can start entirely new projects with a well-defined filesystem structure and be up and running in seconds. After that you fill out a tiny bit of config (what rubyforge project, author name/email, etc) and you can have automatic packaging, deployment, documentation publishing, etc. All for free. Makes deployment a TON less tedious. It'd be nice if there were similar things for other languages but I've not seen them. Perl prolly came closest last I looked.

taw said...

zenspider: hoe and friends don't help much. They support a single packaging system, which is language-specific, and it's not clear how to customize the stuff generated by them.

What is needed is a single system which can generate reasonable .debs, .rpms, .gems, .msis, and whatnot; language-neutral and customizable. hoe only saves a tiny fraction of the boring work.