The best kittens, technology, and video games blog in the world.

Sunday, June 15, 2008

How to create value by blogging

He's so cute when he's angry! by rockygirl05 from flickr (CC-NC)

I'm hardly blogging these days, so I thought I'd at least open a Twitter account - and here it is. Some stuff I've done recently, like London Barcamp 4 and getting an OLPC XO-1 laptop, wasn't blogged at all, just tweeted. This is all caused by a certain problem with blogging - people really like to read long blog posts. Just look at the "Popular posts" sidebar - it's one long post after another. People even like insanely long blog posts like those by Steve Yegge. But long posts take too much time and energy, so most of the posts written by me and other bloggers are pretty short. So I somehow concluded that writing short blog posts isn't really worthwhile, and tweeted instead.

Today I thought maybe it's time to verify that, so for some hard data I collected statistics from Google Analytics (unique page views) and del.icio.us (number of bookmarks, excluding my own), and divided my 202 blog posts into ten buckets by plain-text size.
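The bucketing itself is trivial - here's a sketch of it, with made-up post records standing in for the real Analytics/del.icio.us exports:

```python
# A sketch of the bucketing described above: sort posts by plain-text size,
# split them into ten near-equal buckets, and average each bucket's stats.
# The post records here are hypothetical stand-ins for the exported data.

def buckets(posts, n=10):
    posts = sorted(posts, key=lambda p: p["size"], reverse=True)
    out, start = [], 0
    k, m = divmod(len(posts), n)
    for i in range(n):
        step = k + (1 if i < m else 0)  # spread the remainder over the first buckets
        out.append(posts[start:start + step])
        start += step
    return out

def averages(bucket):
    keys = ("size", "views", "bookmarks")
    return {key: sum(p[key] for p in bucket) / float(len(bucket)) for key in keys}

# Made-up sample data standing in for the 202 real posts:
sample = [{"size": s, "views": s // 10, "bookmarks": s / 1000.0}
          for s in range(100, 2200, 10)]
for i, bucket in enumerate(buckets(sample), 1):
    print(i, averages(bucket))
```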

And indeed - it seems that almost nobody cares about the short blog posts, and the longer the post, the more people read it. The difference is even more pronounced when counting del.icio.us bookmarks. I feel that the number of del.icio.us bookmarks is a much better indicator of a post's "value" than page views, as a page view is generated before the reader has even seen the post, while a bookmark is added only after the post has been read and judged valuable. Search good vs experience good.

Bucket | Posts | Average size | Average page views | Average bookmarks | Average page views per kB | Average bookmarks per kB
     1 |    20 |        16742 |               1776 |              20.0 |                       126 |                      1.4
     2 |    20 |         6890 |               1513 |              11.0 |                       224 |                      1.6
     3 |    21 |         4590 |               1036 |               4.9 |                       245 |                      1.1
     4 |    20 |         3456 |                496 |               3.8 |                       149 |                      1.1
     5 |    20 |         2803 |                314 |               2.3 |                       114 |                      0.9
     6 |    20 |         2129 |                301 |               1.1 |                       146 |                      0.5
     7 |    20 |         1656 |                402 |               1.9 |                       238 |                      1.1
     8 |    21 |         1188 |                223 |               0.2 |                       177 |                      0.2
     9 |    20 |          789 |                276 |               1.2 |                       399 |                      1.7
    10 |    20 |          421 |                143 |               0.2 |                       384 |                      0.5


This confirms my beliefs and disproves the commonly held ADHD theory of blog readers, which states that most blog readers have very short attention spans and would much rather look at the kittens. It seems that, to the contrary, readers really love long posts. At least my readers. You'll still be getting kittens of course - my blog would look quite empty without them.

On the other hand, a completely different picture arises when page views per kB and bookmarks per kB are measured. Bookmarks per kB is pretty flat, while page views per kB goes down fast. So if kBs of text are a good measure of a blogger's effort, then the best way of generating value is writing tons of short posts.

I'm so undecided. Is it better to write fewer long posts, many of which would be big hits (relative to the blog's popularity of course, this isn't I CAN HAS CHEEZBURGER), or many short posts which would generate less value per post but more value overall? I'm kinda writing for myself, but I still consider whether a post will be valuable to the average reader before posting it. I should probably simply keep posting instead of thinking too much.

Thursday, June 12, 2008

Bolting Aspect Oriented Programming on top of Python

What's this? by Steffe from flickr (CC-NC-SA)
The biggest difference between native support and bolting things on top of a programming language is that you can only bolt so much before things start to collapse. In C++ even strings, arrays, and hashtables are bolted on - and while they work just fine, any interoperability between different libraries using strings, arrays, and hashtables is almost impossible without massive amounts of boilerplate code.

In Perl and Python these basic data structures are native and well supported, but the next step, supporting objects, is bolted on. So objects work reasonably well, but metaprogramming with them is very difficult and limited (in Python) or outright impossible in any sane way (in Perl).

Ruby takes this a step further: real object-oriented programming is native, so people can bolt other things on top of it, like aspect-oriented programming. AOP in Ruby (before_foo, after_bar, alias_method_chain, mixins, magical mixins, many method_missing hacks etc.) works reasonably well, but I wouldn't want to bolt anything on top of that, or the whole thing would fall apart.

This is the problem with bolting stuff on - bolting stuff on is a valid technique (just like design patterns, code generation, and other band-aids), and bolted-on stuff like objects in Perl/Python or arrays/strings/hashtables in C++ does work; it's just infinitely less flexible than native support when it comes to further extending.

But I really miss AOP in Python. Multiple inheritance can kinda simulate a very weak kind of mixins, but it's rather cumbersome to use. I wanted to write a test suite using aspect-oriented mixins, but there were simply so many super(omg, who).made(up, this, syntax) calls that it looked as painful as Java inner classes. So I thought - would it all collapse if I added very simple AOP support?

It turned out not to be so bad. Here's a distilled example. There's a bunch of classes inheriting from BaseTest. Their setup methods should be called from superclass down to subclass, while their teardown methods should be called from subclass up to superclass. If there are multiple AOP methods on the same level, all should be called, in some consistent order (I do alphanumeric; order of definition would be better, but Python metaprogramming isn't powerful enough for that). It's also possible to override a parent's AOP methods (you could even compose an AOP method override using super if you really needed to). Or you could override the whole setup/teardown method if you really needed to - this is very flexible.

class BaseTest(object):
    def setup(self):
        aop_call_down(self, 'setup')
    def teardown(self):
        aop_call_up(self, 'teardown')

class WidgetMixin(object):
    def setup_widget(self):
        print "* Setup widget"
    def teardown_widget(self):
        print "* Teardown widget"

class Foo(BaseTest):
    def setup_foo(self):
        print "* Setup foo"
    def teardown_foo(self):
        print "* Teardown foo"

class Bar(WidgetMixin, Foo):
    def setup_bar(self):
        print "* Setup bar"
    def teardown_bar(self):
        print "* Teardown bar"

class Blah(Bar):
    def setup_blah(self):
        print "* Setup blah1"

    def setup_widget(self):
        print "* Setup widget differently"

    def setup_blah2(self):
        print "* Setup blah2"

    def teardown_blah(self):
        print "* Teardown blah1"

    def teardown_blah2(self):
        print "* Teardown blah2"


The output of a = Bar(); a.setup(); a.teardown() is exactly what we would expect:
* Setup foo
* Setup widget
* Setup bar
* Teardown bar
* Teardown widget
* Teardown foo


The more difficult case of b = Blah(); b.setup(); b.teardown() is also handled correctly - notice that the widget mixin's setup was overridden:
* Setup foo
* Setup widget differently
* Setup bar
* Setup blah1
* Setup blah2
* Teardown blah2
* Teardown blah1
* Teardown bar
* Teardown widget
* Teardown foo


The code that makes it possible isn't strikingly beautiful, but it's not any worse than some of my Django templatetags.

def aop_call_order(obj, prefix):
    already_called = {}
    for cls in reversed(obj.__class__.mro()):
        for name in sorted(dir(cls)):
            if name[0:len(prefix)+1] != prefix + '_':
                continue
            if name not in already_called:
                yield name
                already_called[name] = True

def aop_call_up(obj, prefix):
    for name in reversed(list(aop_call_order(obj, prefix))):
        getattr(obj, name)()

def aop_call_down(obj, prefix):
    for name in aop_call_order(obj, prefix):
        getattr(obj, name)()


aop_call_order returns methods with names like prefix_* defined in obj's ancestor classes, in the order of Python's multiple inheritance resolution, falling back to alphabetic order if they're on the same layer. Overriding a method in a subclass doesn't affect the order, making the "Setup widget differently" trick possible. aop_call_down and aop_call_up then call these methods in forward or reverse order.

Of course, like all other multilayer bolted-on features, it's going to horribly collapse if you use it together with other metaprogramming features. If you don't like that - switch to Ruby.

Coming up next - bolting closures on top of Fortran.

Saturday, May 10, 2008

Relax, fuel is cheap

Photo of my cat Cloud, for no particular reason (public domain)
As a proud Cornucopian I'm tired of baseless claims of neo-Malthusian peakniks that cheap oil is over and it's time to eat dirt and die.

If you think I'm exaggerating, and haven't been on the Internet for the last five years or so, here's a typical example of a Peaknik Doomsdayer:

Peak Oil. It's bigger than terrorism, global warming or genocide. It's the end of your way of life. [...] Which means if you don't live by your farm, no food for you. There won't be much food anyway. [...] 4 billion people will not survive. [...] So what can you do to prevent peak oil? Nothing. Seriously, nothing.

Fortunately these claims don't withstand scrutiny. Fuel is more affordable than at almost any time in history, except the 1990s, when it was not merely affordable but insanely cheap.

I hereby present my Fuel Affordability Index, which measures fuel affordability relative to a standard of 100.0 in 1975. Fuel affordability captures how many miles you can drive on an average salary. To calculate it, you multiply new car fuel efficiency in mpg by GDP per capita and divide by crude oil prices. We can dispute details like GDP per capita vs median household income, crude oil prices vs retail gas prices, new car vs average car fuel efficiency and so on, but they don't fundamentally affect the conclusions, so I just took whatever was easiest to find. Data is for the USA, mostly because I couldn't find historical fuel efficiencies for any other country. I guess the conclusion would be even stronger for the EU, as European cars are more energy efficient and the strong euro makes crude oil cheaper than in the US.
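In code, the index is just (mpg × GDP per capita ÷ oil price), rescaled so that 1975 equals 100; plugging in the 1976 and 2007 figures from the table reproduces its index values:

```python
# The Fuel Affordability Index as defined above: mpg times GDP per capita
# divided by crude oil price, rescaled so that 1975 = 100.

def affordability(mpg, gdp_per_capita, oil_price,
                  base=(13.5 * 19962 / 47.63)):  # the 1975 reference point
    return 100.0 * (mpg * gdp_per_capita / oil_price) / base

print(round(affordability(14.9, 20826, 48.36)))  # 1976 -> 113
print(round(affordability(23.4, 38340, 64.92)))  # 2007 -> 244
```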

Year | Fuel Affordability | GDP per capita | Fuel efficiency | Crude oil prices
1975 |                100 |        $19,962 |            13.5 |           $47.63
1976 |                113 |        $20,826 |            14.9 |           $48.36
1977 |                119 |        $21,570 |            15.6 |           $49.88
1978 |                140 |        $22,531 |            16.9 |           $48.17
1979 |                 97 |        $22,987 |            17.2 |           $71.96
1980 |                 84 |        $22,666 |            20.0 |           $95.50
1981 |                105 |        $23,011 |            21.4 |           $82.70
1982 |                126 |        $22,350 |            22.2 |           $69.33
1983 |                147 |        $23,148 |            22.1 |           $61.34
1984 |                168 |        $24,598 |            22.4 |           $58.14
1985 |                196 |        $25,386 |            23.0 |           $52.56
1986 |                394 |        $26,028 |            23.7 |           $27.66
1987 |                342 |        $26,668 |            23.8 |           $32.81
1988 |                443 |        $27,519 |            24.1 |           $26.45
1989 |                381 |        $28,226 |            23.7 |           $31.05
1990 |                315 |        $28,435 |            23.3 |           $37.17
1991 |                372 |        $28,011 |            23.4 |           $31.15
1992 |                405 |        $28,559 |            23.1 |           $28.81
1993 |                493 |        $28,943 |            23.5 |           $24.36
1994 |                552 |        $29,744 |            23.3 |           $22.19
1995 |                540 |        $30,131 |            23.4 |           $23.09
1996 |                465 |        $30,886 |            23.3 |           $27.38
1997 |                541 |        $31,891 |            23.4 |           $24.40
1998 |                885 |        $32,837 |            23.4 |           $15.35
1999 |                662 |        $33,908 |            23.0 |           $20.83
2000 |                421 |        $34,770 |            22.9 |           $33.39
2001 |                517 |        $34,701 |            23.0 |           $27.29
2002 |                536 |        $34,931 |            23.1 |           $26.61
2003 |                460 |        $35,479 |            23.2 |           $31.62
2004 |                356 |        $36,433 |            23.1 |           $41.84
2005 |                287 |        $37,206 |            23.5 |           $53.77
2006 |                257 |        $37,928 |            23.3 |           $60.73
2007 |                244 |        $38,340 |            23.4 |           $64.92


Notes: Crude oil prices adjusted to 2007 dollars. GDP per capita in 2000 dollars. The different bases don't affect the results, as only ratios are taken. Fuel efficiency is combined urban+highway, for all cars except trucks (so, if I understand correctly, without SUVs too).

As you can see, fuel is much more affordable than in the 1970s or early 1980s. Since civilization very much existed in the 1970s, it will continue at current fuel prices, or even at prices significantly higher than current ones. One thing I expect to start happening about now is a further increase in average car fuel economy - as soon as fuel became cheap in the mid 1980s cars stopped improving, but hybrids are much better than 23.4 mpg - the popular Toyota Prius has a combined 46 mpg, almost double the current average.

Even if people keep buying the same cars and the economy stays stagnant, fuel would have to become 2.44 times as expensive to bring fuel affordability back to 1970s levels (which, if you're old enough to remember, weren't the end of civilization). If people start buying hybrids (and they will) and the economy grows at 3% a year (and it will) for the next 10 years, even a $420 barrel won't reduce fuel affordability below 1970s levels.

Relax, fuel is cheap.

Friday, March 21, 2008

WTF - Changing keymaps with a hex editor

Maine coon kittens by eleda from flickr (CC-NC)

I had a sudden attack of nostalgia today, so I grabbed a few random Nintendo ROMs and tried to play them. Surprisingly, while SNES hardware had no problem with NES games, SNES emulators don't like NES ROMs, claiming they're corrupt. I think they should at least say something like "This looks like a NES ROM, not a SNES ROM, please use another program", but that's a subject for another rant.

I got FCE Ultra, which is a proper NES emulator. The only problem was the completely absurd keymapping - AWZS being arrows, and keypad 2/3 being buttons. It was completely unusable on either the laptop keyboard or an external keyboard, and it wasn't customizable. I tried one more emulator (nestra), but it had no sound, so I had to go back to FCE Ultra. Not a big deal, I thought - I have the sources, so apt-get source fceu; sudo apt-get build-dep fceu and let's see how input is handled. I found the data structure defining keymaps in no time, changed it to cursors+ZX, recompiled and... it still used the old keymap.

No need to panic, I thought, maybe there's another keymap somewhere. There wasn't. So maybe the make dependencies are broken and I need to make clean first... still didn't work. I even started suspecting that libtool or some other part of the build system might store old object files somewhere. No worries, let's just get fresh sources, patch input.c, recompile and... back to the old keymap.

Now that was weird. There was definitely no other keymap, debug printfs in input.c worked, and all the functions I expected were called. A few more printfs - the keymap data structure contained 97, 119, 122, 115 (awzs) instead of arrow keycodes. That was a major WTF. Obviously something must have modified the keymap, but how did it know the old one?

The solution wasn't far away - FCE Ultra concatenates all configuration data structures (keymaps, sound settings, palettes, GUI config and everything else) into one huge structure, which is saved to disk and then reloaded, overwriting the program's manually defined data structures. It's not even serialized in any way - it's pure binary data the way the compiler arranges it. Only a C programmer might have done something like that. People who program in literally anything else - even Java - would either do text/XML serialization or wouldn't bother at all. What was the thinking process of the guy who wrote that code? My best guess is "Maybe someone will want to change keymaps with a hex editor".
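Here's a toy sketch of that failure mode (in Python rather than C, and with a made-up file name and keycodes, not FCE Ultra's actual code): defaults compiled into the program get silently clobbered by a raw blob saved from an earlier build.

```python
import os
import struct
import tempfile

# Toy version of the pattern: the whole config is dumped as one raw,
# unversioned binary blob and reloaded wholesale on startup.

def save_config(path, keymap):
    with open(path, "wb") as f:
        f.write(struct.pack("4i", *keymap))  # raw bytes: no names, no version

def load_config(path, keymap):
    with open(path, "rb") as f:
        keymap[:] = struct.unpack("4i", f.read())  # clobbers in-memory defaults

path = os.path.join(tempfile.gettempdir(), "fceu-sketch.cfg")

keymap = [97, 119, 122, 115]   # a/w/z/s - the defaults in the old build
save_config(path, keymap)      # written out on the first run

keymap = [273, 274, 276, 275]  # "patched source": arrow keycodes
load_config(path, keymap)      # ...and the old blob wins again
print(keymap)                  # -> [97, 119, 122, 115]
```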

Sunday, March 02, 2008

Practical Ruby Projects review

Foto Kotki by RussianA from Wikimedia Commons (CC-BY)

A few weeks ago I got a review copy of "Practical Ruby Projects: Ideas for the Eclectic Programmer" by Topher Cyll. The main part of the book consists of eight thematic chapters: MIDI music, SVG animation, the pocket change problem, a turn-based strategy engine, a turn-based strategy GUI in Cocoa, genetic algorithms, implementing Lisp in Ruby (which is probably what got me the review copy), and parsing with RParsec. But don't worry about that - they are only pretexts for demonstrating genuine issues that you will stumble upon in your Ruby coding, and various interesting techniques. In a way the book reads a lot like a typical programming blog, with all the "I had this strange problem while coding something, and I solved it in this incredibly cool way, I hope it's useful to you too" stuff.

The book is very high on code and pretty low on bullshit and beginner filler (which unfortunately plague so many books these days), so it won't be wasting your time. I was surprised by how many solutions presented in the book were almost identical to what I've done in my own projects - like the world map implementation in the book's Cocoa app and my jrpg, which uses Python and PyGame/SDL but is surprisingly similar in structure. The part most interesting to me was the last chapter, with its RParsec tutorial, but the book was overall very enjoyable to read.

If you want me to review any other good books, go ahead and send them ;-)

Sunday, February 17, 2008

I'm totally out of shape

Let's dance! by Tscherno from flickr (CC-BY)

I've just found a DDR machine (of the good Euromix 2 kind) near where I live, and I barely managed to do what a couple of years ago would have been just a warmup - one Normal nonstop plus three 8/9-feeters (today it was Crash, Candy, and Tsugaru).

I should start attending a gym or something like that, or it will only go downhill from here.

Monday, January 28, 2008

PageRank - a new addition to your Data Processing Toolkit

G0321 by Piez from flickr (CC-BY)

Back when I was at Universität des Saarlandes we had a great seminar at MPII. It was called "Data Processing Tips and Tricks" and covered some important data processing techniques. These techniques varied a lot. What they had in common was their universality - you could throw pretty much any data at them and extract something. And by preprocessing your data differently, and postprocessing the results differently, you could get loads of interesting things without inventing any new algorithms.

Here's a partial list of the classic universal data processing methods, in no particular order:

  • Bayesian statistics
  • EM algorithm
  • Wavelets
  • Levenberg-Marquardt optimization
  • Lloyd clustering
  • Karhunen-Loève Transform (aka Principal Component Analysis)
  • Singular Value Decomposition
  • multi-dimensional scaling
  • Algebraic reconstruction technique
  • Support Vector Machines
  • Graph Cut Optimization
  • Level-Set Methods
  • RANSAC
  • Neural Networks
  • Hidden Markov Models
  • Regular expressions

Simply being aware of their existence greatly increases your chances of solving the really big problems you'll be facing. For example, naive Bayesian statistics was famously used for fighting spam, and in spite of its striking simplicity it is much more effective than all the custom, complicated methods that preceded it.

In the last few years the data processing toolkit got a new tool - PageRank. Pretty much everybody knows how it works for scoring websites, but the algorithm is capable of much more than that. One great example is extracting keywords from documents. It has nothing to do with the original problem of website scoring, but if you treat words as nodes (websites), create links between words that occur close to each other, and run PageRank on such a graph, you get very decent keywords. Of course you might want to add some pre- and post-processing to improve keyword quality (obviously removing HTML tags, also stemming, removing stopwords, weighting words by part of speech or whatever you feel like doing), but so does Google in determining page scores. And I bet you expected keyword extractors to either actually understand what's written (not possible yet) or to simply count the number of occurrences (which gives really horrible results).
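The keyword trick fits in a few lines. Here's a minimal sketch - the window size, damping factor, and sample text are my own choices, not from any particular implementation:

```python
from collections import defaultdict

# A minimal sketch of PageRank keyword extraction as described above:
# words are nodes, words occurring within `window` positions of each other
# are linked, and a plain power iteration scores the nodes.

def keyword_scores(words, window=2, damping=0.85, iterations=50):
    links = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                links[w].add(words[j])
                links[words[j]].add(w)
    rank = dict.fromkeys(links, 1.0)
    for _ in range(iterations):
        rank = {w: (1 - damping) + damping *
                   sum(rank[v] / len(links[v]) for v in links[w])
                for w in links}
    return rank

text = "pagerank extracts keywords pagerank builds graphs pagerank ranks words"
scores = keyword_scores(text.split())
print(max(scores, key=scores.get))  # -> pagerank
```

On real documents you'd tokenize, strip stopwords, and stem first, exactly as the pre-processing note above suggests.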

You can use PageRank to assess the importance of countries in international trade, the importance of people in an organization's communication flow, and many other things. Or you could simply throw arbitrary graphs at PageRank, look at the results, and guess what they mean. Perhaps that will be enough to solve the problem you've been thinking about for such a long time. If not, you still have two dozen other universally applicable techniques to choose from.

Saturday, January 26, 2008

Keyboard layout should not be a global setting

mac kitty by atomicshark from flickr (CC-NC-SA)

Most computers these days are laptops. Their computing power has become quite decent, and they're even reasonably capable of moderate gaming. One of the major problems left is typing on them. Laptop keyboards, especially in smaller laptops, are tiny, have very small keys, no numeric keypad, and a painfully unergonomic shape which forces you to keep your hands in an unnatural position. As the keyboard and screen are attached to each other, there's no way to make it comfortable for both your eyes and your hands. To make matters worse, many laptops recently started to include a big touchpad which doubles as a mouse button, so when you try to type - and you have to keep your hands very close to each other because the keyboard is so small - you're very likely to accidentally "press the mouse button" by touching the touchpad. Basically they're completely unsuitable for touch typing, and they will forever stay this way, because it's impossible to build a decent keyboard that fits in a laptop form factor. That doesn't mean all of them are equally bad - the worst one I've seen so far was the Macbook's, and some in bigger laptops are only annoying instead of actively painful.

This all means that unless the only things you type are Google search queries, you need a real keyboard for your computer in addition to the internal one. Most people don't seem to care about this, but I really like the Dvorak layout. And here lies the problem: in every operating system I've ever seen, keyboard layout is a global setting, not a per-device setting. I want to touch type in Dvorak on the external keyboard, but as touch typing on the laptop keyboard is not possible, I'd prefer it to stay QWERTY, so I can at least see which key I'm pressing. However, as keyboard layout is global rather than per-device, I have to manually switch it every time I attach or detach the external keyboard.

Pressing the keyboard layout switcher a couple of times a day is maybe not the worst usability problem out there, but could KDE/GNOME developers please improve it? That would be really, really great.

Wednesday, January 23, 2008

Really strange quirk of Ruby and Perl regular expressions

Sage's Kittens by T. Keller from flickr (CC-BY)

I've found a weird quirk of Ruby's regular expression engine. The same quirk is present in Ruby 1.8.6, Ruby 1.9.0, and Perl 5.8.8, but it is not present in Python 2.5.1. I'm going to let you decide if it's a bug or not.

I tried to replace all whitespace at the end of a string with a single newline. The correct way to do it is str.sub(/\s*\Z/, "\n"). However, I wrote str.gsub(/\s*\Z/, "\n") instead. A string has only one end, so there should be no way String#gsub could possibly match twice. But it does - the result is two newlines if there was any whitespace at the end, or one newline if there wasn't. I kinda see how it might be getting these results from an implementation point of view, and it's not a big deal because there's a simple workaround of replacing #gsub with #sub, but it doesn't make much semantic sense to me.

Here are a few code snippets in Ruby, Perl and Python, with "o" standing in for whitespace for better readability.

# Ruby
"f".gsub(/o*\Z/, "o") # => "fo"
"fo".gsub(/o*\Z/, "o") # => "foo"
"foo".gsub(/o*\Z/, "o") # => "foo"
"fooo".gsub(/o*\Z/, "o") # => "foo"
"f".sub(/o*\Z/, "o") # => "fo"
"fo".sub(/o*\Z/, "o") # => "fo"
"foo".sub(/o*\Z/, "o") # => "fo"
"fooo".sub(/o*\Z/, "o") # => "fo"
"f".gsub(/o*\Z/, "x") # => "fx"
"fo".gsub(/o*\Z/, "x") # => "fxx"
"foo".gsub(/o*\Z/, "x") # => "fxx"
"fooo".gsub(/o*\Z/, "x") # => "fxx"

# Perl
perl -le '$_="f"; s/o*$/o/g; print $_' # => fo
perl -le '$_="fo"; s/o*$/o/g; print $_' # => foo
perl -le '$_="foo"; s/o*$/o/g; print $_' # => foo
perl -le '$_="fooo"; s/o*$/o/g; print $_' # => foo

# Python
re.compile(r'o*$').sub("o", "f") # => 'fo'
re.compile(r'o*$').sub("o", "fo") # => 'fo'
re.compile(r'o*$').sub("o", "foo") # => 'fo'
re.compile(r'o*$').sub("o", "fooo") # => 'fo'


For all you Ruby/Perl programmers out there: Python's sub is the equivalent of gsub or s///g, not sub or s///.
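If you actually want Ruby's sub / Perl's plain s/// behavior in Python, re.sub takes a count argument limiting the number of replacements:

```python
import re

print(re.sub(r"o", "x", "fooo"))           # -> fxxx  (default: replace all, like gsub)
print(re.sub(r"o", "x", "fooo", count=1))  # -> fxoo  (at most one replacement, like sub)
```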

Which behaviour makes more sense? I think Python's is much more intuitive. Should we treat it as a bug in Ruby/Perl and fix it, or accept it as an unintended feature?

Sunday, January 20, 2008

My poor bottom left 6

tree eater by splityarn from flickr (CC-NC-SA)

My tooth had been hurting a bit for the last week, and it got really bad yesterday, so I called some 24h dental service. It wasn't as 24h as they claimed, and I was only able to get an appointment for the next day. The bad news: root canal treatment was required. The even worse news: it cost a damn 700 quid over two visits. There goes the new computer, at least for now. It's insanely expensive compared to dentistry in Poland, where a plain checkup is about 10 GBP and root canal treatment about 50 GBP - just Google the prices if you don't believe me. If I had known in advance it would cost that much, I'd probably have just booked a flight. Too bad you can't schedule your dental problems.