The best kittens, technology, and video games blog in the world.

Sunday, June 15, 2008

How to create value by blogging

He's so cute when he's angry! by rockygirl05 from flickr (CC-NC)

I'm hardly blogging these days, so I thought I'd at least open a Twitter account - and here it is. Some stuff I've done recently, like London Barcamp 4 and getting an OLPC XO-1 laptop, wasn't blogged at all, just tweeted. This is all caused by a certain problem with blogging - people really like to read long blog posts. Just look at the "Popular posts" sidebar - it's one long post after another. People even like insanely long blog posts like those by Steve Yegge. But long posts take too much time and energy, so most of the posts written by me and other bloggers are pretty short. So I somehow concluded that writing short blog posts isn't really worthwhile, and tweeted instead.

Today I thought maybe it's time to verify that, so for some hard data I collected statistics from Google Analytics (unique page views) and del.icio.us (number of bookmarks, excluding my own), and divided my 202 blog posts into ten buckets by plain-text size.
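The bucketing itself is trivial - here's a sketch of it, with made-up post records standing in for the real Analytics/del.icio.us exports:

```python
# A sketch of the bucketing described above: sort posts by plain-text size,
# split them into ten near-equal buckets, and average each bucket's stats.
# The post records here are hypothetical stand-ins for the exported data.

def buckets(posts, n=10):
    posts = sorted(posts, key=lambda p: p["size"], reverse=True)
    out, start = [], 0
    k, m = divmod(len(posts), n)
    for i in range(n):
        step = k + (1 if i < m else 0)  # spread the remainder over the first buckets
        out.append(posts[start:start + step])
        start += step
    return out

def averages(bucket):
    keys = ("size", "views", "bookmarks")
    return {key: sum(p[key] for p in bucket) / float(len(bucket)) for key in keys}

# Made-up sample data standing in for the 202 real posts:
sample = [{"size": s, "views": s // 10, "bookmarks": s / 1000.0}
          for s in range(100, 2200, 10)]
for i, bucket in enumerate(buckets(sample), 1):
    print(i, averages(bucket))
```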

And indeed - it seems that almost nobody cares about the short blog posts, and the longer the post, the more people read it. The difference is even more pronounced when counting del.icio.us bookmarks. I feel that the number of del.icio.us bookmarks is a much better indicator of a post's "value" than page views, as a page view is generated before the reader has even seen the post, while a bookmark is added only after the post has been read and judged valuable. Search good vs experience good.

Bucket | Posts | Average size | Average page views | Average bookmarks | Average page views per kB | Average bookmarks per kB
     1 |    20 |        16742 |               1776 |              20.0 |                       126 |                      1.4
     2 |    20 |         6890 |               1513 |              11.0 |                       224 |                      1.6
     3 |    21 |         4590 |               1036 |               4.9 |                       245 |                      1.1
     4 |    20 |         3456 |                496 |               3.8 |                       149 |                      1.1
     5 |    20 |         2803 |                314 |               2.3 |                       114 |                      0.9
     6 |    20 |         2129 |                301 |               1.1 |                       146 |                      0.5
     7 |    20 |         1656 |                402 |               1.9 |                       238 |                      1.1
     8 |    21 |         1188 |                223 |               0.2 |                       177 |                      0.2
     9 |    20 |          789 |                276 |               1.2 |                       399 |                      1.7
    10 |    20 |          421 |                143 |               0.2 |                       384 |                      0.5


This confirms my beliefs and disproves the commonly held ADHD theory of blog readers, which states that most blog readers have very short attention spans and would much rather look at the kittens. It seems that, to the contrary, readers really love long posts. At least my readers. You'll still be getting kittens of course - my blog would look quite empty without them.

On the other hand, a completely different picture arises when page views per kB and bookmarks per kB are measured. Bookmarks per kB is pretty flat, while page views per kB goes down fast. So if kBs of text are a good measure of a blogger's effort, then the best way of generating value is writing tons of short posts.

I'm so undecided. Is it better to write fewer long posts, many of which would be big hits (relative to the blog's popularity of course, this isn't I CAN HAS CHEEZBURGER), or many short posts which would generate less value per post but more value overall? I'm kinda writing for myself, but I still consider whether a post will be valuable to the average reader before posting it. I should probably simply keep posting instead of thinking too much.

Thursday, June 12, 2008

Bolting Aspect Oriented Programming on top of Python

What's this? by Steffe from flickr (CC-NC-SA)
The biggest difference between native support and bolting things on top of a programming language is that you can only bolt so much before things start to collapse. In C++ even strings, arrays, and hashtables are bolted on - and while they work just fine, any interoperability between different libraries using strings, arrays, and hashtables is almost impossible without massive amounts of boilerplate code.

In Perl and Python these basic data structures are native and well supported, but the next step, supporting objects, is bolted on. So objects work reasonably well, but metaprogramming with them is very difficult and limited (in Python) or outright impossible in any sane way (in Perl).

Ruby takes this a step further: real object-oriented programming is native, so people can bolt other things on top of it, like aspect-oriented programming. AOP in Ruby (before_foo, after_bar, alias_method_chain, mixins, magical mixins, many method_missing hacks etc.) works reasonably well, but I wouldn't want to bolt anything on top of that, or the whole thing would fall apart.

This is the problem with bolting stuff on - bolting stuff on is a valid technique (just like design patterns, code generation, and other band-aids), and bolted-on stuff like objects in Perl/Python or arrays/strings/hashtables in C++ does work; it's just infinitely less flexible than native support when it comes to further extending.

But I really miss AOP in Python. Multiple inheritance can kinda simulate a very weak kind of mixins, but it's rather cumbersome to use. I wanted to write a test suite using aspect-oriented mixins, but there were simply so many super(omg, who).made(up, this, syntax) calls that it looked as painful as Java inner classes. So I thought - would it all collapse if I added very simple AOP support?

It turned out not to be so bad. Here's a distilled example. There's a bunch of classes inheriting from BaseTest. Their setup methods should be called from superclass down to subclass, while their teardown methods should be called from subclass up to superclass. If there are multiple AOP methods on the same level, all should be called, in some consistent order (I do alphanumeric; order of definition would be better, but Python metaprogramming isn't powerful enough for that). It's also possible to override a parent's AOP methods (you could even compose an AOP method override using super if you really needed to). Or you could override the whole setup/teardown method if you really needed to - this is very flexible.

class BaseTest(object):
    def setup(self):
        aop_call_down(self, 'setup')
    def teardown(self):
        aop_call_up(self, 'teardown')

class WidgetMixin(object):
    def setup_widget(self):
        print "* Setup widget"
    def teardown_widget(self):
        print "* Teardown widget"

class Foo(BaseTest):
    def setup_foo(self):
        print "* Setup foo"
    def teardown_foo(self):
        print "* Teardown foo"

class Bar(WidgetMixin, Foo):
    def setup_bar(self):
        print "* Setup bar"
    def teardown_bar(self):
        print "* Teardown bar"

class Blah(Bar):
    def setup_blah(self):
        print "* Setup blah1"

    def setup_widget(self):
        print "* Setup widget differently"

    def setup_blah2(self):
        print "* Setup blah2"

    def teardown_blah(self):
        print "* Teardown blah1"

    def teardown_blah2(self):
        print "* Teardown blah2"


The output of a = Bar(); a.setup(); a.teardown() is exactly what we would expect:
* Setup foo
* Setup widget
* Setup bar
* Teardown bar
* Teardown widget
* Teardown foo


The more difficult case of b = Blah(); b.setup(); b.teardown() is also handled correctly - notice that the widget mixin's setup was overridden:
* Setup foo
* Setup widget differently
* Setup bar
* Setup blah1
* Setup blah2
* Teardown blah2
* Teardown blah1
* Teardown bar
* Teardown widget
* Teardown foo


The code that makes it possible isn't strikingly beautiful, but it's not any worse than some of my Django templatetags.

def aop_call_order(obj, prefix):
    already_called = {}
    for cls in reversed(obj.__class__.mro()):
        for name in sorted(dir(cls)):
            if name[0:len(prefix)+1] != prefix + '_':
                continue
            if name not in already_called:
                yield name
                already_called[name] = True

def aop_call_up(obj, prefix):
    for name in reversed(list(aop_call_order(obj, prefix))):
        getattr(obj, name)()

def aop_call_down(obj, prefix):
    for name in aop_call_order(obj, prefix):
        getattr(obj, name)()


aop_call_order returns methods with names like prefix_* defined in obj's ancestor classes, in the order of Python's multiple inheritance resolution, falling back to alphabetic order if they're on the same layer. Overriding a method in a subclass doesn't affect the order, making the "Setup widget differently" trick possible. aop_call_down and aop_call_up then call these methods in forward or reverse order.

Of course, like all other multilayer bolted-on features, it's going to horribly collapse if you use it together with other metaprogramming features. If you don't like that - switch to Ruby.

Coming up next - bolting closures on top of Fortran.

Saturday, May 10, 2008

Relax, fuel is cheap

Photo of my cat Cloud, for no particular reason (public domain)
As a proud Cornucopian I'm tired of baseless claims of neo-Malthusian peakniks that cheap oil is over and it's time to eat dirt and die.

If you think I'm exaggerating, and haven't been on the Internet for the last five years or so, here's a typical example of a Peaknik Doomsdayer:

Peak Oil. It's bigger than terrorism, global warming or genocide. It's the end of your way of life. [...] Which means if you don't live by your farm, no food for you. There won't be much food anyway. [...] 4 billion people will not survive. [...] So what can you do to prevent peak oil? Nothing. Seriously, nothing.

Fortunately these claims don't withstand scrutiny. Fuel is more affordable than at almost any time in history, except the 1990s, when it was not merely affordable but insanely cheap.

I hereby present my Fuel Affordability Index, which measures fuel affordability relative to a standard of 100.0 in 1975. Fuel affordability captures how many miles you can drive on an average salary. To calculate it, you multiply new car fuel efficiency in mpg by GDP per capita and divide by crude oil prices. We can dispute details like GDP per capita vs median household income, crude oil prices vs retail gas prices, new car vs average car fuel efficiency and so on, but they don't fundamentally affect the conclusions, so I just took whatever was easiest to find. Data is for the USA, mostly because I couldn't find historical fuel efficiencies for any other country. I guess the conclusion would be even stronger for the EU, as European cars are more energy efficient and the strong euro makes crude oil cheaper than in the US.
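In code, the index is just (mpg × GDP per capita ÷ oil price), rescaled so that 1975 equals 100; plugging in the 1976 and 2007 figures from the table reproduces its index values:

```python
# The Fuel Affordability Index as defined above: mpg times GDP per capita
# divided by crude oil price, rescaled so that 1975 = 100.

def affordability(mpg, gdp_per_capita, oil_price,
                  base=(13.5 * 19962 / 47.63)):  # the 1975 reference point
    return 100.0 * (mpg * gdp_per_capita / oil_price) / base

print(round(affordability(14.9, 20826, 48.36)))  # 1976 -> 113
print(round(affordability(23.4, 38340, 64.92)))  # 2007 -> 244
```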

Year | Fuel Affordability | GDP per capita | Fuel efficiency | Crude oil prices
1975 |                100 |        $19,962 |            13.5 |           $47.63
1976 |                113 |        $20,826 |            14.9 |           $48.36
1977 |                119 |        $21,570 |            15.6 |           $49.88
1978 |                140 |        $22,531 |            16.9 |           $48.17
1979 |                 97 |        $22,987 |            17.2 |           $71.96
1980 |                 84 |        $22,666 |            20.0 |           $95.50
1981 |                105 |        $23,011 |            21.4 |           $82.70
1982 |                126 |        $22,350 |            22.2 |           $69.33
1983 |                147 |        $23,148 |            22.1 |           $61.34
1984 |                168 |        $24,598 |            22.4 |           $58.14
1985 |                196 |        $25,386 |            23.0 |           $52.56
1986 |                394 |        $26,028 |            23.7 |           $27.66
1987 |                342 |        $26,668 |            23.8 |           $32.81
1988 |                443 |        $27,519 |            24.1 |           $26.45
1989 |                381 |        $28,226 |            23.7 |           $31.05
1990 |                315 |        $28,435 |            23.3 |           $37.17
1991 |                372 |        $28,011 |            23.4 |           $31.15
1992 |                405 |        $28,559 |            23.1 |           $28.81
1993 |                493 |        $28,943 |            23.5 |           $24.36
1994 |                552 |        $29,744 |            23.3 |           $22.19
1995 |                540 |        $30,131 |            23.4 |           $23.09
1996 |                465 |        $30,886 |            23.3 |           $27.38
1997 |                541 |        $31,891 |            23.4 |           $24.40
1998 |                885 |        $32,837 |            23.4 |           $15.35
1999 |                662 |        $33,908 |            23.0 |           $20.83
2000 |                421 |        $34,770 |            22.9 |           $33.39
2001 |                517 |        $34,701 |            23.0 |           $27.29
2002 |                536 |        $34,931 |            23.1 |           $26.61
2003 |                460 |        $35,479 |            23.2 |           $31.62
2004 |                356 |        $36,433 |            23.1 |           $41.84
2005 |                287 |        $37,206 |            23.5 |           $53.77
2006 |                257 |        $37,928 |            23.3 |           $60.73
2007 |                244 |        $38,340 |            23.4 |           $64.92


Notes: Crude oil prices adjusted to 2007 dollars. GDP per capita in 2000 dollars. The different bases don't affect the results, as only ratios are taken. Fuel efficiency is combined urban+highway, for all cars except trucks (so, if I understand correctly, without SUVs too).

As you can see, fuel is much more affordable than in the 1970s or early 1980s. Since civilization very much existed in the 1970s, it will continue at current fuel prices, or even at prices significantly higher than current ones. One thing I expect to start happening about now is a further increase in average car fuel economy - as soon as fuel became cheap in the mid 1980s cars stopped improving, but hybrids are much better than 23.4 mpg - the popular Toyota Prius has a combined 46 mpg, almost double the current average.

Even if people keep buying the same cars and the economy stays stagnant, fuel would have to become 2.44 times as expensive to bring fuel affordability back to 1970s levels (which, if you're old enough to remember, weren't the end of civilization). If people start buying hybrids (and they will) and the economy grows at 3% a year (and it will) for the next 10 years, even a $420 barrel won't reduce fuel affordability below 1970s levels.

Relax, fuel is cheap.

Friday, March 21, 2008

WTF - Changing keymaps with a hex editor

Maine coon kittens by eleda from flickr (CC-NC)

I had a sudden attack of nostalgia today, so I grabbed a few random Nintendo ROMs and tried to play them. Surprisingly, while SNES hardware had no problem with NES games, SNES emulators don't like NES ROMs, claiming they're corrupt. I think they should at least say something like "This looks like a NES ROM, not a SNES ROM, please use another program", but that's a subject for another rant.

I got FCE Ultra, which is a proper NES emulator. The only problem was the completely absurd keymapping - AWZS being arrows, and keypad 2/3 being buttons. It was completely unusable on either the laptop keyboard or an external keyboard, and it wasn't customizable. I tried one more emulator (nestra), but it had no sound, so I had to go back to FCE Ultra. Not a big deal, I thought - I have the sources, so apt-get source fceu; sudo apt-get build-dep fceu and let's see how input is handled. I found the data structure defining keymaps in no time, changed it to cursors+ZX, recompiled and... it still used the old keymap.

No need to panic, I thought, maybe there's another keymap somewhere. There wasn't. So maybe the make dependencies are broken and I need to make clean first... still didn't work. I even started suspecting that libtool or some other part of the build system might store old object files somewhere. No worries, let's just get fresh sources, patch input.c, recompile and... back to the old keymap.

Now that was weird. There was definitely no other keymap, debug printfs in input.c worked, and all the functions I expected were called. A few more printfs - the keymap data structure contained 97, 119, 122, 115 (awzs) instead of arrow keycodes. That was a major WTF. Obviously something must have modified the keymap, but how did it know the old one?

The solution wasn't far away - FCE Ultra concatenates all configuration data structures (keymaps, sound settings, palettes, GUI config and everything else) into one huge structure, which is saved to disk and then reloaded, overwriting the program's manually defined data structures. It's not even serialized in any way - it's pure binary data the way the compiler arranges it. Only a C programmer might have done something like that. People who program in literally anything else - even Java - would either do text/XML serialization or wouldn't bother at all. What was the thinking process of the guy who wrote that code? My best guess is "Maybe someone will want to change keymaps with a hex editor".
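Here's a toy sketch of that failure mode (in Python rather than C, and with a made-up file name and keycodes, not FCE Ultra's actual code): defaults compiled into the program get silently clobbered by a raw blob saved from an earlier build.

```python
import os
import struct
import tempfile

# Toy version of the pattern: the whole config is dumped as one raw,
# unversioned binary blob and reloaded wholesale on startup.

def save_config(path, keymap):
    with open(path, "wb") as f:
        f.write(struct.pack("4i", *keymap))  # raw bytes: no names, no version

def load_config(path, keymap):
    with open(path, "rb") as f:
        keymap[:] = struct.unpack("4i", f.read())  # clobbers in-memory defaults

path = os.path.join(tempfile.gettempdir(), "fceu-sketch.cfg")

keymap = [97, 119, 122, 115]   # a/w/z/s - the defaults in the old build
save_config(path, keymap)      # written out on the first run

keymap = [273, 274, 276, 275]  # "patched source": arrow keycodes
load_config(path, keymap)      # ...and the old blob wins again
print(keymap)                  # -> [97, 119, 122, 115]
```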

Sunday, March 02, 2008

Practical Ruby Projects review

Foto Kotki by RussianA from Wikimedia Commons (CC-BY)

A few weeks ago I got a review copy of "Practical Ruby Projects: Ideas for the Eclectic Programmer" by Topher Cyll. The main part of the book consists of eight thematic chapters: MIDI music, SVG animation, the pocket change problem, a turn-based strategy engine, a turn-based strategy GUI in Cocoa, genetic algorithms, implementing Lisp in Ruby (which is probably what got me the review copy), and parsing with RParsec. But don't worry about that - they are only pretexts for demonstrating genuine issues that you will stumble upon in your Ruby coding, and various interesting techniques. In a way the book reads a lot like a typical programming blog, with all the "I had this strange problem while coding something, and I solved it in this incredibly cool way, I hope it's useful to you too" stuff.

The book is very high on code and pretty low on bullshit and beginner filler (which unfortunately plague so many books these days), so it won't be wasting your time. I was surprised by how many solutions presented in the book were almost identical to what I've done in my own projects - like the world map implementation in the book's Cocoa app and my jrpg, which uses Python and PyGame/SDL but is surprisingly similar in structure. The part most interesting to me was the last chapter, with its RParsec tutorial, but the book was overall very enjoyable to read.

If you want me to review any other good books, go ahead and send them ;-)

Sunday, February 17, 2008

I'm totally out of shape

Let's dance! by Tscherno from flickr (CC-BY)

I've just found a DDR machine (of the good Euromix 2 kind) near where I live, and I barely managed to do what a couple of years ago would have been just a warmup - one Normal nonstop plus three 8/9-feeters (today it was Crash, Candy, and Tsugaru).

I should start attending a gym or something like that, or it will only go downhill from here.

Monday, January 28, 2008

PageRank - a new addition to your Data Processing Toolkit

G0321 by Piez from flickr (CC-BY)

Back when I was at Universität des Saarlandes we had a great seminar at MPII. It was called "Data Processing Tips and Tricks" and covered some important data processing techniques. These techniques varied a lot. What they had in common was their universality - you could throw pretty much any data at them and extract something. And by preprocessing your data differently, and postprocessing the results differently, you could get loads of interesting things without inventing any new algorithms.

Here's a partial list of the classic universal data processing methods, in no particular order:

  • Bayesian statistics
  • EM algorithm
  • Wavelets
  • Levenberg-Marquardt optimization
  • Lloyd clustering
  • Karhunen-Loève Transform (aka Principal Component Analysis)
  • Singular Value Decomposition
  • multi-dimensional scaling
  • Algebraic reconstruction technique
  • Support Vector Machines
  • Graph Cut Optimization
  • Level-Set Methods
  • RANSAC
  • Neural Networks
  • Hidden Markov Models
  • Regular expressions

Simply being aware of their existence greatly increases your chances of solving the really big problems you'll be facing. For example, naive Bayesian statistics was famously used for fighting spam, and in spite of its striking simplicity it is much more effective than all the custom, complicated methods that preceded it.

In the last few years the data processing toolkit got a new tool - PageRank. Pretty much everybody knows how it works for scoring websites, but the algorithm is capable of much more than that. One great example is extracting keywords from documents. It has nothing to do with the original problem of website scoring, but if you treat words as nodes (websites), create links between words that occur close to each other, and run PageRank on such a graph, you get very decent keywords. Of course you might want to add some pre- and post-processing to improve keyword quality (obviously removing HTML tags, also stemming, removing stopwords, weighting words by part of speech or whatever you feel like doing), but so does Google in determining page scores. And I bet you expected keyword extractors to either actually understand what's written (not possible yet) or to simply count the number of occurrences (which gives really horrible results).
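The keyword trick fits in a few lines. Here's a minimal sketch - the window size, damping factor, and sample text are my own choices, not from any particular implementation:

```python
from collections import defaultdict

# A minimal sketch of PageRank keyword extraction as described above:
# words are nodes, words occurring within `window` positions of each other
# are linked, and a plain power iteration scores the nodes.

def keyword_scores(words, window=2, damping=0.85, iterations=50):
    links = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                links[w].add(words[j])
                links[words[j]].add(w)
    rank = dict.fromkeys(links, 1.0)
    for _ in range(iterations):
        rank = {w: (1 - damping) + damping *
                   sum(rank[v] / len(links[v]) for v in links[w])
                for w in links}
    return rank

text = "pagerank extracts keywords pagerank builds graphs pagerank ranks words"
scores = keyword_scores(text.split())
print(max(scores, key=scores.get))  # -> pagerank
```

On real documents you'd tokenize, strip stopwords, and stem first, exactly as the pre-processing note above suggests.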

You can use PageRank to assess the importance of countries in international trade, the importance of people in an organization's communication flow, and many other things. Or you could simply throw arbitrary graphs at PageRank, look at the results, and guess what they mean. Perhaps that will be enough to solve the problem you've been thinking about for such a long time. If not, you still have two dozen other universally applicable techniques to choose from.

Saturday, January 26, 2008

Keyboard layout should not be a global setting

mac kitty by atomicshark from flickr (CC-NC-SA)

Most computers these days are laptops. Their computing power has become quite decent, and they're even reasonably capable of moderate gaming. One of the major problems left is typing on them. Laptop keyboards, especially in smaller laptops, are tiny, have very small keys, no numeric keypad, and a painfully unergonomic shape which forces you to keep your hands in an unnatural position. As the keyboard and screen are attached to each other, there's no way to make it comfortable for both your eyes and your hands. To make matters worse, many laptops recently started to include a big touchpad which doubles as a mouse button, so when you try to type - and you have to keep your hands very close to each other because the keyboard is so small - you're very likely to accidentally "press the mouse button" by touching the touchpad. Basically they're completely unsuitable for touch typing, and they will forever stay this way, because it's impossible to build a decent keyboard that fits in a laptop form factor. That doesn't mean all of them are equally bad - the worst one I've seen so far was the Macbook's, and some in bigger laptops are only annoying instead of actively painful.

This all means that unless the only things you type are Google search queries, you need a real keyboard for your computer in addition to the internal one. Most people don't seem to care about this, but I really like the Dvorak layout. And here lies the problem: in every operating system I've ever seen, keyboard layout is a global setting, not a per-device setting. I want to touch type in Dvorak on the external keyboard, but as touch typing on the laptop keyboard is not possible, I'd prefer it to stay QWERTY, so I can at least see which key I'm pressing. However, as keyboard layout is global rather than per-device, I have to manually switch it every time I attach or detach the external keyboard.

Pressing the keyboard layout switcher a couple of times a day is maybe not the worst usability problem out there, but could KDE/GNOME developers please improve it? That would be really, really great.

Wednesday, January 23, 2008

Really strange quirk of Ruby and Perl regular expressions

Sage's Kittens by T. Keller from flickr (CC-BY)

I've found a weird quirk of Ruby's regular expression engine. The same quirk is present in Ruby 1.8.6, Ruby 1.9.0, and Perl 5.8.8, but it is not present in Python 2.5.1. I'm going to let you decide if it's a bug or not.

I tried to replace all whitespace at the end of a string with a single newline. The correct way to do it is str.sub(/\s*\Z/, "\n"). However, I wrote str.gsub(/\s*\Z/, "\n") instead. A string has only one end, so there should be no way String#gsub could possibly match twice. But it does - the result is two newlines if there was any whitespace at the end, or one newline if there wasn't. I kinda see how it might be getting these results from an implementation point of view, and it's not a big deal because there's a simple workaround of replacing #gsub with #sub, but it doesn't make much semantic sense to me.

Here are a few code snippets in Ruby, Perl and Python, with "o" standing in for whitespace for better readability.

# Ruby
"f".gsub(/o*\Z/, "o") # => "fo"
"fo".gsub(/o*\Z/, "o") # => "foo"
"foo".gsub(/o*\Z/, "o") # => "foo"
"fooo".gsub(/o*\Z/, "o") # => "foo"
"f".sub(/o*\Z/, "o") # => "fo"
"fo".sub(/o*\Z/, "o") # => "fo"
"foo".sub(/o*\Z/, "o") # => "fo"
"fooo".sub(/o*\Z/, "o") # => "fo"
"f".gsub(/o*\Z/, "x") # => "fx"
"fo".gsub(/o*\Z/, "x") # => "fxx"
"foo".gsub(/o*\Z/, "x") # => "fxx"
"fooo".gsub(/o*\Z/, "x") # => "fxx"

# Perl
perl -le '$_="f"; s/o*$/o/g; print $_' # => fo
perl -le '$_="fo"; s/o*$/o/g; print $_' # => foo
perl -le '$_="foo"; s/o*$/o/g; print $_' # => foo
perl -le '$_="fooo"; s/o*$/o/g; print $_' # => foo

# Python
re.compile(r'o*$').sub("o", "f") # => 'fo'
re.compile(r'o*$').sub("o", "fo") # => 'fo'
re.compile(r'o*$').sub("o", "foo") # => 'fo'
re.compile(r'o*$').sub("o", "fooo") # => 'fo'


For all you Ruby/Perl programmers out there: Python's sub is the equivalent of gsub or s///g, not sub or s///.
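If you actually want Ruby's sub / Perl's plain s/// behavior in Python, re.sub takes a count argument limiting the number of replacements:

```python
import re

print(re.sub(r"o", "x", "fooo"))           # -> fxxx  (default: replace all, like gsub)
print(re.sub(r"o", "x", "fooo", count=1))  # -> fxoo  (at most one replacement, like sub)
```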

Which behaviour makes more sense? I think Python's is much more intuitive. Should we treat it as a bug in Ruby/Perl and fix it, or accept it as an unintended feature?

Sunday, January 20, 2008

My poor bottom left 6

tree eater by splityarn from flickr (CC-NC-SA)

My tooth had been hurting a bit for the last week, and it got really bad yesterday, so I called some 24h dental service. It wasn't as 24h as they claimed, and I was only able to get an appointment for the next day. The bad news: root canal treatment was required. The even worse news: it cost a damn 700 quid over two visits. There goes the new computer, at least for now. It's insanely expensive compared to dentistry in Poland, where a plain checkup is about 10 GBP and root canal treatment about 50 GBP - just Google the prices if you don't believe me. If I had known in advance it would cost that much, I'd probably have just booked a flight. Too bad you can't schedule your dental problems.