taw's blog: Predicting coolness

One of the coolest things about the Internet is that it tries to predict what you're going to like. Nothing has tried that before. Newspapers and TV tried to make you like certain things, usually the products of whoever paid for the advertisements, or the political views shared by the editors. And don't even mention the "ratings" one could find in newspapers; they were completely laughable.

But the Internet can be different.

First order prediction

What's the simplest way to predict whether you're going to like something or not? Well, one could ask a lot of people what they think about it, and simply guess that your opinion is going to fall somewhere in the middle.
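As a minimal sketch of that idea (the ratings and the function name are made up for illustration, and real sites may well use something fancier than a plain average):

```python
from statistics import mean

def first_order_prediction(ratings):
    """Guess a new person's rating as the plain average of everyone else's."""
    if not ratings:
        raise ValueError("need at least one rating to predict from")
    return mean(ratings)

# Hypothetical ratings on a 1-10 scale from five people.
ratings = [8, 7, 9, 6, 8]
prediction = first_order_prediction(ratings)
print(prediction)
```

The guess is the same for everybody, which is exactly the weakness of this method.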

That's what IMDB is doing. It is very good at suggesting what kind of movies you are going to like. Of course, that only works if your taste is similar to the IMDB average. If not, you need to add a correction factor, like: well, I don't really like Spielberg, so I'm going to subtract 0.8 from the score of each of his movies.
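Doing that correction by hand could look roughly like this. It's only a sketch: the 7.9 average is an invented number, and the per-director table is my assumption about how one might keep track of such offsets.

```python
def corrected_score(average_score, director, my_corrections):
    """Shift the crowd average by my personal offset for this director, if any."""
    return average_score + my_corrections.get(director, 0.0)

# My personal offsets; directors not listed get no correction.
my_corrections = {"Spielberg": -0.8}

# 7.9 is a hypothetical crowd average, not a real score.
adjusted = corrected_score(7.9, "Spielberg", my_corrections)
unchanged = corrected_score(7.9, "Somebody Else", my_corrections)
print(round(adjusted, 1), unchanged)
```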

What if the system could do such corrections for you?

Second order prediction

The next style of prediction, and the one that is considered the way to go nowadays, is predicting based on similarity of preferences. So you have to rate a few things first, then you get clustered with users whose preferences are similar to yours, and the system estimates your further preferences based on theirs. Unless you're really unusual, it's going to be better than the first order prediction. Or unless you provided too little data, but a well-designed system should just fall back to the first order prediction in such a case.
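Here's a rough sketch of how that could work, including the fallback. To keep it short it doesn't build explicit clusters; it just weights every other user by how often they agreed with you, which is a simpler stand-in for clustering. The users, sites, +1/-1 votes and the `min_votes` threshold are all invented for the example.

```python
from statistics import mean

def similarity(a, b):
    """Average agreement on shared items: +1 same vote, -1 opposite, 0.0 if no overlap."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return mean([a[item] * b[item] for item in shared])

def first_order(item, all_votes):
    """Plain average of everybody's vote on the item."""
    votes = [v[item] for v in all_votes.values() if item in v]
    return mean(votes) if votes else 0.0

def second_order(user, item, all_votes, min_votes=2):
    """Similarity-weighted average of like-minded users' votes, with fallback."""
    mine = all_votes.get(user, {})
    if len(mine) < min_votes:          # too little data about this user
        return first_order(item, all_votes)
    weighted, total = 0.0, 0.0
    for other, votes in all_votes.items():
        if other == user or item not in votes:
            continue
        weight = similarity(mine, votes)
        if weight > 0:                 # only listen to people who tend to agree
            weighted += weight * votes[item]
            total += weight
    if total == 0:                     # nobody similar has voted on this item
        return first_order(item, all_votes)
    return weighted / total

# Hypothetical votes: +1 is "I like it", -1 is "I don't like it".
all_votes = {
    "ann": {"site1": 1, "site2": -1, "site3": 1},
    "bob": {"site1": 1, "site2": -1, "site3": 1, "site4": 1},
    "cat": {"site1": -1, "site2": 1, "site4": -1},
    "dan": {"site1": -1, "site4": -1},
    "eve": {},
}
print(second_order("ann", "site4", all_votes))  # ann agrees with bob, so this is positive
print(second_order("eve", "site4", all_votes))  # eve has no votes, so she gets the average
```

In this toy data most people dislike site4, so the first order guess is negative, but ann votes just like bob, who liked it, so her second order guess comes out positive.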

A very primitive version of second order prediction is something that the best P2P program ever (AudioGalaxy) had: the "if you like songs by A, you should also check B, C, and D" feature. It worked perfectly. Most Internet bookstores seem to have a similar feature too, but it doesn't work that well. I guess music taste is simpler to predict than book taste.
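I have no idea how AudioGalaxy actually computed those suggestions, but one simple way to get this kind of list is to count which artists show up together in people's collections. The collections below are made up.

```python
from collections import Counter

def also_liked(artist, libraries, top_n=3):
    """Artists that most often appear in the same collections as `artist`."""
    counts = Counter()
    for liked in libraries:
        if artist in liked:
            counts.update(name for name in liked if name != artist)
    return [name for name, _ in counts.most_common(top_n)]

# Hypothetical collections, one set of artists per user.
libraries = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "B", "C"},
    {"E", "D"},
]
suggestions = also_liked("A", libraries)
print(suggestions)  # ['B', 'C', 'D']
```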

A full-fledged second order prediction is present in StumbleUpon and last.fm.

StumbleUpon tries to find you some cool websites by clustering you with other users. It's a browser plugin (I use it with Firefox), so if you want a random website you click Stumble!, and then you can click an "I like it" or "I don't like it" button to get clustered better. Before you get any pages you have to select the topics you're interested in, so it has something to start from. I think that's the best thing to happen to web browsing since Firefox.

last.fm (and I think there are a few other projects like it) tries to predict your music preferences by clustering you with other users. It's integrated with a music player (I use it with AmaroK). Unfortunately it's not integrated with a P2P program, so it can't simply stream you new music and ask whether you like it, but it has some sort of personalized radio, a suggest feature, "neighbourhood" (people with similar taste) and other goodies.

There are probably many more services like these. I predict you're going to like them. That prediction uses the first order method, because almost everybody does like them.

Third order prediction

Is clustering the last word, or are we heading towards a new kind of coolness prediction? Will IMDB add user clustering? Will "smarter" methods like Bayesian networks find their use? Will we be using mobile phones in grocery stores to cluster our food preferences? How will it all affect anonymity and privacy? In any case, I predict it's going to be the next big thing.

Saturday, May 20, 2006
