Monday, July 16, 2007

Who reads my blog - Redditers and Googlers

Photo by racciari from flickr (CC-NC-SA)

More than a year ago, when I started this blog, I had no idea that anybody would actually read it, but it seems to be doing quite well. According to Google Analytics, over the year there were over 90 thousand page views by over 50 thousand visitors. Recently there have been about 420 page views daily, or one every three and a half minutes. I don't think I have that many friends or relatives, so who reads my blog?

There seem to be two distinct populations - Redditers and Googlers. Excluding "direct traffic", which simply means that for whatever reason the referrer was not recorded, 35% of visitors come from Google, and 32% from Reddit. The next three sources - DZone, Daring Fireball and del.icio.us - provide only 6.6%, 3.4% and 1.7% of visits, respectively.

The full story of an article's readership looks something like this:
  • Article is published. I submit it to del.icio.us and usually also to reddit
  • If Redditers like the article it gets to the main page. I have absolutely no idea which articles Redditers will like and which they won't. Actually I have less than no idea - things I consider very interesting almost invariably get downvoted, while random rants I wrote when angry or bored get tens of points. So I submit pretty much everything programming-related and let them decide. My karma from doing so is highly positive, so it's probably not considered a very abusive practice
  • In the next day or two it gets a lot of views from Redditers
  • People submit it to other reddit-like websites, or write responses to it, and it stays popular for a few more days
  • There's a sudden drop in popularity, as people move on to other things
  • Google indexes the article, and a steady flow of Google visits starts. The flow is not wide, but it seems to last pretty much indefinitely
To get some numbers I scraped Google Analytics reports - Google Analytics has no real API, and it became even more difficult to use programmatically after the update, but I somehow managed to extract the information I wanted (with the Google Analytics cookie extracted using Firebug).
require 'time'

$cookie = File.read("/home/taw/ga_cookie").chomp

# Download url to file fn (unless it's already cached) and return its contents
def wget(url, fn)
  system 'wget', '--header', $cookie, url, '-O', fn unless File.exists?(fn)
  File.read(fn)
end

# Yield every day from today back to first_day (YYYYMMDD), together with its age in days
def each_day(first_day)
  day = Time.now.gmtime
  day_number = 0
  while true
    day_s = day.strftime('%Y%m%d')
    break if day_s < first_day
    yield day_s, day_number
    day_number += 1
    day -= 24*60*60
  end
end

# Yield [url, page_views] for every post viewed on a given day,
# from the tab-separated export of the Top Content report
def get_data_for(day)
  url = "https://www.google.com/analytics/reporting/export?fmt=3&id=1222880&pdr=#{day}-#{day}&cmp=average&rpt=TopContentReport&trows=500"
  fn = "results-#{day}"
  res = wget(url, fn)
  header_finished = false
  res.each{|line|
    unless header_finished
      header_finished = true if line =~ /\AURL\tPage Views\tUnique Page Views\t/
      next
    end
    url, page_views, unique_page_views, = line.split(/\t/)
    next unless page_views # Skip the final line
    next unless url =~ %r[\A/\d{4}/\d{2}/]
    next if url =~ /\?/
    yield(url, page_views.to_i)
  }
end

$stats = {}

each_day('20060923') {|date, day_number|
  get_data_for(date){|url, page_views|
    $stats[url] ||= []
    $stats[url][day_number] = page_views
  }
}

# Aggregate page views by post age in days (0 = publication day)
$stats_by_post_age = []

$stats.each{|url, stats|
  stats.reverse.each_with_index{|page_views, age|
    page_views ||= 0
    $stats_by_post_age[age] ||= 0
    $stats_by_post_age[age] += page_views
  }
}

total_page_views = $stats_by_post_age.inject{|a,b| a+b}
p $stats_by_post_age.map{|x| 0.01 * (10000 * x.to_f/total_page_views).to_i}

And the not very surprising results:
  • 22.26% of page views happen on the day the article is published. As the article could have been published at any time of the day (just after midnight to just before midnight), on average that's the article's first 12 hours.
  • It falls rapidly to 11.47% and 4.28% over the next two days
  • In the following ten days the numbers are 2.03%, 1.82%, 1.46%, 1.49%, 1.25%, 0.99%, 0.86%, 0.72%, 0.95%, 0.81%. By that time more than half of all visits have occurred - see the quick check after this list.
  • In the following weeks the number gradually decreases, but I think it's more due to many posts not being online long enough than due to actual popularity loss. Maybe I'll run some statistics to test this hypothesis some day.
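As a quick sanity check of that "more than half" claim, here's a cumulative sum over the percentages listed above (just the numbers from the list, nothing recomputed):
percentages = [22.26, 11.47, 4.28, 2.03, 1.82, 1.46, 1.49, 1.25,
               0.99, 0.86, 0.72, 0.95, 0.81]
cumulative = 0.0
percentages.each_with_index{|share, age|
  cumulative += share
  printf "day %2d: %6.2f%% of all page views so far\n", age, cumulative
}
# The running total crosses 50% on the last listed day, at about 50.4%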
You should be able to adapt this script if you want to know how the numbers look for your blog.

Sunday, July 15, 2007

Short rant on video game usability and 3D acceleration

Photo by poolzelio from flickr (CC-NC)
There's one thing that pretty much every PC game does, and that I really hate. It's using a "constant rendering quality" paradigm instead of a "constant FPS" paradigm.

PC hardware differs a lot, with some people using older hardware and wanting to play games even if the rendering is only so-so, while others who have just bought shiny new graphics cards demand really awesome effects from them, more to impress their friends and stimulate graphics card manufacturing than to actually improve gameplay. What pretty much everybody wants is the highest rendering quality that still gives them a reasonable FPS rate.

That's what game engines should do - monitor FPS and increase or decrease rendering quality if FPS is not in some predefined range. But not a single game I know of does so. Instead they all opt for "constant rendering quality" - maintaining some fixed level of rendering quality whether the game gets unusably slow or the GPU has a lot of free cycles. Often both situations happen as the player moves from one location to another. Changing graphics settings every few minutes would distract too much from playing, so most old hardware owners either set the quality low enough that they always get good FPS, even if for 90% of the game the GPU is half idle, or accept occasional low FPS in exchange for better rendering quality. Or they solve this software problem in hardware and buy a better graphics card.
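Here's a minimal sketch of what such an adaptive loop could look like, in Ruby purely for illustration - render_frame and the single quality knob are made up, not any real engine's API:
TARGET_FPS = 30..60   # keep the frame rate in this band
quality = 0.5         # 0.0 = ugliest, 1.0 = prettiest

loop do
  frame_start = Time.now
  render_frame(quality)                    # hypothetical engine call
  fps = 1.0 / (Time.now - frame_start)

  if fps < TARGET_FPS.min                  # too slow - drop some detail
    quality = [quality - 0.05, 0.0].max
  elsif fps > TARGET_FPS.max               # lots of headroom - add detail
    quality = [quality + 0.05, 1.0].min
  end
end

The player would only ever touch TARGET_FPS, and the engine would fill in the details.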

Oh, and the graphics setup. Instead of having one big "I want this many FPS" slider and letting the game fill in the details, there are usually dozens of confusing options - some of them affecting rendering speed considerably, others barely at all.

Time to get a new card - the one I bought a year ago isn't good enough any more.

Saturday, July 14, 2007

Truth, falsehood and voidness in dynamic languages

claws by theogeo from flickr (CC-BY)
One of the things which different dynamic languages do differently is how truth, falsehood, and voidness are handled. I checked how it's done in the 9 most popular dynamic languages - Common Lisp, JavaScript, Lua, Perl, PHP, Python, Ruby, Scheme, and Smalltalk.

The first question - does the language have dedicated booleans? That is - do questions like 2 > 1 return special booleans or something else?
  • Ruby, Lua, Smalltalk, JavaScript - Yes (true and false)
  • Python - Yes (True and False)
  • Scheme - Yes (#t and #f)
  • Common Lisp - No, it returns the symbol t for true and the empty list (nil) for false.
  • Perl - No, it returns 1 for true, and an empty string for false.
  • PHP - Kinda. Since PHP4 there are booleans true and false, but their behavior is full of hacks - print true prints 1, print false prints nothing, false == 0, false == NULL, true == 1, even true == 42.
If booleans are used in boolean context, their interpretation is obvious. Most other objects, when used in boolean context, are treated the same way as true, with a few common exceptions. How are the empty list, integer 0, floating point 0.0, and the empty string treated in boolean context?
  • Ruby, Scheme, Lua - all are true
  • Perl, PHP, Python - all are false
  • JavaScript - empty list is true, others are false
  • Common Lisp - empty list is false, others are true
  • Smalltalk - a NonBooleanReceiver exception is raised if anything but booleans is used in boolean context.
Is the string "0" false?
  • PHP, Perl - unfortunately "0" is false, and this is a huge source of nasty bugs
  • Ruby, Scheme, Lua, JavaScript, Python, Common Lisp - "0" is true
  • Smalltalk - a NonBooleanReceiver exception is raised
Is there a special value denoting absence of value? What does accessing a nonexistent array element return?
  • Ruby, Lua - nil, accessing nonexistent elements returns it
  • JavaScript - undefined, accessing nonexistent elements returns it
  • Perl - undef, accessing nonexistent elements returns it
  • PHP - NULL, accessing nonexistent elements returns it
  • Python - None, accessing nonexistent elements throws an exception
  • Smalltalk - nil, accessing nonexistent elements throws an exception
  • Scheme - there isn't one, accessing nonexistent values is an error
  • Common Lisp - there isn't one, but the empty list acts as one in most contexts; it is also returned when accessing nonexistent elements
Is the nonexistent value false in boolean context?
  • Ruby, Lua, JavaScript, Perl, PHP, Python, Common Lisp - it is false
  • Scheme - there is no nonexistent value marker
  • Smalltalk - a NonBooleanReceiver exception is raised
The most common answers are: there are dedicated booleans and a dedicated absence marker; it is possible to use normal objects in boolean context, and most of them (including the string "0") are treated as true, while the absence marker is treated as false.

There is no clear consensus whether 0, 0.0, "", and the empty list should be treated as true or false. Personally I think it's better to make them all true. Otherwise either libraries can define other false objects (like decimal 0.00, various empty containers, and so on), which complicates the language, or they cannot, which makes it feel inconsistent.

In most languages accessing a nonexistent element of an array returns an absence marker instead of throwing an exception, and in my opinion that's the right way - it makes the code look much more natural.
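Ruby is a good example of this combination - only nil and false are false in boolean context, and out-of-range access returns nil:
[0, 0.0, "", []].each{|x|
  puts "#{x.inspect} is #{x ? 'true' : 'false'} in boolean context"
}
# 0, 0.0, "" and [] are all true; only nil and false are false

a = [1, 2, 3]
p a[10]                            # nil - the absence marker, no exception
puts "nothing there" unless a[10]  # and nil is false in boolean context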

Wednesday, July 11, 2007

Using home directory as GTD inbox - version 2

my name is Grisou by *Blunight72* from flickr (CC-NC)

The GTD software I described a few weeks ago has evolved quite significantly since then.

Fortunately my inbox is still empty:
$ inbox_size
Your inbox is empty.

It can be used in two modes - either a single-shot report of inbox contents with inbox_size, or continuous monitoring plus UI notifications with inbox_size_notify. inbox_size.rb is a library (symlinked from /home/taw/local/bin/inbox_size) which finds all items in all my inboxes. It also handles special items:
  • Unread emails in Gmail inbox
  • Uncommitted changes to one of the repositories
  • Music log not committed to last.fm
  • Passwords file changed since the last encrypted copy
  • Last backup older than 3 days
  • Any things I wanted to be informed about

The code


The main code is in inbox_size.rb:
require 'time'
require 'magic_xml'

$offline = false

def inbox_ls
  items_whitelist = %w[
    /home/taw/Desktop
    /home/taw/ebooks
    /home/taw/everything
    /home/taw/img
    /home/taw/ipoddb
    /home/taw/local
    /home/taw/movies
    /home/taw/music
    /home/taw/ref
    /home/taw/website
    /home/taw/website_snapshot
  ]

  files = (Dir["/home/taw/*"] +
           Dir["/home/taw/Desktop/*"] +
           Dir["/home/taw/movies/complete/*"] -
           items_whitelist)
  items = files.map{|x| x.sub(%r[\A/home/taw/], "")}

  # Code for handling special inbox items goes here
  # ...

  return items.sort.map{|item| "* #{item}"}
end

if $0 == __FILE__
  if ARGV[0] == '--offline'
    ARGV.shift
    $offline = true
  end
  items = inbox_ls
  if items.empty?
    puts "Your inbox is empty."
  else
    puts "#{items.size} items in your inbox:", *items
  end
end


inbox_size_notify, which scans the inbox continuously and displays UI notifications if it's not empty, is:
require 'inbox_size'

max_displayed = 30

big_timer = 5
old_items = []

while true
  items = inbox_ls
  if items == []
    # Nothing to report - check again in a minute
    sleep 60
    next
  end

  # If nothing changed since the last notification, skip a few rounds before nagging again
  if items == old_items
    big_timer -= 1
    sleep 60
    next unless big_timer == 0
  end
  big_timer = 5

  if items.size > max_displayed
    displayed_items = items.sort_by{rand}[0, max_displayed].sort + ["* ..."]
  else
    displayed_items = items
  end
  system "notify", "Inbox is not processed", "#{items.size} items in your inbox:", *displayed_items

  sleep 60
  old_items = items
end


The script which displays KDE notifications is:
header = "Notification"
msg = ARGV.join("\n") # "All your base\nAre belong to us"

system 'dcop', 'knotify', 'Notify', 'notify', 'notify', header, msg, 'nosound', 'nofile', '16', '0'

Backup reminder


Since my disk died I've become more serious about backups. I intend to have at least a regular rsync of my SVK repository and some important files. Here's a script which rsyncs these files from shanti (my main box) to ishida (an old laptop).

t0 = Time.now

rv = system 'rsync -rL ~/.mirrorme/ taw@ishida:/home/taw/shanti_mirror/'

unless rv
  STDERR.puts "Error trying to rsync"
  exit 1
end

t1 = Time.now

File.open('/home/taw/.last_backup', 'w') {|fh|
  fh.puts t1
}

puts "Started:  #{t0}"
puts "Finished: #{t1}"
puts "Time: #{t1-t0}s"


If the backup was successful, a time stamp is saved to /home/taw/.last_backup. inbox_size.rb reminds me if I haven't backed up for more than 3 days:

  # Time since last rsync
  time_since_last_rsync = Time.now - Time.parse(File.read("/home/taw/.last_backup").chomp)
  if time_since_last_rsync > 3 * 24 * 60 * 60
    items << "Over 3 days since the last backup"
  end

Tickler file


The "tickler file" (/home/taw/.tickler) contains all things I want to be reminded about. Appointments, deadlines, new episodes of The Colbert Report, whatever. Of course usually I want to be reminded before the deadline, not on the deadline, so the date must be some time before the event of interest. Entries in the tickler file look something like that:
Sat Jul 21 05:49:14 +0200 2007
15 days to Wikimedia Foundation validation deadline


It can be edited as a text file, but it's more convenient to add new entries with the add_tickler script:
$ add_tickler 24h "New TCR episode will be available"


unless ARGV.size == 2
  STDERR.puts "Usage: #{$0} 'due' 'msg'"
  exit 1
end

due = ARGV.shift
msg = ARGV.shift

# Parse durations like 30s, 15m, 24h, 3d into seconds
due_sec = case due
when /\A(\d+)s\Z/
  $1.to_i
when /\A(\d+)m\Z/
  $1.to_i * 60
when /\A(\d+)h\Z/
  $1.to_i * 60 * 60
when /\A(\d+)d\Z/
  $1.to_i * 60 * 60 * 24
else
  STDERR.puts <<EOF
Usage: #{$0} 'due' 'msg'
Due can be:
* 15s
* 15m
* 15h
* 15d
EOF
  exit 1
end

due_time = Time.now + due_sec

File.open("/home/taw/.tickler", "a") {|fh|
  fh.puts due_time
  fh.puts msg
}


The tickler file is checked by the following code in inbox_size.rb:
  # Tickler items
  tickler = File.readlines("/home/taw/.tickler")
  while not tickler.empty?
    deadline = Time.parse(tickler.shift.chomp)
    msg = tickler.shift
    if Time.now > deadline
      items << msg
    end
  end

The passwords file


Pretty much every website requires an account nowadays. I don't want to reuse passwords on multiple websites, so I generate them randomly (cat /dev/urandom | perl -ple 's/[^a-zA-Z0-9]//g' | head) and keep them in the unencrypted file /home/taw/.passwords, which I simply grep when I want to log in to some weird website again (normally Firefox remembers these passwords anyway, but sometimes it's necessary).

As it would suck to lose all my accounts, I AES-256-CBC encrypt this file and keep encrypted copies in /home/taw/ref/skrt/, which is mirrored to multiple servers. As I need to enter my password to encrypt the file, it cannot be done automatically. The most inbox_size.rb can do is remind me if there's no up-to-date skrt file:
  # skrt up to date ?
  pwtm = File.mtime("/home/taw/.passwords")
  last_skrt_tm = Dir["/home/taw/ref/skrt/*"].map{|fn| File.mtime(fn)}.max
  if pwtm > last_skrt_tm
    items << "No up-to-date skrt available"
  end


In which case I run the following skrt_new script:
t = Time.now
fn = sprintf "skrt-%04d-%02d-%02d", t.year, t.month, t.day
system "openssl aes-256-cbc /home/taw/ref/skrt/#{fn}

Music log


The iPod-last.fm bridge consists of two parts - one which extracts the log from an iPod, and one which submits the data to last.fm. They communicate using a very simple format, with lines like this (time is local):
Sumptuastic ; Cisza (Radio Edit) ; Cisza (Single) ; 185 ; 2007-07-11 17:51:27
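The submitter isn't shown here, but reading such a line back is trivial - a sketch, assuming the same " ; " separator the logging wrapper below uses:
File.readlines("/home/taw/.music_log").each{|line|
  artist, title, album, length, played_at = line.chomp.split(" ; ")
  puts "#{artist} - #{title} [#{album}], #{length}s, played at #{played_at}"
}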


Nothing in the format is iPod-specific, so I wrote a wrapper around mplayer which logs the music it plays to /home/taw/.music_log. It can also randomize songs and search for them recursively in directories. It uses a few extra programs - id3v2 to get the song title, artist and album (from either ID3v2 or ID3v1 tags), and mp3info to get the playing time.
# Get [artist, title, album] from ID3v2 tags, falling back to ID3v1
def mp3_get_metadata(file_name)
  song_info = `id3v2 -l "#{file_name}"`
  artist = nil
  title = nil
  album = nil

  if song_info =~ /^TPE1 \(Lead performer\(s\)\/Soloist\(s\)\): (.*)$/
    artist = $1
  elsif song_info =~ /^Title : .{31} Artist: (.*?)\s*$/
    artist = $1
  end

  if song_info =~ /^TIT2 \(Title\/songname\/content description\): (.*)$/
    title = $1
  elsif song_info =~ /^Title : (.{0,31}?)\s+ Artist: .*$/
    title = $1
  end

  if song_info =~ /^TALB \(Album\/Movie\/Show title\): (.*)$/
    album = $1
  elsif song_info =~ /^Album : (.{0,31}?)\s+ Year:/
    album = $1
  end

  return [artist, title, album]
end

# Song length in seconds
def mp3_get_length(file_name)
  `mp3info -F -p "%S" "#{file_name}"`.to_i
end

def with_timer
  time_start = Time.now
  yield
  return [time_start, Time.now - time_start]
end

randomize = true
if ARGV[0] == "-s" # --sequential
  randomize = false
  ARGV.shift
end

# Directories given on the command line are searched recursively for mp3s
songs = ARGV.map{|fn| if File.directory?(fn) then Dir["#{fn}/**/*.mp3"] else fn end}.flatten
songs = songs.sort_by{rand} if randomize

songs.each{|song|
  time_start, time_elapsed = with_timer do
    rv = system "mplayer", song
    exit unless rv
  end
  artist, title, album = *mp3_get_metadata(song)
  length = mp3_get_length(song)

  # Only log songs that were actually played for long enough
  next unless length >= 90 and (time_elapsed >= 240 or time_elapsed >= 0.5 * length)

  date = time_start.strftime("%Y-%m-%d %H:%M:%S")

  File.open("/home/taw/.music_log", "a") {|fh|
    fh.puts "#{artist} ; #{title} ; #{album} ; #{length} ; #{date}"
  }
}


It's a good idea to commit the log to last.fm often, but I'm not doing it automatically yet, as network problems with last.fm are too frequent. Instead inbox_size.rb reminds me if there are old uncommitted entries in the log:
  # .music_log not empty and older than one hour
  if File.size("/home/taw/.music_log") > 0 and File.mtime("/home/taw/.music_log") < Time.now - 60*60
    items << "Music log not clean"
  end

Uncommitted stuff in repositories


I sometimes get distracted by some interruption and forget to commit things to repositories. I wrote an uncommitted_changes script which checks the local checkouts of all repositories I use (currently 1 SVK and 2 SVN repositories) for uncommitted changes. I use svn/svk diff instead of svn/svk status, as the latter finds all kinds of temporary files, and I always svn/svk add all new files when I start coding anyway.

Dir.chdir("/home/taw/everything/") { system "svk diff" }
Dir.chdir("/home/taw/everything/rf-rlisp/") { system "svn diff" }
Dir.chdir("/home/taw/everything/gna_tawbot/") { system "svn diff" }


inbox_size.rb simply checks whether the output of this script is empty:
  # Uncommitted changes
  uc = `uncommitted_changes`
  unless uc == ""
    items << "There are uncommitted changes in the repository"
  end

Unread Gmail emails


The last kind of inbox items tracked by inbox_size.rb are email inbox items. Google APIs are almost invariably ugly Java-centric blobs of suckiness, so instead of using the Gmail API I simply get the list of unread messages from the Gmail Atom feed, parsed using magic/xml.
  # Unread Gmail messages
  unless $offline
    gmail_passwd = File.read("/home/taw/.gmail_passwd").chomp
    url = "https://Tomasz.Wegrzanowski:#{gmail_passwd}@mail.google.com/mail/feed/atom"
    XML.load(url).children(:entry, :title).each{|title|
      items << "Email: #{title.text}"
    }
  end