taw's blog: July 2007

Monday, July 16, 2007

Who reads my blog - Redditers and Googlers

More than a year ago, when I started this blog, I had no idea that anybody would actually read it, but it seems to be doing quite well. According to Google Analytics, over the year there were over 90 thousand page views by over 50 thousand visitors. Recently there have been about 420 page views daily, or one every three and a half minutes. I don't think I have that many friends and relatives, so who reads my blog?

There seem to be two distinct populations: Redditers and Googlers. Excluding "direct traffic", which simply means that for whatever reason the referrer was not recorded, 35% of visitors come from Google and 32% from Reddit. The next three sources, DZone, Daring Fireball and del.icio.us, provide only 6.6%, 3.4% and 1.7% of visits, respectively.

The full story of an article's readership looks something like this:

The article is published. I submit it to del.icio.us and usually also to reddit
If Redditers like the article, it gets to the main page. I have absolutely no idea which articles Redditers will like and which they won't. Actually, I have less than no idea: things I consider very interesting almost invariably get downvoted, while random rants I wrote when angry or bored get tens of points. So I submit pretty much everything programming-related and let them decide. My karma from doing so is highly positive, so it's probably not considered a very abusive practice
In the next day or two it gets a lot of views from Redditers
People submit it to other reddit-like websites, or write answers to it, and it stays popular for a few more days
There's a sudden drop in popularity, as people move on to other things
Google indexes the article, and a steady flow of Google visits starts. The flow is not wide, but it seems to last pretty much indefinitely

To get some numbers I scraped the Google Analytics reports. Google Analytics has no real API, and it became even more difficult to use programmatically after the update, but I somehow managed to extract the information I wanted (using my Google Analytics cookie, extracted with Firebug).

require 'time'

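# wget's --header expects a full "Name: value" line, so this file
# should hold the whole "Cookie: ..." header
$cookie = File.read("/home/taw/ga_cookie").chomp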

def wget(url, fn)
  # Download only once; later runs reuse the cached file
  system 'wget', '--header', $cookie, url, '-O', fn unless File.exist?(fn)
  File.read(fn)
end

def each_day(first_day)
  day = Time.now.gmtime
  day_number = 0
  while true
    day_s = day.strftime('%Y%m%d')
    break if day_s < first_day
    yield day_s, day_number
    day_number += 1
    day -= 24*60*60
  end
end

def get_data_for(day)
  url = "https://www.google.com/analytics/reporting/export?fmt=3&id=1222880&pdr=#{day}-#{day}&cmp=average&rpt=TopContentReport&trows=500"
  fn = "results-#{day}"
  res = wget(url, fn)
  header_finished = false
  res.each_line{|line|
    unless header_finished
      header_finished = true if line =~ /\AURL\tPage Views\tUnique Page Views\t/
      next
    end
    url, page_views, unique_page_views, = line.split(/\t/)
    next unless page_views # Skip the final line
    next unless url =~ %r[\A/\d{4}/\d{2}/]
    next if url =~ /\?/
    yield(url, page_views.to_i)
  }
end

$stats = {}

each_day('20060923') {|date, day_number|
  get_data_for(date){|url, page_views|
    $stats[url] ||= []
    $stats[url][day_number] = page_views
  }
}

$stats_by_post_age = []

$stats.each{|url, stats|
  stats.reverse.each_with_index{|page_views, age|
    page_views ||= 0
    $stats_by_post_age[age] ||= 0
    $stats_by_post_age[age] += page_views
  }
}

total_page_views = $stats_by_post_age.inject{|a,b| a+b}
p $stats_by_post_age.map{|x| 0.01 * (10000 * x.to_f/total_page_views).to_i}
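The core trick in the script is aligning posts by age instead of by calendar date. Each post's array is indexed by days before today, so reversing it puts the oldest day with any recorded views at index 0; in other words, the script treats the first day a post got views as its publication day. A toy run of just that step, with made-up numbers for two hypothetical posts:

```ruby
# Hypothetical toy input, shaped like the script's $stats arrays:
# index 0 = today, higher index = further in the past, nil = no views.
# Post A was published 3 days ago, post B 1 day ago.
stats = {
  "/2007/07/post-a.html" => [1, 2, nil, 10],
  "/2007/07/post-b.html" => [4, 20],
}

by_age = []
stats.each_value do |views|
  # Reversing puts the first day with views (the assumed publication day) at index 0
  views.reverse.each_with_index do |count, age|
    by_age[age] = (by_age[age] || 0) + (count || 0)
  end
end

p by_age  # prints [30, 4, 2, 1]
```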

And the not very surprising results:

22.26% of page views happen on the day the article is published. As the article could have been published at any time of the day (from just after midnight to just before midnight), on average that's the article's first 12 hours.
It falls rapidly to 11.47% and 4.28% over the next two days
In the following ten days the numbers are 2.03%, 1.82%, 1.46%, 1.49%, 1.25%, 0.99%, 0.86%, 0.72%, 0.95%, 0.81%. By that time more than half of all visits have occurred.
In the following weeks the number gradually decreases, but I think it's more due to many posts not being online long enough than due to actual popularity loss. Maybe I'll run some statistics to test this hypothesis some day.
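As a quick arithmetic check of the "more than half" claim, here is a running total over the daily shares listed above, with day 0 being the publication day. Only the reported percentages are used; nothing is re-measured:

```ruby
# Daily shares (percent of all page views) for days 0..12 after publication,
# copied from the results above.
daily = [22.26, 11.47, 4.28, 2.03, 1.82, 1.46, 1.49, 1.25, 0.99, 0.86, 0.72, 0.95, 0.81]

running = 0.0
daily.each_with_index do |share, day|
  running += share
  puts format("day %2d: %6.2f%%", day, running)
end

# First day on which the running total passes 50%
half_day = (0...daily.size).find { |d| daily[0..d].sum > 50 }
puts "half of all views reached on day #{half_day}"
```

The total reaches 49.58% after day 11 and 50.39% after day 12, so the half-way point falls at the very end of those thirteen days.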

You should be able to adapt this script to your own blog if you want to know what the numbers look like there.

Sunday, July 15, 2007

Short rant on video game usability and 3D acceleration

There's one thing that pretty much every PC game does, and which I really hate: using the "constant rendering quality" paradigm instead of the "constant FPS" paradigm.

PC hardware differs a lot, with some people using older hardware and wanting to play games even if the rendering is only so-so, while others who have just bought shiny new graphics cards demand really awesome effects from them, more to impress their friends and stimulate graphics card manufacturing than to actually improve gameplay. What pretty much everybody wants is the highest rendering quality that still gives them a reasonable FPS rate.

That's what game engines should do: monitor FPS and increase or decrease rendering quality if FPS falls outside some predefined range. But not a single game I know does so. Instead they all opt for "constant rendering quality", maintaining some level of rendering quality whether the game gets unusably slow or has a lot of free GPU cycles. Often both situations happen as the player moves from one location to another. Changing the graphics setup every few minutes would distract too much from playing, so most owners of older hardware either set the quality low enough that they always have good FPS, even if for 90% of the game the GPU is half idle, or accept occasional low FPS in exchange for better rendering quality. Or they solve this software problem in hardware and buy a better graphics card.
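A minimal sketch of the "constant FPS" idea, assuming a game exposes a single overall quality level to adjust. The `QUALITY_LEVELS` names, the 30 to 50 FPS band and the 60-frame averaging window are illustrative choices, not taken from any real engine:

```ruby
# Hypothetical quality levels, from cheapest to most expensive to render.
QUALITY_LEVELS = %i[low medium high ultra].freeze

class AdaptiveQuality
  attr_reader :level

  # min_fps..max_fps is the acceptable band; outside it quality is adjusted.
  # window is how many frames are averaged before deciding, so a single
  # slow frame doesn't cause a visible quality change.
  def initialize(min_fps: 30.0, max_fps: 50.0, window: 60, level: 1)
    @min_fps, @max_fps, @window = min_fps, max_fps, window
    @level = level
    @frame_times = []
  end

  # Call once per frame with that frame's duration in seconds.
  def record_frame(seconds)
    @frame_times << seconds
    return if @frame_times.size < @window

    fps = @frame_times.size / @frame_times.sum
    @frame_times.clear
    if fps < @min_fps && @level > 0
      @level -= 1   # too slow: drop quality one step
    elsif fps > @max_fps && @level < QUALITY_LEVELS.size - 1
      @level += 1   # GPU has headroom: raise quality one step
    end
  end

  def quality
    QUALITY_LEVELS[@level]
  end
end

q = AdaptiveQuality.new
60.times { q.record_frame(1.0 / 20) }  # 20 FPS, below the band
puts q.quality                          # prints low
```

The gap between `min_fps` and `max_fps` acts as hysteresis: with a band that is too narrow, raising quality could immediately push FPS below the minimum and the level would oscillate. A real engine would probably tune individual settings rather than one global level.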

Oh, and the graphics setup. Instead of having one big "I want that many FPS" slider and then the game filling in details, there are usually dozens of confusing options - some of them affecting rendering speed considerably, others barely at all.

Time to get a new card; the one I bought a year ago isn't good enough any more.

Saturday, July 14, 2007

Truth, falsehood and voidness in dynamic languages

One of the things dynamic languages differ on is how truth, falsehood, and voidness are handled. I checked how it's done in the 9 most popular dynamic languages: Common Lisp, JavaScript, Lua, Perl, PHP, Python, Ruby, Scheme, and Smalltalk.

The first question: does the language have dedicated booleans? That is, do expressions like 2 > 1 return special boolean values or something else?

Ruby, Lua, Smalltalk, JavaScript - Yes (true and false)
Python - Yes (True and False)
Scheme - Yes (#t and #f)
Common Lisp - No, it returns the symbol t for true and the empty list (nil) for false.
Perl - No, it returns 1 for true, and a special empty string (which also acts as 0 in numeric context) for false.
PHP - Kinda. Since PHP4 there are booleans true and false, but their behavior is full of hacks - print true prints 1, print false prints nothing, false == 0, false == NULL, true == 1, even true == 42.

If booleans are used in boolean context, their interpretation is obvious. Most other objects used in boolean context are treated the same way as true, with a few common exceptions. How are the empty list, integer 0, floating point 0.0, and the empty string treated in boolean context?

Ruby, Scheme, Lua - all are true
Perl, PHP, Python - all are false
JavaScript - empty list is true, others are false
Common Lisp - empty list is false, others are true
Smalltalk - a NonBooleanReceiver exception is raised if anything but a boolean is used in boolean context.

Is string "0" false ?

PHP, Perl - unfortunately "0" is false, and this is a huge source of nasty bugs
Ruby, Scheme, Lua, JavaScript, Python, Common Lisp - "0" is true
Smalltalk - a NonBooleanReceiver exception is raised
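For Ruby these answers are easy to check, since only `nil` and `false` are false in boolean context:

```ruby
# Everything except nil and false counts as true in Ruby,
# including 0, 0.0, "", [] and "0".
[0, 0.0, "", [], "0", nil, false].each do |value|
  puts "#{value.inspect} is #{value ? 'true' : 'false'}"
end
```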

Is there a special value denoting the absence of a value? What does accessing a nonexistent array element return?

Ruby, Lua - nil, accessing nonexistent elements returns it
JavaScript - undefined, accessing nonexistent elements returns it
Perl - undef, accessing nonexistent elements returns it
PHP - NULL, accessing nonexistent elements returns it
Python - None, accessing nonexistent elements throws an exception
Smalltalk - nil, accessing nonexistent elements throws an exception
Scheme - there isn't one, accessing nonexistent values is an error
Common Lisp - there isn't one, but the empty list acts as one in most contexts; it is also returned when accessing nonexistent elements

Is the nonexistent value false in boolean context ?

Ruby, Lua, JavaScript, Perl, PHP, Python, Common Lisp - it is false
Scheme - there is no nonexistent value marker
Smalltalk - a NonBooleanReceiver exception is raised
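In Ruby an out-of-range index returns `nil`, which is false in boolean context, while `Array#fetch` gives the exception-raising behaviour of Python or Smalltalk when that is wanted:

```ruby
a = [10, 20, 30]

p a[5]                          # out of range: returns nil, the absence marker
p(a[5] ? "present" : "absent")  # nil is false in boolean context

# Array#fetch raises IndexError instead of returning nil...
begin
  a.fetch(5)
rescue IndexError => e
  puts "fetch raised #{e.class}"
end

# ...or returns a default supplied by the caller
p a.fetch(5, 0)
```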

The most common answers are: there are dedicated booleans and a dedicated absence marker; normal objects can be used in boolean context, and most of them (including string "0") are treated as true, while the absence marker is treated as false.

There is no clear consensus whether 0, 0.0, "", and the empty list should be treated as true or false. Personally I think it's better to make them all true. Otherwise, either libraries can define other false objects (like decimal 0.00, various empty containers, and so on), which complicates the language, or they cannot, which makes it feel inconsistent.

In most languages, accessing nonexistent elements of an array returns an absence marker instead of throwing an exception. In my opinion that's the right way, as it makes the code look much more natural.

Wednesday, July 11, 2007

Using home directory as GTD inbox - version 2

The GTD software I described a few weeks ago has evolved quite significantly since then.

Fortunately my inbox is still empty:

$ inbox_size
Your inbox is empty.

It can be used in two modes: a single-shot report of inbox contents with inbox_size, or continuous scanning mode plus UI notifications with inbox_size_notify. inbox_size.rb is a library (symlinked from /home/taw/local/bin/inbox_size) which finds all items in all my inboxes. It also handles special items:

Unread emails in Gmail inbox
Uncommitted changes to one of the repositories
Music log not committed to last.fm
Passwords file changed since the last encrypted copy
Last backup older than 3 days
Anything I wanted to be reminded about

The code

The main code is in inbox_size.rb:

require 'time'
require 'magic_xml'

$offline = false

def inbox_ls
  items_whitelist = %w[
    /home/taw/Desktop
    /home/taw/ebooks
    /home/taw/everything
    /home/taw/img
    /home/taw/ipoddb
    /home/taw/local
    /home/taw/movies
    /home/taw/music
    /home/taw/ref
    /home/taw/website
    /home/taw/website_snapshot
  ]

  files = (Dir["/home/taw/*"] +
           Dir["/home/taw/Desktop/*"] +
           Dir["/home/taw/movies/complete/*"] -
           items_whitelist)
  items = files.map{|x|x.sub(%r[\A/home/taw/],"")}

  # Code for handling special inbox items goes here
  # ...

  return items.sort.map{|item| "* #{item}"}
end

if $0 == __FILE__
  if ARGV[0] == '--offline'
    ARGV.shift
    $offline = true
  end
  items = inbox_ls
  if items.empty?
    puts "Your inbox is empty."
  else
    puts "#{items.size} items in your inbox:", *items
  end
end

inbox_size_notify, which scans the inbox continuously and displays UI notifications if it's not empty, is:

require 'inbox_size'

max_displayed = 30

big_timer = 5
old_items = []

while true
  items = inbox_ls
  if items.empty?
    sleep 60 # empty inbox: check again in a minute instead of busy-looping
    next
  end

  if items == old_items
    big_timer -= 1
    sleep 60
    next unless big_timer == 0
  end
  big_timer = 5

  if items.size > max_displayed
    displayed_items = items.sort_by{rand}[0, max_displayed].sort + ["* ..."]
  else
    displayed_items = items
  end
  system "notify", "Inbox is not processed", "#{items.size} items in your inbox:", *displayed_items

  sleep 60
  old_items = items
end

The notify script, which displays KDE notifications, is:

header = "Notification"
msg = ARGV.join("\n") # "All your base\nAre belong to us"

system 'dcop', 'knotify', 'Notify', 'notify', 'notify', header, msg, 'nosound', 'nofile', '16', '0'

Backup reminder

Since my disk died I have become more serious about backups. I intend to have at least a regular rsync of my SVK repository and some important files. Here's a script which rsyncs these files from shanti (my main box) to ishida (an old laptop).

t0 = Time.now

rv = system 'rsync -rL ~/.mirrorme/ taw@ishida:/home/taw/shanti_mirror/'

unless rv
  STDERR.puts "Error trying to rsync"
  exit 1
end

t1 = Time.now

File.open('/home/taw/.last_backup', 'w') {|fh|
  fh.puts t1
}

puts "Started: #{t0}"
puts "Finished: #{t1}"
puts "Time: #{t1-t0}s"

If the backup was successful, a timestamp is saved to /home/taw/.last_backup. inbox_size.rb reminds me if I haven't backed up for more than 3 days:

  # Time since last rsync
  time_since_last_rsync = Time.now - Time.parse(File.read("/home/taw/.last_backup").chomp)
  if time_since_last_rsync > 3 * 24 * 60 * 60
    items << "Over 3 days since the last backup"
  end

Tickler file

The "tickler file" (/home/taw/.tickler) contains all the things I want to be reminded about: appointments, deadlines, new episodes of The Colbert Report, whatever. Of course I usually want to be reminded before the deadline, not on it, so the date must be some time before the event of interest. Entries in the tickler file look something like this:

Sat Jul 21 05:49:14 +0200 2007
15 days to Wikimedia Foundation validation deadline

It can be edited as a text file, but it's more convenient to add new entries with the add_tickler script:

$ add_tickler 24h "New TCR episode will be available"

unless ARGV.size == 2
  STDERR.puts "Usage: #{$0} 'due' 'msg'"
  exit 1
end

due = ARGV.shift
msg = ARGV.shift

due_sec = case due
when /\A(\d+)s\Z/
  $1.to_i
when /\A(\d+)m\Z/
  $1.to_i * 60
when /\A(\d+)h\Z/
  $1.to_i * 60 * 60
when /\A(\d+)d\Z/
  $1.to_i * 60 * 60 * 24
else
  STDERR.puts <<EOF
Usage: #{$0} 'due' 'msg'
Due can be:
* 15s
* 15m
* 15h
* 15d
EOF
  exit 1
end

due_time = Time.now + due_sec

File.open("/home/taw/.tickler", "a") {|fh|
  fh.puts due_time
  fh.puts msg
}

The tickler file is checked by the following code in inbox_size.rb:

  # Tickler items
  tickler = File.readlines("/home/taw/.tickler")
  while not tickler.empty?
    deadline = Time.parse(tickler.shift.chomp)
    msg = tickler.shift.chomp # drop the trailing newline from the message line
    if Time.now > deadline
      items << msg
    end
  end

The passwords file

Pretty much every website requires an account nowadays. I don't want to reuse passwords across websites, so I generate them randomly (cat /dev/urandom | perl -ple 's/[^a-zA-Z0-9]//g' | head) and keep them in an unencrypted file, /home/taw/.passwords, which I simply grep when I want to log in to some weird website again (normally Firefox remembers these passwords anyway, but sometimes it's necessary).
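The shell pipeline keeps only the alphanumeric characters from a stream of random bytes. A rough Ruby equivalent, as a sketch: the `random_password` name and the 16-character default are illustrative, and it uses `SecureRandom` rather than reading /dev/urandom directly:

```ruby
require 'securerandom'

# Byte values of 'a'..'z', 'A'..'Z' and '0'..'9'.
ALNUM_BYTES = [*'a'..'z', *'A'..'Z', *'0'..'9'].map(&:ord).freeze

# Collect random bytes, keeping only those that are ASCII letters or digits,
# like the perl s/[^a-zA-Z0-9]//g filter, until the password is long enough.
def random_password(length = 16)
  password = +""
  while password.size < length
    SecureRandom.random_bytes(64).each_byte do |b|
      break if password.size == length
      password << b.chr if ALNUM_BYTES.include?(b)
    end
  end
  password
end

puts random_password
```

Discarding unwanted bytes, rather than mapping every byte onto the 62 allowed characters with `%`, keeps every character equally likely.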

As it would suck to lose all my accounts, I encrypt this file with AES-256-CBC and keep encrypted copies in /home/taw/ref/skrt/, which is mirrored to multiple servers. As I need to enter my password to encrypt the file, it cannot be done automatically. The most inbox_size.rb can do is remind me if there's no up-to-date skrt file:

  # skrt up to date ?
  pwtm = File.mtime("/home/taw/.passwords")
  last_skrt_tm = Dir["/home/taw/ref/skrt/*"].map{|fn| File.mtime(fn)}.max
  if last_skrt_tm.nil? or pwtm > last_skrt_tm # no encrypted copy at all also counts
    items << "No up-to-date skrt available"
  end

In which case I run the following skrt_new script:

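t = Time.now
fn = sprintf "skrt-%04d-%02d-%02d", t.year, t.month, t.day
# Encrypt the passwords file into ref/skrt/; openssl prompts for the passphrase
system "openssl", "aes-256-cbc", "-in", "/home/taw/.passwords", "-out", "/home/taw/ref/skrt/#{fn}"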

Music log

The iPod-last.fm bridge consists of two parts: one which extracts the log from an iPod, and one which submits the data to last.fm. They communicate using a very simple format, with lines like this (time is local):

Sumptuastic ; Cisza (Radio Edit) ; Cisza (Single) ; 185 ; 2007-07-11 17:51:27
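The submitting side isn't shown here. A minimal sketch of how it might read one log line back, assuming no field ever contains " ; " (the format has no escaping); `LogEntry` and `parse_music_log_line` are hypothetical names:

```ruby
require 'time'

# One parsed entry of the music log.
LogEntry = Struct.new(:artist, :title, :album, :length, :played_at)

def parse_music_log_line(line)
  artist, title, album, length, date = line.chomp.split(" ; ", 5)
  # The date was written as local time with strftime("%Y-%m-%d %H:%M:%S"),
  # and Time.parse without a zone also assumes local time.
  LogEntry.new(artist, title, album, length.to_i, Time.parse(date))
end

entry = parse_music_log_line("Sumptuastic ; Cisza (Radio Edit) ; Cisza (Single) ; 185 ; 2007-07-11 17:51:27")
puts entry.title  # prints Cisza (Radio Edit)
```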

Nothing in the format is iPod-specific, so I wrote a wrapper around mplayer which logs music it plays to /home/taw/.music_log. It can also randomize songs and search for them recursively in directories. It uses a few extra programs - id3v2 to get song title, artist and album (from either ID3v2 or ID3v1 tags), and mp3info to get playing time.

def mp3_get_metadata(file_name)
  # Array form of IO.popen bypasses the shell, so odd file names are safe
  song_info = IO.popen(["id3v2", "-l", file_name]) {|io| io.read }
  artist    = nil
  title     = nil
  album     = nil

  if song_info =~ /^TPE1 \(Lead performer\(s\)\/Soloist\(s\)\): (.*)$/
    artist = $1
  elsif song_info =~ /^Title  : .{31} Artist: (.*?)\s*$/
    artist = $1
  end

  if song_info =~ /^TIT2 \(Title\/songname\/content description\): (.*)$/
    title = $1
  elsif song_info =~ /^Title  : (.{0,31}?)\s+ Artist: .*$/
    title = $1
  end

  if song_info =~ /^TALB \(Album\/Movie\/Show title\): (.*)$/
    album = $1
  elsif song_info =~ /^Album  : (.{0,31}?)\s+ Year:/
    album = $1
  end

  return [artist, title, album]
end

def mp3_get_length(file_name)
  IO.popen(["mp3info", "-F", "-p", "%S", file_name]) {|io| io.read }.to_i
end

def with_timer
  time_start = Time.now
  yield
  return [time_start, Time.now - time_start]
end

randomize = true
if ARGV[0] == "-s" # --sequential
  randomize = false
  ARGV.shift
end

songs = ARGV.map{|fn| if File.directory?(fn) then Dir["#{fn}/**/*.mp3"] else fn end}.flatten
songs = songs.sort_by{rand} if randomize

songs.each{|song|
  time_start, time_elapsed = with_timer do
    rv = system "mplayer", song
    exit unless rv
  end
  artist, title, album = *mp3_get_metadata(song)
  length = mp3_get_length(song)

  next unless length >= 90 and (time_elapsed >= 240 or time_elapsed >= 0.5 * length)

  date = time_start.strftime("%Y-%m-%d %H:%M:%S")

  File.open("/home/taw/.music_log", "a") {|fh|
    fh.puts "#{artist} ; #{title} ; #{album} ; #{length} ; #{date}"
  }
}

It's a good idea to commit the log to last.fm often, but I'm not doing it automatically yet, as network problems with last.fm are too frequent. Instead inbox_size.rb reminds me if there are old uncommitted entries in the log:

  # .music_log not empty and older than one hour
  if File.size("/home/taw/.music_log") > 0 and File.mtime("/home/taw/.music_log") < Time.now - 60*60
    items << "Music log not clean"
  end

Uncommitted stuff in repositories

Sometimes I get distracted by an interruption and forget to commit things to repositories.
I wrote an uncommitted_changes script which checks the local checkouts of all repositories I use (currently 1 SVK and 2 SVN repositories) for uncommitted changes. I use svn/svk diff instead of svn/svk status, as the latter finds all kinds of temporary files, and I always svn/svk add all new files when I start coding anyway.

Dir.chdir("/home/taw/everything/") { system "svk diff" }
Dir.chdir("/home/taw/everything/rf-rlisp/") { system "svn diff" }
Dir.chdir("/home/taw/everything/gna_tawbot/") { system "svn diff" }

inbox_size.rb simply checks whether the output of this script is empty:

  # Uncommitted changes
  uc = `uncommitted_changes`
  unless uc == ""
    items << "There are uncommitted changes in the repository"
  end

Unread Gmail emails

The last kind of inbox items tracked by inbox_size.rb are email inbox items. Google APIs are almost invariably ugly Java-centric blobs of suckiness, so instead of using the Gmail API I simply get the list of unread messages from Gmail's Atom feed, parsed using magic/xml.

  # Unread Gmail messages
  unless $offline
    gmail_passwd = File.read("/home/taw/.gmail_passwd").chomp
    url = "https://Tomasz.Wegrzanowski:#{gmail_passwd}@mail.google.com/mail/feed/atom"
    XML.load(url).children(:entry, :title).each{|title|
      items << "Email: #{title.text}"
    }
  end