Monday, July 16, 2007

Who reads my blog - Redditers and Googlers

Photo by racciari from flickr (CC-NC-SA)

More than a year ago, when I started this blog, I had no idea that anybody would actually read it, but it seems to be doing quite well. According to Google Analytics, over the year there were over 90 thousand page views by over 50 thousand visitors. Recently there have been about 420 page views daily, or one every three and a half minutes. I don't think I have that many friends and relatives, so who reads my blog?

There seem to be two distinct populations - Redditers and Googlers. Excluding "direct traffic", which simply means that for whatever reason the referrer was not recorded, 35% of visitors come from Google, and 32% from Reddit. The next three sources, DZone, Daring Fireball and del.icio.us, provide only 6.6%, 3.4% and 1.7% of visits, respectively.

The full story of an article's readership looks something like this:
  • Article is published. I submit it to del.icio.us and usually also to reddit
  • If Redditers like the article it gets to the main page. I have absolutely no idea which articles Redditers will like and which they won't. Actually I have less than no idea - things I consider very interesting almost invariably get downvoted, while random rants I wrote when angry or bored get tens of points. So I submit pretty much everything programming-related and let them decide. My karma from doing so is highly positive, so it's probably not considered a very abusive practice
  • In the next day or two it gets a lot of views from Redditers
  • People submit it to other reddit-like websites, or write answers to it, and it stays popular for a few more days
  • There's a sudden drop in popularity, as people move on to other things
  • Google indexes the article, and a steady flow of Google visits starts. The flow is not wide, but it seems to last pretty much indefinitely
To get some numbers I scraped the Google Analytics reports - Google Analytics has no real API, and it became even more difficult to use programmatically after the update, but I somehow managed to extract the information I wanted (using a Google Analytics cookie extracted with Firebug).
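require 'time'

# Cookie header line copied out of the browser with Firebug
$cookie = File.read("/home/taw/ga_cookie").chomp

# Download url into file fn unless it is already cached, then return its contents
def wget(url, fn)
  system 'wget', '--header', $cookie, url, '-O', fn unless File.exist?(fn)
  File.read(fn)
end

# Yield each day as YYYYMMDD, from today (day_number 0) back to first_day
def each_day(first_day)
  day = Time.now.gmtime
  day_number = 0
  while true
    day_s = day.strftime('%Y%m%d')
    break if day_s < first_day
    yield day_s, day_number
    day_number += 1
    day -= 24*60*60
  end
end

# Fetch the tab-separated top content report for one day
# and yield (url, page_views) for every blog post in it
def get_data_for(day)
  url = "https://www.google.com/analytics/reporting/export?fmt=3&id=1222880&pdr=#{day}-#{day}&cmp=average&rpt=TopContentReport&trows=500"
  fn = "results-#{day}"
  res = wget(url, fn)
  header_finished = false
  res.each_line{|line|
    # Ignore everything up to and including the column header line
    unless header_finished
      header_finished = true if line =~ /\AURL\tPage Views\tUnique Page Views\t/
      next
    end
    url, page_views, unique_page_views, = line.split(/\t/)
    next unless page_views # Skip the final line
    next unless url =~ %r[\A/\d{4}/\d{2}/] # Only /YYYY/MM/... post URLs
    next if url =~ /\?/ # Skip URLs with query strings
    yield(url, page_views.to_i)
  }
end

# url => page views per day, index 0 = today, higher index = further back
$stats = {}

each_day('20060923') {|date, day_number|
  get_data_for(date){|url, page_views|
    $stats[url] ||= []
    $stats[url][day_number] = page_views
  }
}

# Total page views by post age, index 0 = first day the post had any views
$stats_by_post_age = []

$stats.each{|url, stats|
  # Reversed, each post's array starts at its first recorded day
  stats.reverse.each_with_index{|page_views, age|
    page_views ||= 0
    $stats_by_post_age[age] ||= 0
    $stats_by_post_age[age] += page_views
  }
}

# Print each age's share of all page views, in percent with two decimals
total_page_views = $stats_by_post_age.inject{|a,b| a+b}
p $stats_by_post_age.map{|x| 0.01 * (10000 * x.to_f/total_page_views).to_i}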
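
To see how the last step of the script lines posts up by age, here is a minimal sketch on made-up data. The two URLs and their view counts are hypothetical, and "age 0" means the first day a post shows up in the reports, which I assume is the day it was published. It also rounds the percentages instead of truncating them like the script does.

```ruby
# Hypothetical data in the same layout as $stats:
# index 0 = today, higher index = further in the past.
stats = {
  "/2007/07/old-post.html" => [1, 2, 5, 40], # first seen 3 days ago
  "/2007/07/new-post.html" => [10, 30],      # first seen yesterday
}

by_age = []
stats.each_value do |views|
  # Reversing puts the post's first recorded day at index 0 (age 0)
  views.reverse.each_with_index do |page_views, age|
    by_age[age] = (by_age[age] || 0) + (page_views || 0)
  end
end

total = by_age.sum
shares = by_age.map { |x| (100.0 * x / total).round(2) }

p by_age # => [70, 15, 2, 1]
p shares
```

Both posts contribute their first day to age 0, even though those days are different calendar dates, which is the whole point of the reversal.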

And the not very surprising results:
  • 22.26% of page views happen on the day the article is published. As the article could have been published at any time of the day (from just after midnight to just before midnight), on average that covers the article's first 12 hours.
  • It falls rapidly to 11.47% and 4.28% over the next two days
  • In the following ten days the numbers are 2.03%, 1.82%, 1.46%, 1.49%, 1.25%, 0.99%, 0.86%, 0.72%, 0.95%, 0.81%. By that time just over half of all page views (about 50.4%) have occurred.
  • In the following weeks the number gradually decreases, but I think it's more due to many posts not being online long enough than due to actual popularity loss. Maybe I'll run some statistics to test this hypothesis some day.
You should be able to adapt this script to your blog if you want to know how the numbers look for your blog.
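
One possible way to test the hypothesis from the last bullet is to divide the views at each age by the number of posts that have been online at least that long, instead of summing over all posts. This is only a sketch under that assumption, not something I have run on the real data; the URLs, view counts and variable names below are made up.

```ruby
# Hypothetical data in the same layout as $stats: index 0 = today.
stats = {
  "/2007/01/old.html" => [2, 2, 4, 20],
  "/2007/07/mid.html" => [3, 6, 30],
  "/2007/07/new.html" => [25],
}

totals = [] # total page views at each age
posts  = [] # number of posts that have reached that age

stats.each_value do |views|
  views.reverse.each_with_index do |page_views, age|
    totals[age] = (totals[age] || 0) + (page_views || 0)
    posts[age]  = (posts[age]  || 0) + 1
  end
end

# Average page views per post that was actually online at that age
averages = totals.each_with_index.map { |t, age| t.to_f / posts[age] }

p totals   # => [75, 10, 5, 2]
p posts    # => [3, 2, 2, 1]
p averages # => [25.0, 5.0, 2.5, 2.0]
```

If the averages flatten out where the raw totals keep falling, the decline in later weeks mostly reflects fewer posts being old enough rather than a real loss of popularity.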

5 comments:

  1. I've just started a blog and I was wondering myself if people considered it abusive to submit my own articles. I don't do it for selfish reasons, I would just like to contribute to the community.

    Reddit has a pretty good rating system, so if the article isn't up to par it'll be gone within a couple of minutes anyway.

  2. @james:

    On Reddit at least, people don't usually care as long as the content is good. If the content is bad (technically inaccurate, vacuous, or overly arrogant), you'll sometimes get a more negative response than if someone else submitted it, but usually bad posts don't get any response at all.

  3. Anonymous 17:20

    I've just got you on RSS

  4. Anonymous 14:33

    It's the cats :-)

  5. Wow, that is seriously impressive readership. I sometimes feel like nobody reads my blog even though I put my heart into it. Thanks for the inspiration. I just wrote a post that tries to ask the very same question about "who reads my blog" from a different angle, namely not yet having a big readership. http://embodieddreams.com/2011/10/15/who-reads-my-blog/
