taw's blog: Using del.icio.us to deal with lack of tags on Blogger

Wednesday, July 26, 2006

Using del.icio.us to deal with lack of tags on Blogger

I don't have much experience with blogging websites, but Blogger definitely lacks many useful features, like tags. Nothing is lost, however: we can easily work around it by using del.icio.us. First let's download the archives:

$ wget t-a-w.blogspot.com/2006_0{5,6,7}_01_t-a-w_archive.html

Now let's extract the posts. Because each archive page lists posts in reverse chronological order, we want to list the files explicitly, newest first, if we don't want to end up with a total mess. I use a simple regular expression, but the HTML is very nicely tagged, so you can also use a "real" HTML-aware parser:

$ cat 2006_0{7,6,5}* | ruby -e 'STDIN.read.scan(%r[<h3 class="post-title">\s*(.*?)\s*</h3>.*?<a href="(\S+)" title="permanent link">]m).each{|title, url| print "#{title}\n#{url}\n\n\n"}' >POSTS
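
For readability, the same extraction can be written out as a small script. This is only a sketch: the names POST_RE, extract_posts and format_record are made up for illustration, the sample HTML is invented, and it assumes the archive markup matches the regular expression from the one-liner.

```ruby
# Regular expression from the one-liner: captures the post title and the
# permalink that follows it. The m flag lets "." match newlines too.
POST_RE = %r[<h3 class="post-title">\s*(.*?)\s*</h3>.*?<a href="(\S+)" title="permanent link">]m

# Hypothetical helper: returns an array of [title, url] pairs.
def extract_posts(html)
  html.scan(POST_RE)
end

# Formats one post as a four-line record: title, URL, an empty line
# reserved for tags, and an empty separator line.
def format_record(title, url)
  "#{title}\n#{url}\n\n\n"
end

# Invented sample markup, shaped the way the regular expression expects.
sample = <<~HTML
  <h3 class="post-title">
    RLisp gets HTTP support
  </h3>
  <p>Post body goes here.</p>
  <a href="http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html" title="permanent link">link</a>
HTML

extract_posts(sample).each { |title, url| print format_record(title, url) }
```

Keeping the record format in one place makes it easier to keep it in sync with whatever reads the POSTS file later.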

Now the POSTS file looks something like this:

$ head POSTS
Free beer for the first 10 cool RLisp programs
http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html


List of things that suck in Scheme
http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html


RLisp gets HTTP support
http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html

Now we edit it, inserting the relevant tags on the empty line below each URL:

$ head POSTS
Free beer for the first 10 cool RLisp programs
http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html
beer ruby rlisp lisp

List of things that suck in Scheme
http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html
scheme rant

RLisp gets HTTP support
http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html
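
The posting script reads this file in fixed groups of four lines, so one stray or missing line shifts every record after it. Here is a rough sanity check you could run before posting. It is just a sketch, and parse_posts is a hypothetical helper of mine, not part of any library:

```ruby
# Hypothetical helper: splits the text of POSTS into four-line records
# (title, URL, tags, empty separator) and checks each one.
def parse_posts(text)
  lines = text.lines.map(&:chomp)
  lines.each_slice(4).with_index(1).map do |(title, url, tags, blank), n|
    raise "record #{n}: missing title" if title.nil? || title.empty?
    raise "record #{n}: #{url.inspect} does not look like a URL" unless url.to_s =~ %r{\Ahttps?://}
    raise "record #{n}: separator line is not empty" unless blank.to_s.empty?
    { title: title, url: url, tags: tags.to_s.split }
  end
end

sample = <<~POSTS
  Free beer for the first 10 cool RLisp programs
  http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html
  beer ruby rlisp lisp

  List of things that suck in Scheme
  http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html
  scheme rant

POSTS

parse_posts(sample).each do |post|
  puts "#{post[:title]}: #{post[:tags].join(', ')}"
end
```

An empty tags line is allowed here, since a post without extra tags is still a valid record.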

OK, so everything's ready and we can send the posts to del.icio.us. Unfortunately the API documentation didn't seem quite right and I kept getting 404 errors. Well, 15 minutes is more or less my attention span, so I just googled for cpan del.icio.us and found Aaron Straup Cope's Net::Delicious package.

#!/usr/bin/perl -w
$|=1; # Automatic STDOUT flushing

use Net::Delicious;

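@_=<>; # Read the whole POSTS file (file name argument or STDIN)
my @p=();
# Each record is four lines: title, URL, tags, empty separator
while(@_){
 my $title = shift@_; chomp $title;
 my $url = shift@_; chomp $url;
 my $tags = shift@_; chomp $tags; $tags = "taw blog $tags"; # Common tags for every post
 shift@_; # Skip the separator line
 push @p, [$title, $url, $tags];
}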

my $del = Net::Delicious->new({user=>"taw", pswd=>"password"});
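@p = reverse @p; # POSTS is newest-first, so post the oldest entries first
# Uncomment to skip everything older than a given post:
#while ($p[0][0] ne "Title of the earliest post we want") {shift @p};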
for(@p)  {
 my %args = (
   url => $_->[1],
   description => $_->[0],
   tags => $_->[2],
 );
 print "Posting $_->[0]: ";
 my $retval = $del->add_post(\%args);
 print $retval, "\n";
 exit unless $retval == 1;
 sleep 5;
}
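
If you want to check what the script is about to send before touching the network, the bookkeeping part (reverse the order, optionally skip older posts, prepend the common tags) can be sketched on its own. Everything here is illustrative: posting_queue and its keyword arguments are hypothetical names, and nothing in this block talks to del.icio.us.

```ruby
# Hypothetical helper mirroring the Perl script's bookkeeping, without any
# network access: oldest post first, optional starting point, common tags added.
def posting_queue(records, start_at: nil, common_tags: %w[taw blog])
  queue = records.reverse # the input is newest-first
  queue = queue.drop_while { |r| r[:title] != start_at } if start_at
  queue.map do |r|
    { url: r[:url],
      description: r[:title],
      tags: (common_tags + r[:tags]).join(" ") }
  end
end

# Two records taken from the POSTS listing, newest first.
records = [
  { title: "Free beer for the first 10 cool RLisp programs",
    url: "http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html",
    tags: %w[beer ruby rlisp lisp] },
  { title: "List of things that suck in Scheme",
    url: "http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html",
    tags: %w[scheme rant] },
]

posting_queue(records).each do |args|
  puts "#{args[:description]} [#{args[:tags]}]"
end
```

The hash keys match the ones the Perl script passes to add_post (url, description, tags), so the printed list shows the same data the real run would submit.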

And now we have easy access to things like a list of all blog posts about RLisp.