I don't have much experience with blogging websites, but Blogger definitely lacks many useful features, like tags. Nothing is lost, however; we can easily work around it by using del.icio.us. First let's download the archives:
$ wget t-a-w.blogspot.com/2006_0{5,6,7}_01_t-a-w_archive.html
Now let's extract the posts. Each archive lists its posts in reverse chronological order, so we want to pass the files newest month first if we don't want to end up with a total mess. I use a simple regular expression, but the HTML is very nicely tagged, so you can also use a "real" HTML-aware parser:
$ cat 2006_0{7,6,5}* | ruby -e 'STDIN .read .scan( %r[<h3 class="post-title">\s*(.*?)\s*</h3>.*?<a href="(\S+)" title="permanent link">]m) .each{|title, url| print "#{title}\n#{url}\n\n\n"}' >POSTS
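To see what the regular expression actually captures, here is a minimal sketch that runs the same pattern against a small made-up fragment. The `SAMPLE_HTML` markup is only my guess at what the archive pages look like, and `extract_posts` and `to_posts_file` are names invented for this example:

```ruby
# Hypothetical fragment imitating the markup the regular expression expects.
# The real archive pages contain much more around these elements.
SAMPLE_HTML = <<~HTML
  <h3 class="post-title">
    RLisp gets HTTP support
  </h3>
  <div class="post-body">Some text.</div>
  <a href="http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html" title="permanent link">link</a>
  <h3 class="post-title">
    List of things that suck in Scheme
  </h3>
  <div class="post-body">More text.</div>
  <a href="http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html" title="permanent link">link</a>
HTML

# Same pattern as in the one-liner; the m flag lets . match newlines.
POST_RE = %r[<h3 class="post-title">\s*(.*?)\s*</h3>.*?<a href="(\S+)" title="permanent link">]m

# Returns one {title:, url:} hash per post, in document order.
def extract_posts(html)
  html.scan(POST_RE).map { |title, url| { title: title, url: url } }
end

# Four lines per post: title, URL, an empty line for tags, a blank separator.
def to_posts_file(posts)
  posts.map { |post| "#{post[:title]}\n#{post[:url]}\n\n\n" }.join
end

print to_posts_file(extract_posts(SAMPLE_HTML))
```

The lazy `.*?` groups are what keep each title paired with the nearest following permanent link, instead of swallowing everything up to the last link in the file.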
Now the POSTS file looks something like this:
$ head POSTS
Free beer for the first 10 cool RLisp programs
http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html


List of things that suck in Scheme
http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html


RLisp gets HTTP support
http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html
We should edit it, typing the relevant tags into the first empty line below each URL:
$ head POSTS
Free beer for the first 10 cool RLisp programs
http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html
beer ruby rlisp lisp

List of things that suck in Scheme
http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html
scheme rant

RLisp gets HTTP support
http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html
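Since the tags are typed in by hand, it is easy to miss a post. Here is a rough sketch of a sanity check, assuming the four-lines-per-post layout that the one-liner writes (title, URL, tags, blank separator). `Post`, `parse_posts` and `untagged` are names made up for this example, and `SAMPLE` stands in for the real POSTS file:

```ruby
# One record per post; tags is an array of words (empty if none were typed in).
Post = Struct.new(:title, :url, :tags)

# Splits the text into four-line records: title, URL, tags, blank separator.
def parse_posts(text)
  text.lines.map(&:chomp).each_slice(4).map do |title, url, tags, _blank|
    Post.new(title, url, tags.to_s.split)
  end
end

# Posts whose tags line is still empty.
def untagged(posts)
  posts.select { |post| post.tags.empty? }
end

# Stand-in for the edited POSTS file; the last post has no tags yet.
SAMPLE = <<~POSTS
  Free beer for the first 10 cool RLisp programs
  http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html
  beer ruby rlisp lisp

  List of things that suck in Scheme
  http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html
  scheme rant

  RLisp gets HTTP support
  http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html


POSTS

untagged(parse_posts(SAMPLE)).each { |post| puts "No tags yet: #{post.title}" }
```

To check the real file, replace `SAMPLE` with `File.read("POSTS")`. The check only works if nobody added or deleted lines while editing, because records are found purely by counting lines.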
OK, so everything's ready and we can send the posts to del.icio.us. Unfortunately the API documentation didn't seem quite right and I kept getting 404 errors. Well, 15 minutes is more or less my attention span, so I just googled for "cpan del.icio.us" and found Aaron Straup Cope's Net::Delicious package. The script reads POSTS from standard input or from a file named on the command line:
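#!/usr/bin/perl -w
$|=1; # Automatic STDOUT flushing
use Net::Delicious;
# Each post takes four lines: title, URL, tags, blank separator
@_=<>;
my @p=();
while(@_){
    my $title = shift@_; chomp $title;
    my $url = shift@_; chomp $url;
    my $tags = shift@_; chomp $tags; $tags = "taw blog $tags";
    shift@_; # Skip the blank separator line
    push @p, [$title, $url, $tags];
}
my $del = Net::Delicious->new({user=>"taw", pswd=>"password"});
# POSTS is newest first, so reverse it to post the oldest entries first
@p = reverse @p;
# Uncomment to skip every post older than the one named here:
#while ($p[0][0] ne "Title of the earliest post we want") {shift @p};
for(@p) {
    my %args = (
        url => $_->[1],
        description => $_->[0],
        tags => $_->[2],
    );
    # $title from the loop above is out of scope here, so use the record itself
    print "Posting $_->[0]: ";
    my $retval = $del->add_post(\%args);
    print $retval, "\n";
    exit unless $retval == 1; # Stop at the first failure
    sleep 5; # Pause between requests
}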
And now we have easy access to things like a list of all blog posts about RLisp.