$ wget t-a-w.blogspot.com/2006_0{5,6,7}_01_t-a-w_archive.html
Now let's extract the posts. Each archive page lists them in reverse chronological order, so we need to specify the order of the files explicitly if we don't want a total mess. I use a simple regular expression, but the HTML is very nicely tagged, so you can also use a "real" HTML-aware parser:
$ cat 2006_0{7,6,5}* | ruby -e 'STDIN .read .scan( %r[<h3 class="post-title">\s*(.*?)\s*</h3>.*?<a href="(\S+)" title="permanent link">]m) .each{|title, url| print "#{title}\n#{url}\n\n\n"}' >POSTS
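If the one-liner is hard to read, here is roughly the same thing as a small script. It is only a sketch: the method name `extract_posts` and the sample markup are made up for illustration, and real archive pages are much longer and messier.

```ruby
# Returns an array of [title, url] pairs in the order they appear in the HTML.
# extract_posts is a hypothetical helper name, not part of any library.
def extract_posts(html)
  # Same pattern as the one-liner: grab the post title, then the first
  # permalink that follows it. The m flag lets "." match newlines too.
  pattern = %r[<h3 class="post-title">\s*(.*?)\s*</h3>.*?<a href="(\S+)" title="permanent link">]m
  html.scan(pattern)
end

# Made-up sample markup, shaped like what the regular expression expects.
sample_html = <<~HTML
  <h3 class="post-title">
    RLisp gets HTTP support
  </h3>
  <div class="post-body">...</div>
  <a href="http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html" title="permanent link">link</a>
HTML

extract_posts(sample_html).each do |title, url|
  # Title, URL, an empty line that will hold the tags, and a blank separator.
  print "#{title}\n#{url}\n\n\n"
end
```

The three newlines at the end matter: they leave one empty line per post for the tags, plus one blank line between posts.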
Now the POSTS file looks something like this:
$ head POSTS
Free beer for the first 10 cool RLisp programs
http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html


List of things that suck in Scheme
http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html


RLisp gets HTTP support
http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html
We should edit it, putting the relevant tags on the line right below each URL:
$ head POSTS
Free beer for the first 10 cool RLisp programs
http://t-a-w.blogspot.com/2006/07/free-beer-for-first-10-cool-rlisp.html
beer ruby rlisp lisp

List of things that suck in Scheme
http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html
scheme rant

RLisp gets HTTP support
http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html
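Before posting anything it may be worth checking that no post was left without tags. A minimal sketch, assuming POSTS keeps the four-line layout (title, URL, tags, blank line); the names `parse_posts` and `untagged` are mine, not from any library:

```ruby
# Parse text in the POSTS layout: title, URL, tags line, blank separator.
# Returns an array of hashes; tags are split on whitespace.
def parse_posts(text)
  text.lines.map(&:chomp).each_slice(4).map do |title, url, tags, _blank|
    { title: title, url: url, tags: tags.to_s.split }
  end
end

# Posts whose tags line is still empty.
def untagged(posts)
  posts.select { |post| post[:tags].empty? }
end

# Small inline sample; in practice you would pass File.read("POSTS").
sample = <<~POSTS
  List of things that suck in Scheme
  http://t-a-w.blogspot.com/2006/07/list-of-things-that-suck-in-scheme.html
  scheme rant

  RLisp gets HTTP support
  http://t-a-w.blogspot.com/2006/07/rlisp-gets-http-support.html


POSTS

untagged(parse_posts(sample)).each do |post|
  puts "No tags yet: #{post[:title]}"
end
```

This relies on every record being exactly four lines, so a stray extra blank line in POSTS would shift everything after it.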
OK, so everything's ready and we can send the posts to del.icio.us. Unfortunately the API documentation didn't seem quite right and I kept getting 404 errors. Well, 15 minutes is more or less my attention span, so I just googled for "cpan del.icio.us" and found Aaron Straup Cope's Net::Delicious package.
#!/usr/bin/perl -w
$|=1; # Automatic STDOUT flushing
use Net::Delicious;

# Read POSTS: each record is title, URL, tags, and a blank separator line
@_=<>;
my @p=();
while(@_){
    my $title = shift@_; chomp $title;
    my $url = shift@_; chomp $url;
    my $tags = shift@_; chomp $tags; $tags = "taw blog $tags";
    shift@_; # skip the blank separator line
    push @p, [$title, $url, $tags];
}
my $del = Net::Delicious->new({user=>"taw", pswd=>"password"});
# POSTS is newest-first, so reverse it to post in chronological order
@p = reverse @p;
#while ($p[0][0] ne "Title of the earliest post we want") {shift @p};
for(@p) {
    my %args = (
        url => $_->[1],
        description => $_->[0],
        tags => $_->[2],
    );
    # $title from the loop above is out of scope here, so use the stored title
    print "Posting $_->[0]: ";
    my $retval = $del->add_post(\%args);
    print $retval, "\n";
    exit unless $retval == 1;
    sleep 5;
}
And now we have easy access to things like a list of all blog posts about RLisp.