I'm slowly uploading stuff to the new server after the crash.
The thing that slows me down most is trying to get things right immediately instead of doing a quick reupload first and fixing things later. I know that's not the way to get things done fast, but it's more fun.
I want packages to be built and uploaded automatically with a single command (and later, nightly from crontab). Some packages are pretty big (mostly jrpg), so doing this naively would mean needlessly reuploading a few hundred MB every night. The Rakefile needs some way of knowing whether the new package is any different from the old one.
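A minimal sketch of the skip-if-unchanged step, assuming the freshly built package's content hash is already known and the hash of the last uploaded version is kept in a local stamp file. The helper name upload_if_changed and the stamp-file approach are hypothetical, not the author's actual Rakefile logic.

```ruby
require 'tmpdir'

# Hypothetical helper: run the upload block only if new_hash differs from the
# hash recorded in stamp_file after the previous successful upload.
def upload_if_changed(stamp_file, new_hash)
  old_hash = File.exist?(stamp_file) ? File.read(stamp_file).strip : nil
  return false if old_hash == new_hash
  yield                             # the actual (expensive) upload
  File.write(stamp_file, new_hash)  # record the hash only after the upload succeeded
  true
end

# Usage with a throwaway stamp file; the block stands in for a real upload.
Dir.mktmpdir do |d|
  stamp = File.join(d, "pkg.hash")
  uploads = 0
  upload_if_changed(stamp, "abc") { uploads += 1 } # uploads
  upload_if_changed(stamp, "abc") { uploads += 1 } # skipped, hash unchanged
  upload_if_changed(stamp, "def") { uploads += 1 } # uploads
  p uploads # => 2
end
```

Writing the stamp only after the block returns means a failed upload (one that raises) is retried on the next run.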
Unfortunately, packages (.tar.gz, .tar.bz2, .zip etc.) with identical contents are not necessarily bitwise identical. In fact, most of the time they're not, and they often don't even have identical file sizes. So I wrote a library to hash archive contents, which will hopefully save a lot of unnecessary uploads.
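One easy-to-reproduce cause: the gzip header stores a modification timestamp, so compressing the same bytes at different times gives different files. A minimal sketch using Ruby's standard zlib binding; gzip_bytes and gunzip_bytes are illustrative helpers, not part of the library.

```ruby
require 'zlib'
require 'stringio'

# Gzip data in memory, stamping the given mtime (Unix seconds) into the header.
def gzip_bytes(data, mtime)
  io = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(io)
  gz.mtime = mtime # must be set before the first write
  gz.write(data)
  gz.finish        # write the gzip trailer without closing io
  io.string
end

# Decompress in-memory gzip data back to a string.
def gunzip_bytes(bytes)
  Zlib::GzipReader.new(StringIO.new(bytes)).read
end

a = gzip_bytes("same contents", 1_000_000_000)
b = gzip_bytes("same contents", 1_000_000_060)
p a == b                             # => false (only the header timestamp differs)
p gunzip_bytes(a) == gunzip_bytes(b) # => true
```

Tar adds more of the same (per-file mtimes, owners), and different compressors or levels change the size as well, so comparing archive bytes says little about whether the contents changed.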
The library is pretty simple, so I'm just pasting it here instead of packaging it, releasing it on RubyForge etc., at least for now.
```ruby
require 'digest/sha1'
require 'tmpdir'

class Array
  # Pick a random element of the array
  def random
    self[rand(size)]
  end
end

class String
  # Hex SHA1 hash of the string
  def digest
    Digest::SHA1.hexdigest(self)
  end

  # Random string of len characters from [a-zA-Z0-9_], safe to use in paths
  def self.random(len = 32)
    path_characters = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a + ["_"]
    (0...len).map { path_characters.random }.join
  end
end

class File
  # Hex SHA1 hash of the file's contents, read as raw bytes
  def self.digest(file_name)
    Digest::SHA1.file(file_name).hexdigest
  end
end

class Archive
  # Built in a class method so the Proc doesn't capture the Archive object;
  # a finalizer that references its own object would keep it from ever being collected.
  def self.finalizer(dir)
    Proc.new {
      system "rm", "-rf", dir
    }
  end

  # file_name must be absolute, because unpack runs inside the temporary directory
  def initialize(file_name, type = nil)
    @file_name = file_name
    @type = type || guess_type_by_extension
    @unpacked = false
  end

  # It's not particularly secure
  # Unfortunately tempfile only creates files, not directories
  def dir
    return @dir if @dir
    loop do
      candidate = File.join(Dir.tmpdir, "ahash-" + String.random)
      begin
        Dir.mkdir(candidate, 0700)
      rescue Errno::EEXIST
        next # name already taken, try another one; other errors propagate
      end
      @dir = candidate
      ObjectSpace.define_finalizer(self, Archive.finalizer(@dir))
      return @dir
    end
  end

  def guess_type_by_extension
    case @file_name
    when /\.(tgz|tar\.gz)\z/
      :tar_gz
    when /\.tar\.bz2\z/
      :tar_bz2
    when /\.tar\z/
      :tar
    when /\.zip\z/
      :zip
    else
      nil
    end
  end

  def unpack
    return if @unpacked
    Dir.chdir(dir) {
      ok = case @type
           when :tar_gz
             system "tar", "-xzf", @file_name
           when :tar_bz2
             system "tar", "-xjf", @file_name
           when :tar
             system "tar", "-xf", @file_name
           when :zip
             system "unzip", "-q", @file_name
           else
             raise "Don't know how to unpack archives of type #{@type.inspect}"
           end
      # system returns false or nil if the tool failed or could not be run
      raise "Unpacking #{@file_name} failed" unless ok
    }
    @unpacked = true
  end

  # Hash of the file list and file sizes only; contents are ignored.
  # Note that Dir["**/*"] does not match dotfiles, so hidden files are skipped.
  def quick_hash
    unpack
    @quick_hash ||= Dir.chdir(dir) {
      Dir["**/*"].map { |file_name|
        if File.directory?(file_name)
          ['dir', file_name]
        else
          ['file', file_name, File.size(file_name)]
        end
      }.sort.inspect.digest
    }
  end

  # Hash of the file list, file sizes and file contents
  def slow_hash
    unpack
    @slow_hash ||= Dir.chdir(dir) {
      Dir["**/*"].map { |file_name|
        if File.directory?(file_name)
          ['dir', file_name]
        else
          ['file', file_name, File.size(file_name), File.digest(file_name)]
        end
      }.sort.inspect.digest
    }
  end
end
```

Some details:
- Array#random picks a random array element.
- String.random(len = 32) returns a random string of len characters drawn from letters, digits and underscore.
- String#digest returns the SHA1 hash of the string in hex format.
- File.digest(file_name) returns the hex SHA1 hash of the contents of file file_name.
- Archive.new(file_name, type) creates an Archive object.
- Archive.new(file_name) creates an Archive object and guesses its type (:tar_gz, :tar_bz2, :tar, :zip) based on the file extension.
- Archive#guess_type_by_extension guesses the Archive's type by looking at the file extension. (internal function)
- Archive#dir, when first run, creates a temporary directory in /tmp (or the system-specific place for temporary files), registers a finalizer which rm -rfs this directory, and returns the path to the newly created directory. When run afterwards, it simply returns the saved path. (internal function)
- Archive#unpack unpacks the contents of the archive to the temporary directory. (internal function)
- Archive#quick_hash returns a quick hash, based only on the list of files and their sizes, not their contents.
- Archive#slow_hash returns a reliable but possibly slower hash, based on the file list and the files' contents.
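The quick/slow distinction is easy to see on plain directories. This sketch applies the same listing scheme to an already-unpacked directory; the name tree_hash and its deep: keyword are hypothetical. It also uses Dir.mktmpdir from Ruby's tmpdir library, which creates a private (0700) temporary directory and removes it when the block exits. That is a swapped-in, scoped alternative to the finalizer approach in Archive#dir, not what the library does.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hash a directory tree like Archive#quick_hash / #slow_hash:
# sorted [type, name, size(, sha1)] tuples, inspected and SHA1-hashed.
def tree_hash(root, deep: false)
  entries = Dir.chdir(root) do
    Dir["**/*"].map do |name|
      if File.directory?(name)
        ["dir", name]
      elsif deep
        ["file", name, File.size(name), Digest::SHA1.file(name).hexdigest]
      else
        ["file", name, File.size(name)]
      end
    end
  end
  Digest::SHA1.hexdigest(entries.sort.inspect)
end

Dir.mktmpdir("ahash-") do |a|
  Dir.mktmpdir("ahash-") do |b|
    File.write(File.join(a, "readme.txt"), "hello")
    File.write(File.join(b, "readme.txt"), "world") # same size, different bytes
    p tree_hash(a) == tree_hash(b)                         # => true
    p tree_hash(a, deep: true) == tree_hash(b, deep: true) # => false
  end
end
# Both temporary directories have been removed at this point.
```

Two files of equal size but different bytes fool the quick hash, which is why it is only a hint while the slow hash is the one to trust.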
The difference in speed between Archive#quick_hash and Archive#slow_hash isn't that big, as unpacking and hashing take a comparable amount of time. On the other hand, Archive#quick_hash could easily be computed from just the archive listing (as in tar -tvzf), without unpacking anything, which would make a major difference.
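A minimal sketch of that listing-based idea for tar archives. Instead of parsing tar -tvzf output (whose format varies and includes mtimes), it assumes RubyGems' bundled tar reader, Gem::Package::TarReader, is acceptable; RubyGems uses it to read .gem files. The method names listing_hash and listing_hash_tar_gz are hypothetical. Nothing is written to disk, though a .tar.gz stream still has to be decompressed to reach the headers.

```ruby
require 'digest/sha1'
require 'rubygems/package' # provides Gem::Package::TarReader
require 'zlib'

# Hash a tar stream's listing (names and sizes) without extracting anything.
# Same idea as Archive#quick_hash, but the values are NOT interchangeable with it:
# tar may store "./" prefixes and trailing slashes, or omit implicit directories.
def listing_hash(io)
  entries = []
  Gem::Package::TarReader.new(io).each do |entry|
    # Normalize "./pkg/" and "pkg/" to "pkg" so equivalent listings compare equal
    name = entry.full_name.sub(%r{\A\./}, "").chomp("/")
    next if name.empty? || name == "."
    if entry.directory?
      entries << ["dir", name]
    else
      entries << ["file", name, entry.header.size]
    end
  end
  Digest::SHA1.hexdigest(entries.sort.inspect)
end

# .tar.gz / .tgz files: decompress on the fly and hash the listing.
def listing_hash_tar_gz(file_name)
  Zlib::GzipReader.open(file_name) { |gz| listing_hash(gz) }
end

# e.g. listing_hash_tar_gz("/path/to/package.tar.gz")  (hypothetical path)
```

Compare listing hashes only with other listing hashes, and .zip files would need a separate reader.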

