The best kittens, technology, and video games blog in the world.

Thursday, May 06, 2010

Progress bar for Unix pipes

Don't Spill! by Ack Ook from flickr (CC-SA)


This is getting quite ridiculous, as it's the third post I'm writing in just one day - that's more than I typically wrote in a month over the last two years. This is what you get for finally getting organized - a big pile of 90%-finished stuff which, while useless on its own, can very quickly be turned into something of high utility. OmniFocus and GTD are truly awesome. (Unfortunately this means there are now two Mac apps I care about - TextMate and OmniFocus - so switching away from Mac will be even harder.)

The script I want to show you today solves one of the most severe problems with Unix pipes - lack of progress indicators. Normally you'd start a pipe, and until it actually finished what it was doing, you'd have no way of finding out if it's 1% done or 99% done, and how fast it is progressing.

There are some nasty hacks. Many times I've used strace or looked inside /proc to figure out how much progress had been made - but these are a painful waste of effort for something that should be built in. Feedback is a basic principle of good UI design.
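For the curious, the /proc hack looks roughly like this on Linux. It is a minimal sketch, not something the script below uses: the helper names parse_fdinfo_pos and fd_progress are made up for illustration, and it assumes the process you are spying on reads a regular file through a file descriptor you already know.

```ruby
# Linux-only sketch: /proc/<pid>/fdinfo/<fd> contains a "pos:" line holding
# the current offset of that file descriptor. Helper names are hypothetical.

# Extract the offset from the text of an fdinfo file, or nil if it is absent.
def parse_fdinfo_pos(text)
  text[/^pos:\s*(\d+)/, 1]&.to_i
end

# Return [offset, size] for file descriptor fd of process pid.
def fd_progress(pid, fd)
  pos  = parse_fdinfo_pos(File.read("/proc/#{pid}/fdinfo/#{fd}"))
  size = File.stat("/proc/#{pid}/fd/#{fd}").size  # stat follows the symlink
  [pos, size]
end
```

You still have to hunt down the pid and the right descriptor by hand every time (ls -l /proc/<pid>/fd helps), which is exactly the painful part.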

Anyway, here's the final solution to the progress bar question:

#!/usr/bin/env ruby

STDERR.sync = true

$bytes = true
$max = nil
$count = 0

until ARGV.empty?
  case (arg = ARGV.shift)
  when '-l'
    $bytes = false
  when '-b'
    $bytes = true
  when /\A(\d+)([kmg]?)\Z/
    units = {'k'=>2**10, 'm'=>2**20, 'g'=>2**30, ''=>1}
    $max = $1.to_i * units[$2]
  else
    raise "Unrecognized argument: `#{arg}'"
  end
end

$max = STDIN.stat.size if $bytes and STDIN.stat.file? and $max.nil?

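# Reporter thread: once a second, redraw the counter on STDERR if it changed.
Thread.new{
  last_count = nil
  while true
    if $count != last_count
      # Skip the percentage when the total is unknown or zero (empty file)
      if $max and $max > 0
        STDERR.print "\r#{$count}/#{$max} [#{$count*100/$max}%]"
      else
        STDERR.print "\r#{$count}"
      end
      last_count = $count
    end
    sleep 1
  end
}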

begin
  while data = ($bytes ? STDIN.read(2**12) : STDIN.gets)
    STDOUT.print(data)
    $count += $bytes ? data.length : 1
  end
  STDERR.print "\n"
rescue Errno::EPIPE
end
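
If you want to poke at the size-argument handling without setting up a pipe, the same idea fits in a standalone function. This is only a sketch: parse_size and UNITS are names made up for this example, and it assumes the g suffix means 2**30 bytes.

```ruby
# Hypothetical helper mirroring the script's size argument: digits plus an
# optional k/m/g suffix, using binary multiples (assumes g = 2**30).
UNITS = {'k' => 2**10, 'm' => 2**20, 'g' => 2**30, '' => 1}

def parse_size(arg)
  match = /\A(\d+)([kmg]?)\z/.match(arg)
  raise ArgumentError, "Unrecognized size: `#{arg}'" unless match
  match[1].to_i * UNITS[match[2]]
end

puts parse_size('700m')   # 734003200
```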

Explanation time:
  • You can use this script at any point of the pipeline. foo | bar | progress | blah.
  • The script takes advantage of the fact that its STDERR is still connected to your terminal, and prints progress information there. Once a second, if the count has changed, it returns to the start of the line and overwrites it with the new numbers.
  • There is no support for multiple progress bars in one pipeline - this is not terribly difficult to do (one would be the master, others would send info to it via a socket based on the tty's inode), but I never found a good use case for it, so I never bothered implementing it.
  • The progress script works in two modes - by default it counts bytes (-b), but it can count lines (-l) as well.
  • You can specify what counts as 100% if you want percentage information - with progress -l 1234, progress -b 700m etc. If you specify the wrong size, of course, you get garbage.
  • If you operate in the default byte mode and the input is a file, the script will figure out the file size automatically. This doesn't happen in line mode, as it would require a potentially expensive wc -l - it's easy to do manually if you want.
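
For line mode the total has to come from somewhere. Here is one way to get it, sketched in Ruby rather than with wc -l. The count_lines helper is hypothetical and not part of the script. It counts newline characters the way wc -l does, so a final line without a trailing newline is not counted, even though the script itself would count it.

```ruby
# Hypothetical helper: count newline bytes in a file, reading fixed-size
# chunks so memory use stays flat even for huge files.
def count_lines(path, chunk_size = 2**16)
  total = 0
  File.open(path, 'rb') do |f|
    while (chunk = f.read(chunk_size))
      total += chunk.count("\n")
    end
  end
  total
end
```

The result can then be passed as the total to progress -l. The file gets read twice this way, which is the cost the script avoids by default.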
Here's a "screenshot":
$ ./progress < kubuntu-10.04-beta2-desktop-amd64.iso | openssl md5
287244288/708704256 [40%]
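
The percentage is plain integer division, so it always rounds down. A tiny sketch of the formatting step on its own, where progress_line is a name invented for this example and the zero check is an extra assumption to keep an empty input from dividing by zero:

```ruby
# Hypothetical helper: build the text printed after the carriage return.
def progress_line(count, max = nil)
  if max && max > 0
    "#{count}/#{max} [#{count * 100 / max}%]"
  else
    count.to_s
  end
end

puts progress_line(287244288, 708704256)   # 287244288/708704256 [40%]
```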

Enjoy.

5 comments:

Anonymous said...

You just re-implemented pv.

taw said...

Anonymous: It seems I just did. I've never heard of pv before. As my excuse I'll say that my script is tiny and easily hackable for whatever kind of progress indicators you wish to use, while pv is an ugly pile of C code I wouldn't touch with a stick.

lamby said...

Needs moar strace.

http://chris-lamb.co.uk/2008/01/24/can-you-get-cp-to-give-a-progress-bar-like-wget/

Enjoying the new frequency of posts. :)

Anonymous said...

http://www.catonmat.net/blog/unix-utilities-pipe-viewer/

Anonymous said...

"pv is an ugly pile of C code I wouldn't touch with a stick"

After such a statement, I just had to have a look at the source, and I have to say, pv has the most readable, concisely-but-well-commented C code I've read in a looong while. It even comes with set of unit tests!

I'll agree that your script is much more easily modifiable, being small and in Python, but I see no reason to call pv's code ugly.