This is getting quite ridiculous, as it's the third post I'm writing in just one day - that's more than I typically wrote in a month over the last two years. This is what you get for finally getting organized - a big pile of 90%-finished stuff which, while useless on its own, can very quickly be turned into something of high utility. OmniFocus and GTD are truly awesome. (Unfortunately this means there are now two apps for Mac I care about - TextMate and OmniFocus - so switching away from Mac will be even harder.)
The script I want to show you today solves one of the most severe problems with Unix pipes - lack of progress indicators. Normally you'd start a pipe, and until it actually finished what it was doing, you'd have no way of finding out if it's 1% done or 99% done, and how fast it is progressing.
There are some nasty hacks. Many times I used strace or looked inside /proc to figure out how much progress had been made - but these are a painful waste of effort for something that should be built in. Feedback is the basic principle of good UI design.
Anyway, here's the final solution to the progress bar question:
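For the record, the /proc flavour of that hack looks roughly like the sketch below. It assumes Linux, where /proc/<pid>/fdinfo/<fd> exposes the current file offset as a `pos:` field, and it assumes you already know the pid and the file descriptor number of the process you are spying on. The helper names `fdinfo_pos` and `read_progress` are made up for illustration.

```ruby
# Extract the "pos:" field from the text of a /proc/<pid>/fdinfo/<fd> file.
# Returns nil if the field is missing.
def fdinfo_pos(text)
  text[/^pos:\s*(\d+)/, 1]&.to_i
end

# Percentage of the file behind descriptor `fd` that process `pid` has
# already read past. Linux only; returns nil if it cannot be worked out.
def read_progress(pid, fd)
  pos  = fdinfo_pos(File.read("/proc/#{pid}/fdinfo/#{fd}"))
  size = File.stat("/proc/#{pid}/fd/#{fd}").size  # stat follows the symlink
  return nil if pos.nil? || size.zero?
  pos * 100 / size
end

# Usage: ruby thisfile.rb PID FD
if __FILE__ == $0 && ARGV.size == 2
  puts "#{read_progress(ARGV[0].to_i, ARGV[1].to_i)}%"
end
```

This only helps when the process reads from a regular file and you can guess which descriptor matters, which is exactly why it counts as a hack.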
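#!/usr/bin/env ruby
STDERR.sync = true

$bytes = true   # count bytes by default; -l switches to counting lines
$max   = nil    # what counts as 100%, if known
$count = 0

until ARGV.empty?
  case (arg = ARGV.shift)
  when '-l'
    $bytes = false
  when '-b'
    $bytes = true
  when /\A(\d+)([kmg]?)\Z/
    # Size suffixes are powers of 1024
    units = {'k'=>2**10, 'm'=>2**20, 'g'=>2**30, ''=>1}
    $max = $1.to_i * units[$2]
  else
    raise "Unrecognized argument: `#{arg}'"
  end
end

# In byte mode a regular file on STDIN tells us its own size
$max = STDIN.stat.size if $bytes and STDIN.stat.file? and $max.nil?

# Reporter thread: redraw the status line on STDERR once a second
Thread.new{
  last_count = nil
  while true
    if $count != last_count
      if $max and $max > 0   # skip the percentage for an empty input
        STDERR.print "\r#{$count}/#{$max} [#{$count*100/$max}%]"
      else
        STDERR.print "\r#{$count}"
      end
      last_count = $count
    end
    sleep 1
  end
}

# Main loop: copy STDIN to STDOUT unchanged, counting as we go
begin
  while data = ($bytes ? STDIN.read(2**12) : STDIN.gets)
    STDOUT.print(data)
    $count += $bytes ? data.length : 1
  end
  STDERR.print "\n"
rescue Errno::EPIPE
  # The downstream command exited early; quit quietly
end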
Explanation time:
- You can use this script at any point of the pipeline. foo | bar | progress | blah.
- The script takes advantage of the fact that its STDERR is still connected to your terminal, and outputs progress information there. Once a second, if the count has changed, it returns to the start of the line and overwrites it with new information.
- There is no support for multiple progressbars in one pipeline - this is not terribly difficult to do (one would be master, others would send info to it via socket based on tty's inode), but I never found a good use case for it, so I never bothered implementing it.
- The progress script works in two modes - by default it counts bytes (-b), but it can count lines (-l) as well.
- You can specify what counts as 100% if you want percentage information - with progress -l 1234, progress -b 700m etc. If you specify the wrong size, of course, you get garbage.
- If you operate in the default byte mode and input is a file - the script will figure out file size automatically. This doesn't happen in line mode, as it would require a potentially expensive wc -l - it's easy to do it manually if you want.
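If you do want a percentage in line mode, the line count has to come from somewhere. Here is a minimal sketch of doing it in Ruby instead of with wc -l; the helper name `count_lines` and the chunk size are my own choices, not part of the script. Like wc -l it counts newline characters, so a final line without a trailing newline is not counted, while the script's gets loop would count it.

```ruby
# Count newline characters in a file, reading in fixed-size chunks so that
# large files are never loaded into memory in one go.
def count_lines(path, chunk_size = 2**16)
  lines = 0
  File.open(path, 'rb') do |f|
    while (chunk = f.read(chunk_size))
      lines += chunk.count("\n")
    end
  end
  lines
end

# Usage: ruby thisfile.rb FILE
if __FILE__ == $0 && ARGV.size == 1
  puts count_lines(ARGV[0])
end
```

The plain shell equivalent would be something like `./progress -l $(wc -l < some.log) < some.log | blah`, where some.log stands for whatever file you are feeding in. Either way the file gets read twice, which is the expense mentioned above.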
$ ./progress < kubuntu-10.04-beta2-desktop-amd64.iso | openssl md5
287244288/708704256 [40%]
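The output above shows the count and percentage only. If you also want speed and a rough ETA, one possible tweak (not part of the script above, and only a sketch) is to build the status line from the count, the optional maximum, and the seconds elapsed since start. The function name `status_line` and the output format are hypothetical.

```ruby
# Build a one-line status string from the current count, an optional total,
# and the number of seconds elapsed so far.
def status_line(count, max, elapsed)
  rate = elapsed > 0 ? count / elapsed.to_f : 0.0   # units per second
  line = if max && max > 0
    "#{count}/#{max} [#{count * 100 / max}%]"
  else
    count.to_s
  end
  line += format(' %.0f/s', rate)
  if max && max > 0 && rate > 0
    # Naive estimate: assumes the average rate so far holds until the end
    line += format(' ETA %ds', ((max - count) / rate).ceil)
  end
  line
end

puts status_line(50, 200, 5.0)
puts status_line(50, nil, 5.0)
```

To use it, the reporter thread would remember `Time.now` at startup and print `"\r" + status_line($count, $max, Time.now - start)` instead of the two hardcoded formats.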
Enjoy.
You just re-implemented pv.
Anonymous: It seems I just did. I've never heard of pv before. As my excuse I'll say that my script is tiny and easily hackable for whatever kind of progress indicators you wish to use, while pv is an ugly pile of C code I wouldn't touch with a stick.
Needs moar strace.
http://chris-lamb.co.uk/2008/01/24/can-you-get-cp-to-give-a-progress-bar-like-wget/
Enjoying the new frequency of posts. :)
http://www.catonmat.net/blog/unix-utilities-pipe-viewer/
"pv is an ugly pile of C code I wouldn't touch with a stick"
After such a statement, I just had to have a look at the source, and I have to say, pv has the most readable, concisely-but-well-commented C code I've read in a looong while. It even comes with set of unit tests!
I'll agree that your script is much more easily modifiable, being small and in Ruby, but I see no reason to call pv's code ugly.