The best kittens, technology, and video games blog in the world.

Thursday, April 11, 2013

More small Unix utilities written in Ruby

eating grass 2 by lotusgreen from flickr (CC-NC-SA)

Here's the sequel to my "collection of small Unix utilities written in Ruby" post and github repository.

Useful technique - Pathname

One thing I forgot to mention the last time - Pathname library.

Pathname is an objects-oriented way to look at paths in a file system. A Pathname object is not the same as a File or Directory object since it's not opened - and might not even exist yet. It's also not like String since it has all the filesystem awareness.

For very simple scripts it's fine to use just plain Strings to represent filesystem paths, but once it gets a bit more complicated your script will get a lot more readable with Pathname - and it costs you nothing.

Let's just look at fix_permissions utility. Here's the core part:

class Pathname
  def script?
    read(2) == "#!"

  def file_type
    `file -b #{self.to_s.shellescape}`.chomp

  def should_be_executable?
    script? or file_type =~ /\b(Mach-O|executable)\b/

def fix_permissions(path)
  Pathname(path).find do |fn|
    next if
    next if fn.symlink?
    next unless fn.executable?
    fn.chmod(0644) unless fn.should_be_executable?

Since Pathname overloads #to_str method it can be transparently used in most contexts where String is expected - including printing it, file operations, system/exec commands and so on. You'll rarely need to use #to_s - mostly when you want to regexp it.

I feel Pathname#shellescape should exist, but since it doesn't that's one place where you need to use .to_s.shellescape for now.

So what does this script do? First we add a few methods to Pathname class. It already knows if something is a directory?, symlink?, and executable? (that is - has +x flag).

We want to know if it is a script. And that's easy - just read(2) as if it was a File to read first two bytes. It looks much more elegant than, 2) != "#!" we'd need if we used Strings - not to mention how String class is really no place for #script? method so we'd probably use a standalone procedure.

Next let's make file_type method - and use #shellescape to do it safely. Unfortunately that one is only defined on Strings.

After that it's just one regexp away from should_be_executable?.

Once we defined that notice how easy it is to dig into directory trees with Pathname#find, and then just use a few #query? methods to ask the path what it is about, then #chmod to setup proper flags.

Other very useful methods not present in the script are + for adding relative paths, #basename/#dirname for splitting it into components, and #relative_path_from for creating relative paths.

While I'm at it, use URI objects for URIs you want to do something complicated with rather than regexping them - usually your code will look better too.

Individual commands


Cuts long lines to specific number of characters for easy previewing.

    colcut 80 < file.xml


Removes executable flag from files which shouldn't have it. Useful for archives that went through a Windows system, zip archive, or other system not aware of Unix executable flag.

It doesn't turn +x flag, only removes it if a file neither starts with #!, nor is an executable according to file utility.

Usage example:
    fix_permissions ~/Downloads
If no parameters are passed, it fixes permissions in current directory.


Display progress for piped file.

Usage examples:

       cat /dev/urandom | progress | gzip  >/dev/null
       progress -l <file.txt | upload

By default it's in bytes mode. Use -l to specify line mode.

If progress is piped a file and it's in byte mode, it checks its size and uses that to display relative progress (like 18628608/104857600 [17%]). Otherwise it will only display number of bytes/lines piped through.

You can also specify what counts as 100% explicitly:

     progesss 123456
     progress 128m
     progress -l 42042

It will happily go over 100% on display.


Link to soup posts starting from the post before one specified.

Usage example:


Sort input through arbitrary Ruby expression. A lot more flexible than Unix sort utility.

Usage example:
    sortby '$_.length' <file.txt


Unknown said...

The pre-existing 'fmt' command does what your 'colcut' does and more and is likely much faster.

taw said...

Unknown: They do different things - fmt reformats, I just brutally chop excess. It's mostly usable for reading machine-generated xml and json files after putting them through auto-indentation.

Julien Letessier said...
This comment has been removed by the author.
Julien Letessier said...

cut(1) does this as well. It does both characters (cut -c1-80) and fields (cut -f2,3).

Nice series of tools though :)

taw said...

Julien: I guess I reimplemented that for no good reason then ;-)

It's not really surprising, often it's easier to write a Ruby or Perl one-liner than to find options for existing command to do the same thing.