taw's blog: More small Unix utilities written in Ruby

Thursday, April 11, 2013

More small Unix utilities written in Ruby

eating grass 2 by lotusgreen from flickr (CC-NC-SA)

Here's the sequel to my "collection of small Unix utilities written in Ruby" post and GitHub repository.

Useful technique - `Pathname`

One thing I forgot to mention last time - the Pathname library.

Pathname is an object-oriented way to look at paths in a file system. A Pathname object is not the same as a File or Dir object since it's not opened - and the path might not even exist yet. It's also not like a String since it has all the filesystem awareness.

For very simple scripts it's fine to use just plain Strings to represent filesystem paths, but once it gets a bit more complicated your script will get a lot more readable with Pathname - and it costs you nothing.

Let's just look at the fix_permissions utility. Here's the core part:

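require "pathname"
require "shellwords"  # provides String#shellescape

class Pathname
  # True if the file starts with a "#!" (shebang) line.
  def script?
    read(2) == "#!"
  end

  # One-line description from the `file` utility, e.g. "ASCII text".
  def file_type
    `file -b #{self.to_s.shellescape}`.chomp
  end

  def should_be_executable?
    script? or file_type =~ /\b(Mach-O|executable)\b/
  end
end

def fix_permissions(path)
  Pathname(path).find do |fn|
    next if fn.directory?
    next if fn.symlink?
    next unless fn.executable?
    # Only ever remove the executable flag, never add it.
    fn.chmod(0644) unless fn.should_be_executable?
  end
end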

Since Pathname implements #to_path (and, on Ruby 1.8, #to_str), it can be used transparently in most contexts where a path String is expected - including file operations, string interpolation and printing. You'll rarely need to use #to_s - mostly when something insists on a real String, such as when you want to regexp it.

I feel Pathname#shellescape should exist, but since it doesn't that's one place where you need to use .to_s.shellescape for now.
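If you'd rather not write .to_s.shellescape everywhere, a tiny monkey patch can fill the gap. This is only a sketch: Pathname#shellescape is a hypothetical method added here for illustration, not something the standard library provides, and the path in the demo is made up.

```ruby
require "pathname"
require "shellwords"

class Pathname
  # Hypothetical convenience method (not in the standard library):
  # escape the path so it is safe to interpolate into a shell command.
  def shellescape
    to_s.shellescape
  end
end

# Made-up path with characters the shell would otherwise interpret.
path = Pathname("My Documents/report (final).txt")
puts "file -b #{path.shellescape}"
```

The path does not have to exist for this to work, since the escaping is purely textual.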

So what does this script do? First we add a few methods to the Pathname class. It already knows if something is a directory?, symlink?, and executable? (that is - has the +x flag).

We want to know if it is a script. And that's easy - just read(2) as if it were a File to read the first two bytes. It looks much more elegant than the File.read(path, 2) == "#!" we'd need if we used Strings - not to mention that the String class is really no place for a #script? method, so we'd probably use a standalone procedure.

Next let's make file_type method - and use #shellescape to do it safely. Unfortunately that one is only defined on Strings.

After that it's just one regexp away from should_be_executable?.

Once we've defined that, notice how easy it is to dig into directory trees with Pathname#find, then use a few #query? methods to ask the path what it is, and then #chmod to set up proper flags.
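The same find-then-query pattern works for a dry run that only reports what it would touch. The sketch below is an illustration, not part of the real utility: suspicious_executables is a hypothetical helper, it skips the `file` check (so native binaries would be listed too), and it builds its own throwaway directory so it can be run anywhere.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical dry-run helper: executable regular files that don't start with "#!".
def suspicious_executables(root)
  Pathname(root).find.select do |path|
    path.file? && !path.symlink? && path.executable? && path.read(2) != "#!"
  end
end

Dir.mktmpdir do |tmp|
  root = Pathname(tmp)
  (root + "run.sh").write("#!/bin/sh\necho hi\n")
  (root + "notes.txt").write("just text\n")
  root.children.each { |child| child.chmod(0755) }  # mimic an archive that lost its flags

  puts suspicious_executables(root).map(&:basename)
end
```

On a Unix system this lists notes.txt and leaves run.sh alone, because only the latter starts with #!.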

Other very useful methods not present in the script are + for adding relative paths, #basename/#dirname for splitting it into components, and #relative_path_from for creating relative paths.
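A quick illustration of those four methods. The directory names are made up, and nothing here touches the disk, since all of these operations are purely lexical.

```ruby
require "pathname"

root = Pathname("/home/taw/project")   # hypothetical directory, need not exist
file = root + "lib/utils/colcut.rb"    # + joins a relative path onto a base

puts file                              # /home/taw/project/lib/utils/colcut.rb
puts file.basename                     # colcut.rb
puts file.dirname                      # /home/taw/project/lib/utils
puts file.relative_path_from(root)     # lib/utils/colcut.rb
```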

While I'm at it, use URI objects for URIs you want to do something complicated with rather than regexping them - usually your code will look better too.
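For example, here is a small sketch that pulls pieces out of a URL with the standard uri library instead of a regexp. The assumption that the third path component is the post id is mine, made only for the sake of the example.

```ruby
require "uri"

uri = URI.parse("http://taw.soup.io/post/307955954/Image")

puts uri.host   # taw.soup.io
puts uri.path   # /post/307955954/Image

# Path components are "", "post", "307955954", "Image";
# treating index 2 as the post id is an assumption for this example.
post_id = uri.path.split("/")[2]
puts post_id    # 307955954

# Resolve a relative reference against the URL without string surgery.
puts uri.merge("/")   # http://taw.soup.io/
```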

Individual commands

colcut

Cuts long lines to a specific number of characters for easy previewing.

    colcut 80 < file.xml
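The idea fits in a few lines of Ruby. What follows is a minimal sketch of the technique, not the actual script from the repository; the colcut method and its argument order are made up for the example.

```ruby
require "stringio"

# Copy lines from input to output, keeping at most `width` characters of each.
def colcut(width, input, output)
  input.each_line do |line|
    output.puts line.chomp[0, width]
  end
end

# Demo on an in-memory stream; a real script would pass
# ARGV[0].to_i, STDIN and STDOUT instead.
out = StringIO.new
colcut(5, StringIO.new("abcdefghij\nxyz\n"), out)
print out.string
```

Lines shorter than the limit pass through unchanged, so the demo prints abcde followed by xyz.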

fix_permissions

Removes the executable flag from files which shouldn't have it. Useful for archives that went through a Windows system, a zip archive, or anything else not aware of the Unix executable flag.

It never turns the +x flag on; it only removes it if a file neither starts with #! nor is an executable according to the file utility.

Usage example:

    fix_permissions ~/Downloads

If no parameters are passed, it fixes permissions in the current directory.

progress

Displays progress for a piped file.

Usage examples:

    cat /dev/urandom | progress | gzip >/dev/null
    progress -l <file.txt | upload

By default it's in bytes mode. Use -l to specify line mode.

If the standard input of progress is a file and it's in byte mode, it checks the file's size and uses that to display relative progress (like 18628608/104857600 [17%]). Otherwise it will only display the number of bytes/lines piped through.

You can also specify what counts as 100% explicitly:

    progress 123456
    progress 128m
    progress -l 42042

It will happily go over 100% on display.
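To make the size argument and the status line concrete, here is a rough sketch. Both helpers are hypothetical, and the meaning of the suffix is an assumption: I treat k/m/g as 1024-based multipliers, which the real tool may or may not do.

```ruby
# Hypothetical helper: turn "123456" or "128m" into a number.
# Assumption: k/m/g are binary (1024-based) multipliers.
def parse_size(text)
  multipliers = { "" => 1, "k" => 1024, "m" => 1024**2, "g" => 1024**3 }
  match = /\A(\d+)([kmg]?)\z/i.match(text)
  raise ArgumentError, "bad size: #{text}" unless match
  match[1].to_i * multipliers[match[2].downcase]
end

# Hypothetical helper: format a status line like "18628608/104857600 [17%]".
# Integer division rounds down, and nothing caps the result at 100.
def progress_line(done, total)
  "#{done}/#{total} [#{100 * done / total}%]"
end

puts progress_line(18628608, parse_size("100m"))
puts progress_line(150, 100)
```

The second call shows the over-100% case: it prints 150/100 [150%].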

since_soup

Links to soup posts, starting from the post before the one specified.

Usage example:

    since_soup http://taw.soup.io/post/307955954/Image

sortby

Sorts input by an arbitrary Ruby expression. A lot more flexible than the Unix sort utility.

Usage example:

    sortby '$_.length' <file.txt
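The core of such a tool can be sketched with sort_by and eval. This is my guess at the technique rather than the actual script, and it assumes the expression sees each line in $_ without its trailing newline.

```ruby
# Sort lines by the value of a Ruby expression evaluated with $_ set to each line.
# Assumption: $_ holds the line without its trailing newline.
def sortby(expression, lines)
  lines.sort_by do |line|
    $_ = line.chomp
    eval(expression)
  end
end

# A real script would pass ARGV[0] and STDIN.readlines instead.
lines = ["pear\n", "fig\n", "banana\n"]
puts sortby("$_.length", lines)
```

The demo prints fig, pear, banana. Since the expression goes straight to eval, this is only suitable for expressions you type yourself.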

5 comments:

Unknown said...: The pre-existing 'fmt' command does what your 'colcut' does and more and is likely much faster.; 17:41
taw said...: Unknown: They do different things - fmt reformats, I just brutally chop excess. It's mostly usable for reading machine-generated xml and json files after putting them through auto-indentation.; 17:55
Unknown said...: cut(1) does this as well. It does both characters (cut -c1-80) and fields (cut -f2,3).

Nice series of tools though :); 11:14
taw said...: Julien: I guess I reimplemented that for no good reason then ;-)

It's not really surprising, often it's easier to write a Ruby or Perl one-liner than to find options for existing command to do the same thing.; 11:51