Saturday, July 21, 2012

Collection of small Unix utilities written in Ruby

Warning!!!...Tiger in training...:O)) by law_keven from flickr (CC-SA)

Just about every Unix hacker writes hundreds of small Unix scripts for various tasks for personal use, and they very rarely see the light of day - they're too small to turn into proper Open Source projects, and in any case it would take a lot of effort.

But these days publishing code on a code sharing site like GitHub is just so easy that we should change this. I'm doing my part here - out of hundreds of scripts scattered all over my ~/all repository I took a sample which doesn't depend too much on my personal setup, doesn't break any website's ToS too hard, and generally might be of some use to other people.

Feel free to use these scripts any way you wish. Some of them can be simply used, others can show you how to solve common scripting problems.

All of them have been written by me, except ~/bin/rename by Larry Wall which I'm bundling with the rest for convenience since a lot of Unix boxes don't have it, and it's just ridiculously useful.

These are all meant to work on OSX. Most but not all will work on other Unix distributions. If you have patches to make them work elsewhere, send them via pull request or another convenient method.

All scripts can be accessed in my github repository.

I'll probably be adding more scripts later, but the existing batch is pretty sizable already.

A few tricks

A lot of tricks and good techniques are included in these scripts.

SIGPIPE

One such technique, which I don't remember seeing anywhere else, is starting the script with trap("PIPE", "EXIT").

What does it do? When you set up a pipeline like foo | bar | head -n 10, sometimes one of the consumer processes quits first, in this case head after reading 10 lines. The processes upstream have no reason to keep running, so the next time one of them writes to the closed pipe the system sends it a SIGPIPE signal, which by default kills it.

Ruby decided to override this behaviour, and instead you get an exception (Errno::EPIPE). This makes sense for most programs, but for utilities meant to be used as pipe producers (randswap and tac here), it's better to restore the default Unix behaviour, which you can do very easily with this one line.
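
Here's a minimal sketch of how such a pipe producer might start, assuming a tac-like script. print_reversed is a hypothetical helper for illustration, not the code from the repository.

```ruby
# Exit quietly when the reader goes away (for example head exiting early)
# instead of dying with an Errno::EPIPE backtrace.
trap("PIPE", "EXIT")

# Hypothetical helper: writes the lines of `input` to `output`, last line first.
def print_reversed(input, output)
  input.readlines.reverse_each { |line| output.print(line) }
end

# A real script would end with: print_reversed($stdin, $stdout)
```

With the trap in place, something like tac <big.txt | head -n 10 should end without any error output once head has seen enough.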

Avoid shell escaping


An extremely bad scripting practice is generating a command string like "rm -rf #{directory}" and executing that via the system function.

There's rarely any reason to do so. system takes multiple arguments, so you can use the safer system "rm", "-rf", directory instead, which runs the command directly without involving a shell.

It's more complicated to do it "the right way" if you want to set up redirects, get the command's output, or pipe multiple processes together, but 90% of the time "the right way" is also the simplest way, so why not do it right?
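
A small sketch of both cases, assuming nothing beyond the Ruby standard library. The child program is just the current Ruby binary echoing its argument back, and HOSTILE, CHILD, echo_back and child_succeeds? are names made up for this example.

```ruby
require "rbconfig"

# Hypothetical hostile argument: a shell would treat ; and $( ) specially.
HOSTILE = "my file; echo oops $(id).txt"

# A tiny child program that prints its first argument unchanged.
# Using the current Ruby binary keeps the sketch free of external tools.
CHILD = "print ARGV.first"

# Capturing output without a shell: IO.popen accepts an argv array.
def echo_back(arg)
  IO.popen([RbConfig.ruby, "-e", CHILD, arg], &:read)
end

# Exit status only: system with several arguments never invokes a shell.
def child_succeeds?(arg)
  system(RbConfig.ruby, "-e", "exit(ARGV.first.empty? ? 1 : 0)", arg)
end
```

echo_back(HOSTILE) gives back the exact same string, which is the whole point: nothing between Ruby and the child process tried to interpret it.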

system *%W[]

Ruby %W is an almost unknown feature - I wrote about it some time ago if you want more.

Very often this is the most convenient way to execute a command. system *%W[rm -rf #{directory}] looks like it does string interpolation, but it actually evaluates to system("rm", "-rf", "#{directory}"): the words are split before interpolation happens, so no shell ever sees the value and special characters in directory can't change the command.
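
To see what actually gets passed, here's a sketch that only builds the argument list and never runs anything. rm_argv and sleep_argv are hypothetical helpers.

```ruby
# Hypothetical helper: builds the argv for the rm call above without running it.
def rm_argv(directory)
  %W[rm -rf #{directory}]
end

# Hypothetical helper: interpolation also converts non-strings to strings.
def sleep_argv(seconds)
  %W[sleep #{seconds}]
end

rm_argv("My Documents; echo oops")
# => ["rm", "-rf", "My Documents; echo oops"]

sleep_argv(5)
# => ["sleep", "5"]
```

The interpolated value stays a single element no matter how many spaces or semicolons it contains.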

Use FileUtils

Usually you don't even have to call system - most common commands for filesystem interactions are available more conveniently via FileUtils module. Like FileUtils.mkdir_p "/some/path", FileUtils.rm_rf "/another/path" etc.
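
A quick sketch of the same idea, run against a throwaway temporary directory so it's safe to try. fileutils_demo and all the paths in it are made up for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Creates a nested directory under a fresh temporary root, removes the whole
# tree again, and reports what happened.
def fileutils_demo
  root = Dir.mktmpdir("fileutils-demo")
  nested = File.join(root, "some", "deep path")
  FileUtils.mkdir_p(nested)                        # like mkdir -p
  FileUtils.touch(File.join(nested, "notes.txt"))  # like touch
  created = File.file?(File.join(nested, "notes.txt"))
  FileUtils.rm_rf(root)                            # like rm -rf
  { created: created, removed: !File.exist?(root) }
end
```

No quoting is needed for the space in "deep path", since no shell is involved at any point.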

If you have to escape shell metacharacters

Sometimes the system *%W[] interface is not enough. In that case, just copy and paste String#shell_escape from my scripts (not security-audited or anything):


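class String
  # Returns a copy of the string that is safe to interpolate into a shell command.
  def shell_escape
    return "''" if empty?
    # Strings made only of known-safe characters need no quoting.
    return dup unless self =~ /[^0-9A-Za-z+,.\/:=@_-]/
    # Single-quote everything else, writing each ' itself as \'.
    gsub(/(')|[^']+/) { $1 ? "\\'" : "'#{$&}'"}
  end
end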

Then use it like this (without extra ''s): `tar -tzf #{fn.shell_escape}`.

EDIT: Since 1.8.7, Ruby has a String#shellescape method in the shellwords library in stdlib, so use that unless you need to support older systems.
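
A minimal sketch with the stdlib version. tar_list_command is a hypothetical helper that only builds the command string and doesn't execute it.

```ruby
require "shellwords"

# Builds the command string for the tar call above; nothing is executed here.
def tar_list_command(fn)
  "tar -tzf #{fn.shellescape}"
end

# Usage with backticks would look like: `#{tar_list_command(fn)}`
```

A handy sanity check is that Shellwords.split on the resulting string gives back the original file name as a single word.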

Playing white tiger cub by Tambako the Jaguar from flickr (CC-ND)

Individual commands

annotate_sgf


It uses GNU Go debug mode to annotate your Go game in SGF.
It will find a lot of tactical mistakes in most games by kyu players.

Usage:

annotate_sgf game.sgf


Output saved to annotated-game.sgf in the same directory as game.sgf.

For more details, read this blog post.

convert_to_png


Converts various image formats to PNG.
Mostly useful for mass conversion, for example when you have a directory
with 100 svg files dir/file-001.svg to dir/file-100.svg:

    convert_to_png dir/*.svg


will convert them all.

dedup_files

Deletes duplicate files in huge directories by hash, with some optimization to avoid unnecessary hashing.

Usage:

    dedup_files   ...

   
For example:

    dedup_files my_little_pony_wallpapers/


which will work pretty well even if you have 100GB of My Little Pony wallpapers.
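
Here's a sketch of one way to do this kind of deduplication, assuming the optimization is "compare sizes before hashing" (a guess on my part for this example, check the script for what it really does). duplicate_groups is a hypothetical helper that only reports duplicates and deletes nothing.

```ruby
require "digest"
require "find"

# Hypothetical helper: returns groups of paths with identical contents.
# Files are grouped by size first, so only files that share a size with
# at least one other file are ever hashed.
def duplicate_groups(root)
  files = Find.find(root).select { |path| File.file?(path) && !File.symlink?(path) }
  same_size_groups = files.group_by { |path| File.size(path) }.values
  same_size_groups
    .select { |group| group.size > 1 }
    .flat_map { |group| group.group_by { |path| Digest::SHA256.file(path).hexdigest }.values }
    .select { |group| group.size > 1 }
end
```

For a directory full of large files of different sizes this never reads file contents at all, which is where most of the time goes.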

diffschemas


Gives a diff of MySQL schemas.

To dump a MySQL schema, use:

    mysqldump -uuser -ppassword -h hostname --where 0=1 database >schema.sql


Then run:

    diffschemas schema_1.sql schema_2.sql

which will strip garbage like autoincrement counters and give you a clean diff.
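
As an illustration of the kind of cleanup involved, here's a sketch that removes just the table-level AUTO_INCREMENT counter from a dump. strip_auto_increment is a hypothetical helper, and the real script may well strip more than this.

```ruby
# Hypothetical helper: removes the per-table AUTO_INCREMENT=<n> counter that
# mysqldump writes after the closing parenthesis of CREATE TABLE.
# The AUTO_INCREMENT column attribute (no "=<n>") is left alone.
def strip_auto_increment(sql)
  sql.gsub(/ AUTO_INCREMENT=\d+/, "")
end
```

After this, two dumps of the same schema taken at different times no longer differ just because rows were inserted in between.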

e


This utility has an extremely short name since it's meant to be used as your primary
way to call your text editor.

If you give it a path containing /, or file with such name exists in current directory,
it will call your editor on that file.

Otherwise - it will search your $PATH for this file, and execute your editor on it,
avoiding opening binaries, and other false positives.

This is extremely helpful if you have a ton of scripts you edit a lot.

These two commands achieve similar effect:

    mate `which foo`

    e foo

except e is shorter, doesn't force you to think about paths,
will expand all symlinks in the name (avoiding issues like accidentally editing the
same file under different names in two editor windows), and won't accidentally open binaries.

Currently configured to call TextMate of course.
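
The $PATH lookup part can be sketched roughly like this. find_in_path and binary? are hypothetical helpers, and the NUL-byte check is just one simple heuristic I'm assuming for the example, not necessarily what e does.

```ruby
# Hypothetical helper: finds the first regular file called `name` in the
# directories of `path_var` and returns it with symlinks resolved, or nil.
def find_in_path(name, path_var = ENV.fetch("PATH", ""))
  path_var.split(File::PATH_SEPARATOR).reject(&:empty?).each do |dir|
    candidate = File.join(dir, name)
    return File.realpath(candidate) if File.file?(candidate)
  end
  nil
end

# Hypothetical heuristic for "looks like a binary": a NUL byte near the start.
def binary?(path)
  File.binread(path, 1024).to_s.include?("\0")
end
```

An editor wrapper would then open find_in_path(name) only when it is non-nil and binary? says no.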

gzip_stream


Pipe through it to gzip log without having infinitely long buffers.

Usage example:

    my_server | gzip_stream >log.gz

If you use regular gzip, the last few hundred lines can sit in its buffer indefinitely,
so you won't be able to see what's going on in log.gz without killing the server,
even if it happened yesterday. gzip_stream flushes every 5s (easily configurable),
sacrificing a tiny amount of compression quality for a huge amount of convenience.

Read more about it here.
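
A simplified sketch of the idea using Zlib from stdlib. This gzip_stream method is a hypothetical reimplementation for illustration: it only checks the clock when a new line arrives, so unlike a timer-based version it won't flush while the input is silent.

```ruby
require "zlib"

# Copies lines from `input` to `output` as gzip, doing a sync flush whenever
# at least `flush_interval` seconds have passed since the previous flush.
def gzip_stream(input, output, flush_interval: 5)
  gz = Zlib::GzipWriter.new(output)
  last_flush = Time.now
  input.each_line do |line|
    gz.write(line)
    next if Time.now - last_flush < flush_interval
    gz.flush   # everything written so far can now be decompressed
    last_flush = Time.now
  end
  gz.finish    # write the gzip trailer but leave `output` open
end

# A real script would end with: gzip_stream($stdin, $stdout)
```

Each flush costs a few bytes of compression, which is the tradeoff mentioned above.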

namenorm

Safely normalizes file names, replacing upper case characters and spaces with
lower case characters and underscores.

Usage:
    namenorm ~/Downloads/*
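
Roughly, the renaming could be sketched like this, assuming "safely" means never overwriting an existing file. normalized_name and normalize_file are hypothetical helpers.

```ruby
# Hypothetical helper: lower-cases a file name and replaces spaces with underscores.
def normalized_name(name)
  name.downcase.tr(" ", "_")
end

# Renames `path` to its normalized name. Returns the new path, the old path
# if nothing needed to change, or nil if another file is already in the way.
def normalize_file(path)
  target = File.join(File.dirname(path), normalized_name(File.basename(path)))
  return path if target == path
  # On case-insensitive filesystems the target can "exist" as the same file.
  return nil if File.exist?(target) && !File.identical?(path, target)
  File.rename(path, target)
  target
end
```

The File.identical? check is there so that a pure case change like Foo.txt to foo.txt still goes through on a case-insensitive filesystem.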

openmany

Runs open command on multiple files, either as command line arguments,
or one-per-line in STDIN.


Usage:
    openmany <urls.txt
    openmany *.pdf

It uses the OSX open command. On Linux, edit it to use the equivalent, typically xdg-open.
(I keep forgetting since alias open=... is always in my .bashrc)

pomodoro

Counts down 25 minutes (or however many you specify as a command line argument),
printing the countdown on the command line, and when it's over turns the volume to maximum
and plays the selected sound.

Usage:
    pomodoro   # 25 minutes

    pomodoro 5 # 5 minutes


Read more about Pomodoro Technique on Wikipedia.

Setting volume and playing sound assume OSX commands, but I'm sure you'll be able
to figure out Linux equivalents.

pub

Fixes directory tree by making it publicly readable and editable by you.

Very useful when fixing permissions on files you just unpacked from an archive,
since many archive formats store stupid permissions (like read only on directories) inside,
which is a bad idea for everything except backups.

Usage:

    pub file.txt

    pub directory/
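
Here's a sketch of the permission fix, assuming "publicly readable" means 755 for directories and executables and 644 for everything else, added on top of whatever bits are already set. make_public is a hypothetical helper.

```ruby
require "find"

# Hypothetical helper: walks `root` and adds rwx/r-x/r-x to directories and
# executables, rw-/r--/r-- to other files. Symlinks are left alone.
def make_public(root)
  Find.find(root) do |path|
    stat = File.lstat(path)
    next if stat.symlink?
    mode = stat.mode & 0o7777
    needs_exec = stat.directory? || (mode & 0o111).nonzero?
    File.chmod(mode | (needs_exec ? 0o755 : 0o644), path)
  end
end
```

Find yields a directory before listing its contents, so a directory that arrived read-only gets fixed before the walk descends into it.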



randswap

Randomly swaps lines of STDIN.

Usage:

    randswap <urls.txt | head -n 10 >sample.txt

rbexe

Creates executable script path with proper #! line and permissions.

Defaults to Ruby executable but supports a few other #!s.

Usage:

    rbexe file.rb

    rbexe --9 file.rb

    rbexe --pl file.pl

 
If file exists, it will only change its permissions without overwriting it,
so it's safe to use.

rename

Larry Wall's rename script, included in Debian-derived distributions, but not on any other Unix
I know of - which is literally criminal, since it's one of the core Unix utilities.

If your distribution doesn't have it (or worse - has some total crap as rename script),
do yourself a service and install something more sensible, and in the meantime copy this
file to your ~/bin.

split_dir

Splits directories with excessively many files into multiple directories of roughly
equal size, about 200 files each.

Usage example:

    split_dir my_little_pony_wallpapers/


Mostly useful for directories containing images.
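
The grouping arithmetic could look something like this. balanced_groups is a hypothetical helper, and treating 200 as an upper bound per directory is an assumption for the sketch.

```ruby
# Hypothetical helper: splits `names` into the smallest number of groups that
# keeps every group at or under `target`, with sizes differing by at most one.
def balanced_groups(names, target = 200)
  return [] if names.empty?
  group_count = (names.size.to_f / target).ceil
  base, extra = names.size.divmod(group_count)
  sizes = Array.new(group_count) { |i| base + (i < extra ? 1 : 0) }
  offset = 0
  sizes.map { |size| names[offset, size].tap { offset += size } }
end
```

So 450 files end up as three groups of 150 rather than 200, 200 and a leftover 50.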

strip_9gag

Removes extremely annoying 9gag watermark they put on files they didn't make.

Usage examples:

    strip_9gag file.jpg

    strip_9gag http://some.site.example/file.jpg

tac

Reverses order of lines of whatever is on STDIN, prints to STDOUT.

Usage example:

    tac <pokemon_by_newest.txt >pokemon_by_oldest.txt

Some distributions already have a tac command. For those that don't, like OSX, it's really easy to use this replacement.

terminal_title


Changes the title of the current terminal window. Extremely useful if you have too many terminal windows.

Usage example:

    terminal_title 'Production server (do not accidentally killall -9)'; ssh production.server.example
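
One common way to do this is the xterm-style escape sequence, which most terminal emulators understand. I'm assuming that mechanism here for illustration, and title_sequence is a hypothetical helper.

```ruby
# Builds the xterm-style escape sequence (OSC 0) that sets the window title:
# ESC ] 0 ; <title> BEL
def title_sequence(title)
  "\e]0;#{title}\a"
end

# A real script would print it: print title_sequence(ARGV.join(" "))
```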

unall

Universal unarchiver. Possibly the most useful nontrivial utility in this repository (not counting Larry Wall's rename).

The command line interface to various archive formats is a total failure compared with the convenience of desktop programs.

They have a huge number of incompatible interfaces, which one can get used to, but there's a much more severe failure: sometimes an archive contains files without a single directory to contain them all.
This problem is solved by most good desktop unarchivers, but in the command line world any such archive will ruin your day.

unall fixes all these problems: it checks what's inside the archive, if it's a broken archive with multiple files not in the same directory it will create a directory for it, if the directory already exists it will rename it to something else, etc.

If it was successful, it will then delete archive after unpacking (with trash command which puts it into OSX Trash, feel free to change it to whatever your system uses).

Usage:
    unall *.zip *.rar *.7z *.tar.bz2 *.tar.gz


unall assumes you have 7za, unrar, and sane version of tar installed.
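
The "is this archive well-behaved" check can be sketched on a plain list of entry paths, however you obtained it from the archive tool. single_top_level_dir is a hypothetical helper, not the code from unall.

```ruby
# Hypothetical helper: given entry paths from an archive listing, returns the
# single top-level directory that contains everything, or nil if unpacking
# would spill files into the current directory.
def single_top_level_dir(entries)
  paths = entries.map { |entry| entry.sub(%r{\A(\./)+}, "") }.reject(&:empty?)
  tops = paths.map { |path| path.split("/").first }.uniq
  return nil unless tops.size == 1
  # A lone top-level file is not a containing directory.
  paths.any? { |path| path.include?("/") } ? tops.first : nil
end
```

When this returns nil, the unarchiver would create a fresh directory and unpack into that instead.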

xmlview

Reindents XML and cuts it to a 150 column limit for easy viewing.

Usage example:

    xmlview huge_machine_generated_xml_file.xml

xnorm

A version of namenorm script which also removes random garbage from file names like ".x264".
Useful mostly for TV episodes.

Usage:

    xnorm ~/Downloads/*


It's included more as an example than as an actually useful utility, since the garbage they include in file names changes constantly.

xpstree

A much superior replacement for pstree.

Shows directory tree of processes with a lot of garbage cleaned up (like kernel processes removed, scripts displayed by their script name not their interpreter name etc.).

Regexps used to clean up the tree might require some customization for your situation.

Usage examples:
    xpstree

    xpstree -u          # By current user

    xpstree -p          # Show pids

    xpstree -s          # Highlight current process's tree

    xpstree -h java     # Highlight anything with /java/ in process path

    xpstree -s Terminal # Ignore /Terminal/

    xpstree -x Terminal # Ignore /Terminal/ and all its children

    xpstree -f Terminal # Show only /Terminal/ and all its children

    xpstree -h Terminal # Highlight /Terminal/

Lower case options -sxfh are exact match (case insensitive).

Upper case options -SXFH are regexp match.

xrmdir

Works like rmdir for OSX. Since OSX creates garbage files like .DS_Store in every single
directory you ever open with Finder (or just because it can), many empty directories
are technically non-empty.

xrmdir deletes this worthless file, then calls rmdir on the directory.

Usage example:

    xrmdir ~/101/reasons/why/osx/sucks/*
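
A sketch of the same idea, with one deliberate difference: it looks first and only deletes .DS_Store when the directory is otherwise empty, so directories that still hold real files are left untouched. JUNK and remove_if_effectively_empty are hypothetical names.

```ruby
require "fileutils"

# Files that don't count as real content. Just .DS_Store in this sketch.
JUNK = [".DS_Store"].freeze

# Removes `dir` if it holds nothing but junk files. Returns true if removed.
def remove_if_effectively_empty(dir)
  return false unless (Dir.children(dir) - JUNK).empty?
  JUNK.each { |name| FileUtils.rm_f(File.join(dir, name)) }
  Dir.rmdir(dir)
  true
end
```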

7 comments:

  1. Minor point but
    "#{directory}"
    is just directory right?

  2. It's directory.to_s actually.

    If you try to do something like system("sleep", 5) it will raise exception TypeError: can't convert Fixnum into String.

    That's another reason to use system *%W[ ] instead of manually passing all parameters.

  3. There is a shellwords stdlib, which includes a String#shellescape method. Implementation is remarkably similar to yours.

  4. Josh: They added it in Ruby 1.8.7, most of this code comes from 1.8.5/1.8.6 times when Shellwords library didn't have escape method. I'll update my code.

    I usually use *%W[ ] technique anyway, so it took me that long to notice.

  5. Anonymous, 12:35

    Thanks for the dedup script, I have been needing one for my My Little Pony wallpapers, but was too lazy to write it myself.

    Btw I have always been using the 'zmv' function of zsh in place of that 'rename' script. Dunno which is better, as I don't need to do that very often.

  6. Anonymous, 09:29

    randswap could be replaced with 'sort -R'

  7. Anonymous: sort on OSX doesn't accept -R option.
