Just about every Unix hacker writes hundreds of small Unix scripts for personal use, and they very rarely see the light of day - they're too small to turn into proper Open Source projects, and in any case that would take a lot of effort.
But these days publishing code on a code sharing site like github is so easy that we should change this. I'm doing my part here - out of hundreds of scripts scattered all over my ~/all repository I took a sample which doesn't depend too much on my personal setup, doesn't break any website's ToS too hard, and generally might be of some use to other people.
Feel free to use these scripts any way you wish. Some of them can be simply used, others can show you how to solve common scripting problems.
All of them have been written by me, except ~/bin/rename by Larry Wall which I'm bundling with the rest for convenience since a lot of Unix boxes don't have it, and it's just ridiculously useful.
These are all meant to work on OSX. Most but not all will work on other Unix distributions. If you have patches to make them work elsewhere, send them via pull request or another convenient method.
All scripts can be accessed in my github repository.
I'll probably be adding more scripts later, but the existing batch is pretty sizable already.
A few tricks
A lot of tricks and good techniques are included in these scripts.
SIGPIPE
One such technique - which I don't remember seeing anywhere else - is starting the script with trap("PIPE", "EXIT").
What does it do? When you set up a pipeline of processes like foo | bar | head -n 10, sometimes one of the consumer processes will quit first - in this case head, after reading 10 lines. The next time a producer writes to the now-closed pipe, the system sends it a SIGPIPE signal, which by default kills it, so the rest of the pipeline shuts down quietly.
Ruby decided to override this behaviour, and instead you get an exception (Errno::EPIPE). This makes sense for most programs, but for utilities meant to be used as pipe producers (randswap and tac here), it's better to restore the default Unix behaviour, which you can do very easily with this one line.
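Here is a minimal sketch of the effect, not code taken from the scripts themselves. The producer source and the first_lines helper are hypothetical names made up for the demonstration: a child Ruby process prints numbers forever, the parent reads a few lines and closes the pipe, and the trap line lets the child go away quietly.

```ruby
require "rbconfig"

# Hypothetical endless producer, standing in for something like randswap or tac.
PRODUCER_SOURCE = <<~'RUBY'
  trap("PIPE", "EXIT")   # exit quietly when the reader goes away
  i = 0
  loop { puts(i += 1) }
RUBY

# Reads the first n lines from the producer, then closes the pipe.
# Closing is what `head -n 10` effectively does when it exits.
def first_lines(n)
  IO.popen([RbConfig.ruby, "-e", PRODUCER_SOURCE]) do |io|
    Array.new(n) { io.gets.chomp }
  end
end

puts first_lines(3).inspect if $PROGRAM_NAME == __FILE__
```

Without the trap line the child would still stop, but it would typically complain on stderr about a broken pipe first.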
Avoid shell escaping
An extremely bad scripting practice is generating a command string like "rm -rf #{directory}" and executing it via the system function.
There's rarely any reason to do so. system takes multiple arguments, so you can use the safer system "rm", "-rf", directory instead.
It's more complicated to do it "the right way" if you want to set up redirects, get a command's output, or pipe multiple processes together, but 90% of the time "the right way" is also the simplest way, so why not do it right?
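As a rough sketch of the list form, including the slightly harder case of capturing output, something like the following should work. The helper names read_with_cat and remove_file are invented for this example; Open3.capture2 from the standard library accepts the same list-of-arguments form as system.

```ruby
require "open3"

# Captures a command's output without going through a shell.
# Each argument reaches `cat` verbatim, whatever characters it contains.
def read_with_cat(path)
  output, status = Open3.capture2("cat", path)
  raise "cat failed for #{path.inspect}" unless status.success?
  output
end

# List form of system: no shell, so no quoting or escaping is needed.
# Returns true if the command exited successfully.
def remove_file(path)
  system("rm", "-f", path)
end
```

A file name such as "it's a $(tricky); name.txt" goes through both helpers untouched, while the string form "rm -f #{path}" would hand the same name to the shell for interpretation.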
system *%W[]
Ruby %W is an almost unknown feature - I wrote about it some time ago if you want more.
Very often, it is the most convenient way to execute some command. system *%W[rm -rf #{directory}] looks like it does string interpolation, but it actually evaluates to an absolutely safe system("rm", "-rf", "#{directory}") call, and you're totally safe regardless of special characters in directory.
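To see why this is safe, it helps to look at what %W actually produces. In this small sketch (rm_command and sleep_command are hypothetical helper names) the literal is split into words first, and each interpolation then lands inside a single word, spaces and all.

```ruby
# Builds the argument list without running anything.
def rm_command(directory)
  %W[rm -rf #{directory}]
end

# Interpolation also converts non-strings, so a number is fine here.
def sleep_command(seconds)
  %W[sleep #{seconds}]
end

p rm_command("my dir; rm -rf ~")  # ["rm", "-rf", "my dir; rm -rf ~"]
p sleep_command(5)                # ["sleep", "5"]
```

Passing either array to system with a splat, as in system(*rm_command(dir)), runs the command with exactly those arguments.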
Use FileUtils
Usually you don't even have to call system - most common commands for filesystem interactions are available more conveniently via the FileUtils module, like FileUtils.mkdir_p "/some/path", FileUtils.rm_rf "/another/path" etc.
If you have to escape shell metacharacters
Sometimes the system *%W[] interface is not enough. In such a case, just copy and paste String#shell_escape from my scripts (not security-audited or anything):
class String
  def shell_escape
    return "''" if empty?
    return dup unless self =~ /[^0-9A-Za-z+,.\/:=@_-]/
    gsub(/(')|[^']+/) { $1 ? "\\'" : "'#{$&}'"}
  end
end
Then use it like this (without extra ''s): `tar -tzf #{fn.shell_escape}`.
EDIT: Since 1.8.7 Ruby has had a String#shellescape method in the shellwords
library in stdlib, so use that unless you need to support older systems.
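For completeness, a small sketch of the stdlib version in action. The helper name echo_via_shell is made up for this example; it deliberately goes through sh -c with a command string, which is the situation where escaping is actually needed.

```ruby
require "open3"
require "shellwords"

# Sends one argument through a real shell command string and returns
# what the shell handed to printf. With correct escaping the result
# is identical to the input.
def echo_via_shell(arg)
  output, _status = Open3.capture2("sh", "-c", "printf '%s' #{arg.shellescape}")
  output
end

name = "it's a $(tricky); name.txt"
puts echo_via_shell(name) == name
```

Shellwords.split goes the other way and turns an escaped string back into the original words, which is handy for checking what a command string will be parsed into.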
Individual commands
annotate_sgf
It uses GNU Go's debug mode to annotate your Go game in SGF.
It will find a lot of tactical mistakes for most games by kyu players.
Usage:
annotate_sgf game.sgf
Output saved to annotated-game.sgf in the same directory as game.sgf.
For more details, read this blog post.
convert_to_png
Converts various image formats to PNG.
Mostly useful for mass conversion, for example when you have a directory
with 100 svg files dir/file-001.svg to dir/file-100.svg:
convert_to_png dir/*.svg
will convert them all.
dedup_files
Deletes duplicate files in huge directories by hash, with some optimization to avoid unnecessary hashing.
Usage:
dedup_files ...
For example:
dedup_files my_little_pony_wallpapers/
which will work pretty well even if you have 100GB of My Little Pony wallpapers.
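The exact optimization isn't spelled out here, so the following is only a sketch of one common approach and an assumption on my part: group files by size first, and hash only files that share a size with at least one other file. The name duplicate_groups is hypothetical, and this version only reports duplicates rather than deleting anything.

```ruby
require "digest"
require "find"

# Returns arrays of paths whose contents are identical.
# Files with a unique size are never hashed at all.
def duplicate_groups(dir)
  files = Find.find(dir).select { |path| File.file?(path) && !File.symlink?(path) }

  same_size_groups = files.group_by { |path| File.size(path) }
                          .values
                          .select { |group| group.size > 1 }

  same_size_groups.flat_map do |group|
    group.group_by { |path| Digest::SHA256.file(path).hexdigest }
         .values
         .select { |same_hash| same_hash.size > 1 }
  end
end
```

In a directory of large files most sizes are unique, so the expensive hashing step is skipped for the bulk of them.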
diffschemas
Gives diff of mysql schemas.
To dump a mysql schema, use:
mysqldump -uuser -ppassword -h hostname --where 0=1 database >schema.sql
Then run:
diffschemas schema_1.sql schema_2.sql
which will strip garbage like autoincrement counters and give you clean diff.
e
This utility has extremely short name since it's meant to be used as your primary
way to call text editor.
If you give it a path containing /, or file with such name exists in current directory,
it will call your editor on that file.
Otherwise - it will search your $PATH for this file, and execute your editor on it,
avoiding opening binaries, and other false positives.
This is extremely helpful if you have a ton of scripts you edit a lot.
These two commands achieve similar effect:
mate `which foo`
e foo
except e is shorter, doesn't force you to think about paths,
will expand all symlinks in name (avoiding issues like accidentally editing the
same file under different names in two editor windows), and won't accidentally open binaries.
Currently configured to call TextMate of course.
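The lookup described above could be sketched roughly like this. It is not the actual implementation: resolve_for_editing and binary? are hypothetical names, and the "contains a NUL byte near the start" test is just one simple way to guess that a file is a binary.

```ruby
# Crude binary check (an assumption): look for a NUL byte in the first 512 bytes.
def binary?(path)
  File.binread(path, 512).to_s.include?("\0")
end

# Returns the path the editor should open, or nil if nothing suitable is found.
def resolve_for_editing(name, search_path = ENV.fetch("PATH", ""))
  # Explicit paths and files in the current directory are used directly.
  return File.expand_path(name) if name.include?("/") || File.exist?(name)

  search_path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    next unless File.file?(candidate)
    next if binary?(candidate)
    # realpath expands symlinks, so one file is always opened under one name.
    return File.realpath(candidate)
  end
  nil
end
```

The caller would then pass the result to the editor with the list form of system, for example system("mate", path).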
gzip_stream
Pipe through it to gzip log without having infinitely long buffers.
Usage example:
my_server | gzip_stream >log.gz
If you use regular gzip the last few hundred lines will be in memory indefinitely,
so you won't be able to see what's going on in log.gz without killing the server,
even if it happened yesterday. gzip_stream flushes every 5s (easily configurable),
sacrificing tiny amount of compression quality for huge amount of convenience.
Read more about it here.
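The flushing idea can be sketched with Zlib::GzipWriter from the standard library. This is a simplified illustration, not the script itself: the method below (its name and signature are chosen just for this example) only checks the clock when a new line arrives, so a completely silent input would not be flushed until its next line.

```ruby
require "zlib"

# Copies input to output as gzip, forcing compressed data out
# at most every flush_interval seconds.
def gzip_stream(input, output, flush_interval = 5)
  gz = Zlib::GzipWriter.new(output)
  last_flush = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  input.each_line do |line|
    gz.write(line)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if now - last_flush >= flush_interval
      gz.flush            # pushes everything buffered so far into output
      last_flush = now
    end
  end
ensure
  gz&.finish              # writes the gzip trailer, leaves output open
end

gzip_stream($stdin, $stdout) if $PROGRAM_NAME == __FILE__
```

Each flush costs a little compression ratio, which is the trade-off mentioned above.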
namenorm
Safely normalizes file names, replacing upper case characters and spaces with lower case characters and underscores.
Usage:
namenorm ~/Downloads/*
openmany
Runs the open command on multiple files, given either as command line arguments or one-per-line on STDIN.
Usage:
openmany <urls.txt
openmany *.pdf
It uses the OSX open command. For Linux, edit it to use whatever the Linux equivalent is.
(I keep forgetting since alias open=... is always in my .bashrc)
pomodoro
Counts down 25 minutes (or however many you specify as a command line argument), printing the countdown on the command line, and when it's over turning the volume to maximum
and playing the selected sound.
Usage:
pomodoro # 25 minutes
pomodoro 5 # 5 minutes
Read more about Pomodoro Technique on Wikipedia.
Setting volume and playing sound assume OSX commands, but I'm sure you'll be able
to figure out Linux equivalents.
pub
Fixes a directory tree by making it publicly readable and editable by you.
Very useful when fixing permissions on files you just unpacked from an archive,
since many archive formats store stupid permissions (like read only on directories) inside,
which is a bad idea for everything except backups.
Usage:
pub file.txt
pub directory/
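One plausible reading of "publicly readable and editable by you" is sketched below; the exact permission bits are my assumption. Directories get at least 755, regular files at least 644, and any bits already set (such as execute bits on scripts) are kept.

```ruby
require "find"

# Walks the tree and adds the missing permission bits.
def pub(path)
  Find.find(path) do |entry|
    next if File.symlink?(entry)          # leave symlinks alone
    mode = File.stat(entry).mode & 0o777  # current permission bits only
    wanted = File.directory?(entry) ? 0o755 : 0o644
    File.chmod(mode | wanted, entry)
  end
end
```

Find yields each directory before listing its contents, so a directory that was unpacked without the needed permissions is fixed before the walk descends into it.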
randswap
Randomly swaps lines of STDIN.
Usage:
randswap <urls.txt | head -n 10 >sample.txt
rbexe
Creates an executable script at the given path with a proper #! line and permissions.
Defaults to a Ruby executable but supports a few other #!s.
Usage:
rbexe file.rb
rbexe --9 file.rb
rbexe --pl file.pl
If file exists, it will only change its permissions without overwriting it,
so it's safe to use.
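The "create if missing, otherwise only fix permissions" behaviour can be sketched in a few lines. The method name and the default #! line below are assumptions made for this example, not necessarily what the script uses.

```ruby
# Creates path with a #! line if it does not exist yet, then makes it executable.
# An existing file keeps its contents; only its mode changes.
def make_executable_script(path, shebang = "#!/usr/bin/env ruby")
  File.write(path, "#{shebang}\n") unless File.exist?(path)
  mode = File.stat(path).mode & 0o777
  File.chmod(mode | 0o755, path)
end
```

Because the write is skipped for existing files, running it twice on the same path is harmless.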
rename
Larry Wall's rename script, included in Debian-derived distributions, but not on any other Unix I know of - which is literally criminal, since it's one of the core Unix utilities.
If your distribution doesn't have it (or worse - has some total crap as rename script),
do yourself a service and install something more sensible, and in the meantime copy this
file to your ~/bin.
split_dir
Splits directories with excessively many files into multiple directories with a roughly equal number of files each, about 200.
Usage example:
split_dir my_little_pony_wallpapers/
Mostly useful for directories containing images.
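The "roughly equal, about 200 each" arithmetic might look like the sketch below. The naming of the new directories (the original name plus -1, -2 and so on, created next to the original) is purely an assumption for the example.

```ruby
require "fileutils"

# Moves the files of dir into sibling directories dir-1, dir-2, ...
# Returns the list of directories created (empty if no split was needed).
def split_dir(dir, target_size = 200)
  base = dir.chomp("/")
  files = Dir.children(base).sort.select { |f| File.file?(File.join(base, f)) }
  return [] if files.size <= target_size

  parts = (files.size.to_f / target_size).ceil  # how many directories
  per_part = (files.size.to_f / parts).ceil     # files in each, evened out

  files.each_slice(per_part).with_index(1).map do |slice, index|
    subdir = "#{base}-#{index}"
    FileUtils.mkdir_p(subdir)
    slice.each { |f| FileUtils.mv(File.join(base, f), subdir) }
    subdir
  end
end
```

With 450 files this gives three directories of 150 files rather than two of 200 and one of 50.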
strip_9gag
Removes the extremely annoying 9gag watermark they put on files they didn't make.
Usage examples:
strip_9gag file.jpg
strip_9gag http://some.site.example/file.jpg
tac
Reverses the order of lines of whatever is on STDIN, and prints them to STDOUT.
Usage example:
tac <pokemon_by_newest.txt >pokemon_by_oldest.txt
Some distributions already have a tac command - for those that don't, like OSX, it's really easy to use this replacement.
terminal_title
Changes the title of the current terminal window. Extremely useful if you have too many terminal windows open.
Usage example:
terminal_title 'Production server (do not accidentally killall -9)'; ssh production.server.example
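How the script sets the title isn't described here. The usual mechanism in xterm-compatible terminals, which I am assuming in this sketch, is an escape sequence: ESC ] 0 ; followed by the title and a BEL character. The method name below is made up for the example.

```ruby
# Builds the xterm-style escape sequence that sets the window title.
def terminal_title_sequence(title)
  "\e]0;#{title}\a"
end

# Printing the sequence to the terminal changes the title.
if $PROGRAM_NAME == __FILE__
  print terminal_title_sequence(ARGV.join(" "))
  $stdout.flush
end
```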
unall
Universal unarchiver. Possibly the most useful nontrivial utility in this repository (not counting Larry Wall's rename).
The command line interface to various archive formats is a total failure compared with the convenience of desktop programs.
They have a huge number of incompatible interfaces, which one can get used to, but there's a much more severe failure - sometimes an archive contains files without a single directory to contain them all.
This problem is solved by most good desktop unarchivers, but in the command line world any such archive will ruin your day.
unall fixes all these problems - it checks what's inside the archive; if it's a broken archive with multiple files not in the same directory, it will create a directory for it; if the directory already exists, it will rename it to something else, etc.
If it was successful, it will then delete archive after unpacking (with trash command which puts it into OSX Trash, feel free to change it to whatever your system uses).
Usage:
unall *.zip *.rar *.7z *.tar.bz2 *.tar.gz
unall assumes you have 7za, unrar, and sane version of tar installed.
xmlview
Reindents XML and cuts it to a 150 column limit for easy viewing.
Usage example:
xmlview huge_machine_generated_xml_file.xml
xnorm
A version of the namenorm script which also removes random garbage from file names like ".x264".
Useful mostly for TV episodes.
Usage:
xnorm ~/Downloads/*
It's included more as an example than as an actually useful utility, since the garbage they include in file names changes constantly.
xpstree
A much superior replacement for pstree.
Shows a tree of processes with a lot of garbage cleaned up (like kernel processes removed, scripts displayed by their script name not their interpreter name etc.).
Regexps used to clean up the tree might require some customization for your situation.
Usage examples:
xpstree
xpstree -u # By current user
xpstree -p # Show pids
xpstree -s # Highlight current process's tree
xpstree -h java # Highlight anything with /java/ in process path
xpstree -s Terminal # Ignore /Terminal/
xpstree -x Terminal # Ignore /Terminal/ and all its children
xpstree -f Terminal # Show only /Terminal/ and all its children
xpstree -h Terminal # Highlight /Terminal/
Lower case options -sxfh are exact match (case insensitive).
Upper case options -SXFH are regexp match.
xrmdir
Works like rmdir for OSX. Since OSX creates garbage files like .DS_Store in every single directory you ever open with Finder (or just because it can), many empty directories
are technically non-empty.
xrmdir deletes this worthless file, then calls rmdir on the directory.
Usage example:
xrmdir ~/101/reasons/why/osx/sucks/*
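A cautious sketch of the same idea follows. Checking the directory contents before touching anything is my own choice here, so that a directory with real files in it is left exactly as it was; the actual script may order things differently.

```ruby
# Removes dir if it contains nothing except possibly .DS_Store.
# Returns true if the directory was removed, false if it was left alone.
def xrmdir(dir)
  leftovers = Dir.children(dir) - [".DS_Store"]
  return false unless leftovers.empty?

  junk = File.join(dir, ".DS_Store")
  File.delete(junk) if File.exist?(junk)
  Dir.rmdir(dir)
  true
end
```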
Minor point but "#{directory}" is just directory right?
It's directory.to_s actually. If you try to do something like system("sleep", 5) it will raise exception TypeError: can't convert Fixnum into String.
That's another reason to use system *%W[ ] instead of manually passing all parameters.
There is a shellwords stdlib, which includes a String#shellescape method. Implementation is remarkably similar to yours.
Josh: They added it in Ruby 1.8.7, most of this code comes from 1.8.5/1.8.6 times when Shellwords library didn't have escape method. I'll update my code.
I usually use *%W[ ] technique anyway, so it took me that long to notice.
Thanks for the dedup script, I have been needing one for my My Little Pony wallpapers, but was too lazy to write it myself.
Btw I have always been using the 'zmv' function of zsh in place of that 'rename' script. Dunno which is better, as I don't need to do that very often.
randswap could be replaced with 'sort -R'
Anonymous: sort on OSX doesn't accept -R option.