The best kittens, technology, and video games blog in the world.

Saturday, March 27, 2010

Small tips for making Unix programming nicer

Koma is a Mac Addict by Pixteca MX【ツ】 from flickr (CC-BY)

Every Unix programmer - and I say this authoritatively, based on my statistically significant sample of 1 - creates millions of tiny scripts that he never bothers publishing. They're small, so the overhead of cleaning up, documenting, and publishing each such script would be immense. And they're not obviously googleable/bingable/baiduable or whatever people call it these days - it might surprise you, but most things in the world are not described by a few well-defined key phrases.

I decided to pick a few such scripts and throw a bunch of tips on top of them - hopefully you'll find a useful trick or two here.

How to avoid Unix destroying your files

First, we need to fix some of the famous Unix brain damage - cp, mv, and bash overwriting your files.
Now a case could be made for rm removing files without asking for confirmation - that's what rm stands for.
But how in the world is this:
mv most_awesome_song_ever.mp3 ~/Music/
supposed to quietly overwrite existing ~/Music/most_awesome_song_ever.mp3?
In 99% of cases this is behaviour you do not want - and if you actually do,
you know to type this instead:
mv -f most_awesome_song_ever.mp3 ~/Music/

So open your ~/.bashrc and put these commands there (at least the ones for mv and cp):
alias rm='rm -i'
alias mv='mv -i'
alias cp='cp -i'

There's one more way Unix can destroy your files - command >file redirection will overwrite the file without asking, so accidents happen pretty easily this way. You really don't want
cat >~/notes/girlfriend
to lose information about all your exes, do you? (example purely speculative) Get yourself into the habit of always saying command >>file - appending to the file. 99% of the time the file in question won't exist - so the result will be the same - and in the cases where the file does exist and you really want to replace it, you remove it first. If you make a mistake, you just ^C and you're back where you started. I haven't used overwriting redirection in years - it's completely purged from my Unix dictionary. Do the same thing.

By the way, some Unix distributions do these by default. And some shells have options for asking for confirmation if > would overwrite files. Do these anyway, just to be sure.
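In bash the option in question is noclobber. Here's a minimal sketch of how it behaves, assuming bash or another POSIX-ish shell; the temporary directory and the notes.txt file name are made up for the demo, so nothing real gets touched.

```shell
# Throwaway directory and file name, purely for illustration.
demo_dir=$(mktemp -d)
notes="$demo_dir/notes.txt"

set -o noclobber                 # from now on > refuses to overwrite existing files

echo "first" > "$notes"          # file doesn't exist yet, so this works as usual

# A second > on the same file fails; the braces only hide the shell's error message.
if { echo "second" > "$notes"; } 2>/dev/null; then
  clobber_blocked=no
else
  clobber_blocked=yes
fi
after_blocked=$(cat "$notes")    # the old contents are still there

echo "third" >> "$notes"         # appending is always allowed
echo "fourth" >| "$notes"        # >| means "yes, I really want to overwrite"
final_contents=$(cat "$notes")

set +o noclobber                 # back to the default behaviour
```

To make it permanent, the set -o noclobber line alone would go into .bashrc; the rest is just there to show what happens.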

Make shell history useful

By default shell history stores a whole 500 entries - a sensible decision back when you had 4MB of RAM, most of it taken by Emacs. It's just ridiculous these days, so as the first line of .bashrc put export HISTSIZE=1000000. It must be first, because if bash ever decides to exit without HISTSIZE set to a sane value, it might decide to trim your history file and lose all your history. Of course if you want to tempt fate...
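As a sketch, the top of .bashrc could look like the block below. HISTFILESIZE is an extra here, not something the tip above strictly needs: bash uses it as the limit for the history file itself, and if you leave it unset it defaults to whatever HISTSIZE is once the startup files have been read.

```shell
# Top of ~/.bashrc - before anything that could make the shell bail out early.
export HISTSIZE=1000000        # entries kept in memory per session
export HISTFILESIZE=1000000    # lines kept in the history file (the optional extra)
```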

Get rid of .pyc files everywhere

Python's habit of creating .pyc files all over the place gets on my nerves - and it's dubious that they really improve performance that much. If they want to fix performance, how about dealing with GIL first instead of resorting to such hacks? Anyway export PYTHONDONTWRITEBYTECODE=1 totally fixes the problem without any side effects.

Saner output from Unix commands

Compared to fatal loss of data, these are just minor annoyances, but they're really easy to fix, so let's do it.
For stupid reasons du and df commands give sizes in units of 512 bytes or something like that.
Probably some ancient BSD file system allocated files in multiples of that. That they care more about file system
implementation details than about human usability tells you something about the Unix mindset (if these were at least kBs
that would make sense but no...) - and in any case these assumptions are no longer true on modern operating systems,
which don't rely so mindlessly on blocks. So add these two lines to get human-readable output:
alias df='df -h'
alias du='du -h'

By the way, the alias command only applies to what you type in the terminal, not to what scripts do. So if you run mv or df from scripts, they will not get these default -i/-h flags. Be cautious.

FELEK - My Home

Final touches for .bashrc

If some commands require root access, and you're tired of typing sudo this and sudo that, just add a few of:
alias port='sudo port'
alias gem='sudo gem'
alias apt-get='sudo apt-get'
alias reboot='sudo reboot'
to your .bashrc. They only save you a little typing, and don't change anything about security (you still need your admin password etc.), but why type more if you can type less.

Colorful shell

Depending on your distribution you might already have colors in your shell or not.
export CLICOLOR=1 in .bashrc will convince many commands that you want nicely colored output.
To tell that to git, you need to add to your .gitconfig:
[color]
 diff = auto
 status = auto
 branch = auto

And now your Unix comes with more rainbows.

GNU grep must die

Never use GNU grep. Use pcregrep for everything - or rak/ack if you want coloring, automatic .git directory skipping etc. Unfortunately pcregrep is ridiculously slow to type so do yourself a favour and add
alias gr='pcregrep'
to make your life even easier.

wget HTTPS nonsense

I don't know if it's just MacPorts' version of wget, or if it's universal, but it seems to be missing all HTTPS root CAs (take that VeriSign!). Another alias solves the problem. By "solves" I mean it opens a massive security vulnerability, but we already know that CAs will create fake certs for NSA, Mossad, RIAA, and Hackney Borough Council if asked, so the vulnerability is much less than it seems at first.
alias wget='wget --no-check-certificate'

Use ruby or perl for nontrivial actions

In the early 1980s, before Perl got invented to solve exactly the problem I'm talking about,
people would write insanely complicated shell scripts to automate their Unix actions.

Unfortunately you cannot really serve two aims at once - being highly accepting for casual real time input,
and being highly robust for programmable interfaces - so shell sucks at both. Fortunately the problem
has been solved since December 18, 1987, when Perl got invented just for this reason.

And yet - many people act as if 12/18 never happened. Wake up sheeple!

curious sheeple by ztephen from flickr (CC-NC-SA)

Shell is ridiculously stupid. It has arcane and fragile escaping rules.
It cannot even reliably expand a file list:
$ ls ~/porn/*.jpg
-bash: /bin/ls: Argument list too long

How useless is that? And don't even get me started on xargs, find, awk and the rest of that horrible mess.

How do you find the 10 largest MP3 files - in any subdirectory?
ruby -e 'Dir["**/*.mp3"].sort_by{|fn| -File.size(fn)}[0,10].each{|fn| system "ls", "-l", fn}'
And yes, I'm calling ls 10 times here, as otherwise it would see fit to rearrange them alphabetically just for lulz.
This Ruby code is really easy and really obvious.

What would be shell solution? Something like this:
find . -name '*.mp3' -print0 | xargs -0 ls -l | sort -k5 -rn | head -n 10
You need find and xargs to avoid the "argument list too long" error, then you need -print0 and -0 because file names can contain single quotes or - longcat forbid - spaces! Then you need to manually count at which position of ls -l's output the file size is (conveniently ls -l uses something close enough to fixed column width to make sort work - otherwise you'd have to do some heavy awking around it), then you finally head. Personally I prefer waterboarding to having my brain suffer any of this.

Learn GUI integration basics

The chasm between the nice programmable world of Unix terminals and the hostile world of closed GUI programs can be narrowed to some extent.

If you use KDE, dcop command gives you decent level of control over GUI programs and you can explore the interface from command line.

On OSX there's a convenient open command for opening URLs and files with the most sensible program. There's also the osascript command, which is about as powerful as KDE's dcop, except far more painful to use: it tries too hard to be "friendly", resulting in unsurprising failure. Google will help.

Editing your scripts made easy

You probably have five billion scripts in your path (you have ~/bin or ~/local/bin or ~/gitrepo/bin or such in your $PATH right?) - and you're tweaking them all the time.
Typing mate `which some_script.rb` takes forever and is not easily tabbable (Ubuntu has really good
bash autocompletion package which might alleviate this problem a lot - but most distros don't).

Wouldn't it be easier to just say some_script.rb --edit? It would also be far easier to type - somTAB --edit. It's really easy. Just put this below the shebang line of all your scripts:

# Ruby
exec 'mate', __FILE__ if ARGV[0] == '--edit'

# Perl
exec "mate", __FILE__ if $ARGV[0] eq '--edit';

# Python
import os, sys
if sys.argv[1:2] == ['--edit']: os.execlp("mate", "mate", __file__)

If someone passed --edit as first argument it will start the editor instead of running the script - otherwise it will not affect it in any way.

Python code is fairly painful because Python decided to keep low level C interface to exec* instead of providing sane Perl-style interface. And you know something is wrong if Perl is described as "sane" compared with you.

Feel free to figure out how to get this effect with C++.
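The same trick works for plain shell scripts too. A sketch, with two assumptions that aren't from the examples above: it uses $EDITOR with a fallback to vi instead of hardcoding mate, and hello.sh is a made-up script written to a temporary directory just so the whole thing can be run as a demo.

```shell
# Made-up demo script in a throwaway directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello.sh" <<'EOF'
#!/bin/sh
# If the first argument is --edit, replace this process with the editor.
if [ "${1:-}" = "--edit" ]; then exec "${EDITOR:-vi}" "$0"; fi
echo "hello from the script"
EOF

# Without --edit the script runs as usual.
normal_output=$(sh "$demo_dir/hello.sh")

# With echo standing in for the editor, we see which file would be opened.
edit_output=$(EDITOR=echo sh "$demo_dir/hello.sh" --edit)
```

In a real script only the if line matters; it goes right below the shebang, same as in the other languages.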

PRRRRRRR!!! by milky.way from flickr (CC-NC-ND)

Find kittens for your blog

I only want CC kittens, so nobody sues my blog. Except for defamation, that I don't mind. Here's the script

#!/usr/bin/env ruby
uri = "#{ARGV.join '+'}&l=cc&ss=2&ct=0&mt=all&adv=1&s=int"
# On OSX
system "open", uri
# On Linux
#system "firefox", "-new-tab", uri

By the way, could someone get Linux distros to copy the open command? It's really simple and really useful.
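Until that happens, a stopgap for .bashrc might look like this. It's only a sketch and leans on an assumption: that xdg-open (the tool a commenter below points to as the Linux equivalent) is installed. It defines an open function only when no open command exists already.

```shell
# Define open only where the system doesn't already provide one (so not on OSX).
if ! command -v open >/dev/null 2>&1; then
  # Assumes xdg-open is installed; it hands the file or URL to the desktop's default app.
  open() { xdg-open "$@"; }
fi
```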

That's it for today. Enjoy your Unix.


Divided Mind said...

Note that DCOP is for KDE3. KDE4 uses D-Bus, and Qt4 provides a convenient interface reminiscent of the dcop command line tool, named qdbus. It also provides qdbusviewer for conveniently navigating the tree with a GUI so you can nail down the single method you need in the whole huge tree.

BTW, KDE (4) command equivalent to OSX open is kfmclient openURL.

taw said...

Divided Mind: As you can tell, I haven't used KDE in a while ;-)

open can do a bit more than opening URLs like:

open file.png

open -a /Applications/ file.png # equivalent of drag-n-dropping file.png onto

I'm sure other systems have this too - my point is - it's worth taking a minute and learning such tricks

Divided Mind said...

Oh, then I guess kioclient exec would be a better match for open.

BTW, --edit for C++ is:

if (argc == 2 && std::string("--edit") == argv[1]) execlp("sh", "sh", "-c", "ME=\"`which $0`\"; $EDITOR \"$ME.cpp\"; g++ -o \"$1\" \"$ME.cpp\" && mv \"$1\" \"$ME\"", argv[0], tempnam(0, 0), 0);

(We have to do some bash magic because C++ program doesn't know its own path.)

taw said...

Divided Mind: I wanted to say that __FILE__ expanding to source file's path is something Perl/Ruby/Python stole all the way from cpp, so C++ would support it too - but it seems to default to not being an absolute path, damn. Probably the sanest (hehe) way would be to hack the build system to pass -D__PATH__=/blah/foo.cpp to the compiler, and then use that.

Divided Mind said...

You'd still need the binary path to swap in, so it'd be only of use if you absolutely have to have source somewhere else.

taw said...

Divided Mind: I prefer to symlink stuff into ~/bin/ over putting billion entries into $PATH, and $0 doesn't resolve symlinks, right? Not that you couldn't call readlink() from the C++ file ;-)

You should perfect this technique, and become famous.

Divided Mind said...

I think it'd be more convenient to use readlink in the script, ie. $ME="`readlink -e "\`which "$0"\`"`", or $ME=\"`readlink -e \"\\`which \"$0\"\\`\"`\" after quoting to C string (I simply love those quoting puzzles). To use it in C++ you'd have to do a round-trip from which or otherwise resolve $PATH yourself.

Robin said...

The Linux equivalent to open is xdg-open.

But there is at least one annoying bug: if you tell Firefox to make itself the default browser, this won't affect xdg-open for some reason (at least on Fedora with KDE).

Anonymous said...

What's the issue with GNU grep?

Divided Mind said...

Another tip: Google Image Search can filter on license, too, and it searches more than just flickr. Also it'd be probably more convenient just adding a search shortcut to your browser instead of using script.

francois.beausoleil said...

I prefer to have alias sudo='echo DUMMY'. Then if I really want sudo, I have to type /usr/bin/sudo, which makes it more of a conscious decision.

Divided Mind said...

Also: GNU df and du actually use 1K blocks. So the df/du trick of yours only seems necessary on BSD.

Divided Mind said...

(Apparently you can set block size with BLOCK_SIZE environment variable, but it defaults to 1KiB. You can also use -k switch to force 1KiB blocks.

BTW, I find the normal output much preferable to -h one. The latter uses differing prefixes, (ie. G, M, K) making it harder to compare sizes visually. Also, with du -h no sort -g for you.)

taw said...

Anonymous: GNU grep doesn't use Perl-style regexps. In particular it thinks that "foo|bar" means "fo(o|b)ar" which makes it essentially useless, and it should die in fire. Every single program in the world made in the last 20 years moved on to Perl-style regexps, for very vague definition of Perl-style.

Divided Mind: I like spawning Firefox tabs from shell. I also have a script for downloading it from flickr, selecting it a nice filename, and suggesting reasonable alt/title for it. Such script wouldn't work with images from arbitrary site. By the way Bing Images > Google Images.

francois.beausoleil: unfortunately far too many things on Unix require root access which normal users can do on every other operating system. Like installing software, which I do five times a day or so. All attempts at making software installable without root access on Unix failed miserably. (notice the examples - port / gem / apt-get)

Divided Mind: Why would anyone want to sort -g list of mounted partitions? ;-) du sometimes needs sorting, in which case you can -k/-m/-g it explicitly.

Stefan said...

gnome-open ...

Rory McCann said...

GNOME has "gnome-open" command, and is similar to OSX's open command

Divided Mind said...

Still I prefer normal du/df output. Consistent unit makes it easy to compare visually -- more digits === more space.

BTW, GNU grep supports perl-like regexps (as well as several other flavours). Just use grep -P.

Anonymous said...

You use -print0 because of newlines (as in \x0a) and not anything else. If you are confident in your file naming patterns you can try this:
find . -name '*.mp3' | while read; do ls -l "$REPLY"; done | sort -k25 -rn | head -n 10

Additionally stat -c%s$'\t'%n might suit your needs better than ls -l. This prints somewhat similarly to du -s, but in real bytes and not disk allocation units. stat has many formatting options.

Anonymous said...

I put up my bashrc (.bash_profile if you like) in full and in bits by topics anonymously on a publicly available wiki. This is considerably easier to keep such small things updated than a blog post, however the downside is that I just keep praying for the traffic of people that might find it useful. ;) Just kidding, I am publishing it mainly to synchronize my useful practices and tools on whatever machine I use. However, if Google lets you, you might stumble upon it one day. ;))) I am not giving a link here. ;)

Two nice websites just to browse to armour your workflow are:

taw said...

Anonymous: You seem to have missed the point where I gave the find -print0/xargs example as a way NOT to do things.

Ruby example is how you should do things.

"If you are confident in your file naming patterns" and obscure options for stat are exactly the kind of problems we totally avoid with Ruby.

Anonymous: If you want traffic, add more kittens. Works for me.

Anonymous said...

% echo 'fooar' | grep 'foo|bar'

% echo 'fobar' | grep 'foo|bar'

% echo 'foo|bar' | grep 'foo|bar'

GNU grep does not think foo|bar means fo(o|b)ar.

taw said...

Quick tests indicate that GNU grep on CentOS and Ubuntu support -P, but Debian's grep returns:

grep: Support for the -P option is not compiled into this --disable-perl-regexp binary

And OSX/BSD's grep has never heard of -P.

grep: The -P option is not supported

It all adds up to supporting my claim that "grep sucks".

shintakezou said...

find . -iname "*.mp3" -printf "%s %h/%f\n" |sort -rn |head -n 10 ... is quite senseful and understandable and even elegant I dare say. Ruby is great anyway, but I think these piped commands are more "natural"

taw said...

shintakezou: Before I even get into all escaping failures in that, let me point out that it dies on OSX:

find: -printf: unknown option

Ruby is not only cleaner and safer, it's also portable.

shintakezou said...

find is part of GNU findutils, it is portable too, in a different way of course. It is Apple's fault if it is something different on OSX while it could be basically the same. I've tested it of course, and there are no escaping char problems, if that is what you were talking about. The shell should be bash, and it'd be strange if it had different escaping rules than "common" bash on GNU/Linux.
I still like the Ruby solution, but I like find and piping solution too, and it seems not "harder" to me...

heraux said...

Does unix need "undelete"
but wurried who will bash my door
and pillage my /dev/aux/one-eye
