The best kittens, technology, and video games blog in the world.

Tuesday, May 16, 2006

OCaml programming best practice





(The panda doesn't feel well, because of programming too much OCaml. But trust me on this, you do not want to see a panda that had to code C++)

OCaml is a very big and complex language, like C++, and many parts of it kinda stick out, are experiments that didn't go well, or exist only for backward compatibility.

If you want to code OCaml, you might find these hints useful.


  • "Exceptions" in OCaml won't give you a backtrace. This is easily one of the top 3 worst things about OCaml. So even in situations when you'd use an exception in Ruby/Python/Java/etc., think whether you can do something else in OCaml.

    • For handling "can't happen" situations use something like failwith (sprintf "Internal error: function foo(%d, %s, %s) expects a non-empty list" a b (foo_to_string c)). Because you won't get a backtrace, you should include as much information as possible.



  • OCaml's standard library is one of its weakest parts. Always use extensions to the std library. If you are concerned about people introducing unnecessary dependencies, simply copy sources of it to your repository with an appropriate copyright notice.

  • As we know from Perl/Ruby/Python/every other modern language, the three most common data structures you ever use are strings, hash tables and resizable arrays. You can get OCaml to support all of them reasonably well, but it doesn't by default.

    • sprintf is one of the most useful OCaml functions. And do open Printf in all your OCaml files. It will save you a lot of typing, and functions from the Printf module have names that don't collide with anything. If you need to print something for debugging, define converters for all your types, like foo_to_string foo = sprintf "%s %d %s" foo.a foo.b foo.c

    • DynArray (resizable arrays) is one of the most useful OCaml data types ever. No more idiocy like building list ref and reversing it.

    • OCaml Hashtbl is weird and confusing. It is used for 1->1 hashes and 1->N hashes (and ('a,unit) hash tables are used as sets), and it's easy to get something wrong (was Hashtbl.add meant to add another element to a 1->N hash, or was it simply setting a value that didn't exist before ...). Besides, calling Hashtbl.blah is extremely verbose, and it's impossible to open the module into your namespace due to collisions. So it's a very good idea to make types ('a,'b) ht = HT of ('a,'b) Hashtbl.t for 1->1 hash tables, ('a,'b) mht = MHT of ('a,'b) Hashtbl.t for 1->N hash tables and 'a set = SET of ('a,unit) Hashtbl.t. Then define functions like ht_set, ht_get, mht_add, mht_get_all, ht_iter, mht_iter_all, set_add, set_mem etc. in a module that you include from all your files (I usually call it util.ml). And definitely define functions like ht_keys, ht_values, ht_iter_keys etc. I have no idea how they could have forgotten them in the std library.


  • Do not use tuples bigger than 2 (in special cases 3) elements, or long-living tuples. Use records instead. Records can be easily extended, can be made mutable, and you won't have to remember which field of the tuple meant what. This also applies to Python ;-)

  • Encapsulate all common folds and tail recursions. It's very easy to make a mistake with them and the code looks ugly. On the other hand, high-level functions like map, iter, filter, collect, join etc. don't clutter your code. Never use the same recursion/folding pattern twice. Extract the pattern. I think it's a good idea to separate the pattern from the application code for single-use patterns too. Some examples: collect : ('a -> 'b option) -> 'a list -> 'b list, mht_iter_all : ('a -> 'b list -> unit) -> ('a,'b) mht -> unit.

  • ocamldep is pretty helpful for writing Makefiles. Simply do ocamldep *.ml *.mli > .deps from time to time and include .deps from your Makefile. (Use > rather than >>, otherwise stale and duplicate entries pile up in .deps.)
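
To make the Hashtbl hint above more concrete, here is a minimal sketch of what such a util.ml could look like. The type names (ht, mht, set) and most function names come from the list above. The constructors ht_new, mht_new and set_new, the initial size of 16, and the exact behaviour of each helper are my assumptions, not a definitive implementation.

```ocaml
(* util.ml sketch: distinct wrapper types, so a 1->1 table can't be
   passed where a 1->N table or a set is expected. *)
type ('a, 'b) ht = HT of ('a, 'b) Hashtbl.t    (* 1->1 hash table *)
type ('a, 'b) mht = MHT of ('a, 'b) Hashtbl.t  (* 1->N hash table *)
type 'a set = SET of ('a, unit) Hashtbl.t      (* set of keys *)

(* Constructors are assumptions; 16 is just an initial size hint. *)
let ht_new () = HT (Hashtbl.create 16)
let mht_new () = MHT (Hashtbl.create 16)
let set_new () = SET (Hashtbl.create 16)

(* 1->1: replace overwrites, so every key has at most one binding. *)
let ht_set (HT h) k v = Hashtbl.replace h k v
let ht_get (HT h) k = try Some (Hashtbl.find h k) with Not_found -> None
let ht_iter f (HT h) = Hashtbl.iter f h
let ht_keys (HT h) = Hashtbl.fold (fun k _ acc -> k :: acc) h []
let ht_values (HT h) = Hashtbl.fold (fun _ v acc -> v :: acc) h []

(* 1->N: add stacks a new binding on top of the older ones. *)
let mht_add (MHT h) k v = Hashtbl.add h k v
let mht_get_all (MHT h) k = Hashtbl.find_all h k

(* Call f once per distinct key, with all values bound to that key. *)
let mht_iter_all f (MHT h) =
  let seen = Hashtbl.create 16 in
  Hashtbl.iter
    (fun k _ ->
      if not (Hashtbl.mem seen k) then begin
        Hashtbl.replace seen k ();
        f k (Hashtbl.find_all h k)
      end)
    h

(* Sets: the unit value carries no information, only the key matters. *)
let set_add (SET h) x = Hashtbl.replace h x ()
let set_mem (SET h) x = Hashtbl.mem h x
```

With this in place the intent is visible at the call site: ht_set always overwrites, mht_add always accumulates, and the type checker complains if you mix them up. Note that mht_get_all returns the most recently added value first, because that's what Hashtbl.find_all does.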

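Two of the other hints, the to_string converters for failwith messages and the extracted collect pattern, can be sketched the same way. The record foo and the function describe_first are hypothetical and exist only for illustration. collect follows the signature given above. Newer versions of the standard library ship the same thing as List.filter_map, so treat this as a sketch of the pattern rather than something you still have to write yourself.

```ocaml
open Printf

(* Hypothetical record, used only to show a to_string converter. *)
type foo = { a : string; b : int; c : string }

let foo_to_string foo = sprintf "%s %d %s" foo.a foo.b foo.c

(* collect : ('a -> 'b option) -> 'a list -> 'b list
   Keeps the payload of every Some result, in the original order.
   The fold lives here once instead of being repeated at call sites. *)
let collect f xs =
  List.rev
    (List.fold_left
       (fun acc x -> match f x with Some y -> y :: acc | None -> acc)
       [] xs)

(* A "can't happen" check that puts its arguments in the message,
   so the failure is still useful without a backtrace. *)
let describe_first label foos =
  match foos with
  | [] ->
    failwith
      (sprintf "Internal error: describe_first(%s) expects a non-empty list"
         label)
  | foo :: _ -> sprintf "%s: %s" label (foo_to_string foo)

(* Usage: keep only the even numbers, multiplied by 10. *)
let tens_of_evens =
  collect (fun x -> if x mod 2 = 0 then Some (x * 10) else None) [1; 2; 3; 4]
```

The application code only says what to keep. The accumulator and the final List.rev, which are the bits that are easy to get wrong, are written exactly once.
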
11 comments:

Anonymous said...

The following command-line options are recognized by ocamlrun.

-b
When the program aborts due to an uncaught exception, print a detailed “back trace” of the execution, showing where the exception was raised and which function calls were outstanding at this point. The back trace is printed only if the bytecode executable contains debugging information, i.e. was compiled and linked with the -g option to ocamlc set. This is equivalent to setting the b flag in the OCAMLRUNPARAM environment variable (see below).

taw said...

I almost feel bad that I missed that one. :-)

Ok, I was wrong, there is a way to get OCaml to print backtraces. I must have missed it because I always used native-compiled code, in which case it doesn't work at all.

Still, it's pretty limited. It only prints file names and line numbers, and not function names, arguments etc.

Another update of the list: OCamlMakefile is usually better than hand-coded Makefiles calling ocamldep. It is a very fragile piece of Makefile hackery (and can easily screw up dependencies with multiple projects), so it shouldn't share a Makefile with anything else, but when left undisturbed it works quite well.

Anonymous said...

What makes you think OCamlMakefile is fragile? Also, multiple projects work quite well and they don't have to involve recursive makefiles as you're suggesting. And yes, you can share a makefile with something else - the best way is to include OCamlMakefile in your specialized makefile.

No offense, but why are you using a language you don't like? It's counterproductive, you're putting your energy into proving that X sucks. Just code in what you like and have fun.

taw said...

There are two big reasons why OCamlMakefile is fragile. One is the general problem with recursive Makefiles, which are best described here. If you build a single binary, the problem doesn't apply.

The second big issue is that OCamlMakefile does some really evil Makefile hackery. As soon as you want your Makefile to do something non-trivial besides compiling OCaml programs, it is pretty likely these things will interfere.

A simple example of a common Make thing that doesn't work with OCamlMakefile:

# First, build a test binary
define PROJ_test6
SOURCES=$(TEST_COMMON) test6.ml
RESULT=test6
endef

# And now run the test
# To run a test first build it of course
run_test_6: test6
	./test6
	diff --brief img_6.png correct_6.png

But with OCamlMakefile test6 is not a valid dependency.
To build test6 you cannot use:

# make doesn't even return error,
# it just tries to compile test6.ml
# to test6 ignoring everything in
# the Makefile
make test6

only:

make native-code SUBPROJS=test6

Which doesn't make a valid dependency. It can be worked around, but it's yucky.

The other question is rather pointless. As the Saddam Hussein of programming said - "There are only two kinds of languages: the kind everybody bitches about, and the kind nobody uses".

You should be more worried when people do not put any energy in proving your favourite language sucks. That would mean it's pretty much dead.

Anonymous said...

Exception backtraces are already available in bytecode compiled ocaml.

Exception backtraces for native compiled ocaml have been added in version 3.10 of the compiler, which will be available RealSoonNow(tm).

Anonymous said...

Xavier just released a beta of 3.10.0 that might interest you. Some neat improvements are 'ocamlbuild' (http://gallium.inria.fr/~pouillar/ocamlbuild/ocamlbuild-presentation.pdf)
as well as stack backtraces for native code and bytecode.

taw said...

Anonymous: Thanks for information. It's always great to see things that I didn't like getting fixed. :-)

ocamlbuild looks very interesting too.

Anonymous said...

I would also recommend paying a bit of attention to the ocamlbuild app.

Anonymous said...

None of these are applicable to getting performance out of OCaml. In fact, your use of functors and of boxed float data structures will cause your scientific OCaml code to crawl.

taw said...

Anonymous: If you needed 100% performance you'd be writing in Tesla assembly or Fortran, not OCaml.

OCaml gives you a good compromise between performance and expressive power, and you can always go slightly lower level with the most performance-critical part of your app to get a few % more power if you need it.

Timothy said...

It should be noted that functions called from Tk (labltk), eg. callbacks in response to user input, *do not* cause a backtrace to be printed - they must be run from the top-level to properly print a backtrace. I wish it were otherwise - this makes debugging a GUI program difficult.
Anybody know if it is the same for gtk?