The best kittens, technology, and video games blog in the world.

Wednesday, May 03, 2006

Variant types in OCaml suck

So, here's the first of the promised long, boring, technical rants.

My MSc thesis is a compiler for shaders written in the RenderMan Shading Language for SaarCOR hardware. Writing a compiler is a pretty straightforward task - gather some tests, write a hardware emulator, then a parser, a middle end, a code generator, then play with the components until they produce something satisfactory. Not much of a blogable material. The first issue however, choice of the programming language, was a bit interesting. The options I considered were:

  • Java + possibly some high-level JVM-targeted scripting language
  • OCaml, used very conservatively
  • OCaml, used with all the fancy stuff like variant types
  • Other functional language, like Common Lisp or Scheme
  • Some real high-level language like Ruby, Perl or Python
OCaml's (and SML's) static typing is usually way too annoying. The reason I decided to try OCaml anyway were all the fancy things that it has and the older MLs didn't, the stuff which supposedly makes coding much easier, like objects, variant types, camlp4 and so on. That was kinda the last chance I was giving it, not quite satisfied with OCaml, but not yet willing to abandon it.

And the way the fancy features are implemented really sucks.
Today, let's talk about suckiness of the variant types.

Imagine there are two types, flt_temp = `FLT_TEMP of int (floating point temporaries), and vec_temp = `VEC_TEMP of int (vector temporaries). There is also a type any_temp = [flt_temp|vec_temp] for temporaries of either kind. So in contexts where only float temporaries are allowed (like division), flt_temp is used, in contexts where only vector temporaries are allowed (like dot products) - vec_temp, and in contexts where any temporary is ok (like function argument) - any_temp. So far so good, and a dumb wrappers like F of flt_temp|V of vec_temp wasn't needed to implement any_temp.

let any_temp_to_string : [<any_temp] -> string
= function
|`FLT_TEMP(x) -> sprintf "t%d:float" x
|`VEC_TEMP(x) -> sprintf "t%d:vector" x

Notice the <. It means that any subtype of any_temp can be argument of this function. So it's possible to run any_temp_to_string on any_temp, flt_temp or vec_temp values, what's a really nice improvement compared to the old MLs. However at this point you may be a little bit confused - why is < needed ? Isn't a value of type flt_temp/vec_temp also any_temp ?

Now that's where we get to the suckiness part, because it's not and it needs an explicit typecast. Unfortunately the typecast often needs to be repeated many times, because OCaml will report a type error without even looking at our annotations:

For example, one might have thought that the following would work:
let extract_used_temps : any_comp -> any_temp list = function
|`DIVISION(a, b) -> [a; b]
|`DOT_PRODUCT(a, b) -> [a; b]

But it won't. The "correct" rewrite is (extremely ugly and verbose):
let extract_used_temps : any_comp -> any_temp list = function
|`DIVISION(a, b) -> [(a :> any_temp); (b :> any_temp)]
|`DOT_PRODUCT(a, b) -> [(a :> any_temp); (b :> any_temp)]

A related issue is that polymorphic functions of type 'a->whatever
get instantiated to any_temp->whatever, not [<any_temp]->whatever.
So Hashtbl.find ht_of_any_temps some_flt_temp is going to fail unless we use (some_flt_temp :> any_temp).

The "let's screw the type system" function Obj.magic (identity function of type 'a->'b) every here and there limits the typecast bloat a bit, but I've found that it helps in maybe 10% of cases, and of course then, if you make a typo, you end up with random segfaults (not even a nice runtime exception like in a dynamically typed language). In the 90% of cases there isn't really a way to use Obj.magic, mostly because it's applied too late, only after OCaml finds what it thinks is a type error.

Now some statistics - the compiler itself at the moment has 4239 lines, and that includes 175 uses of :> and 32 uses of Obj.magic. It's so ugly :-/

So in the end the type system is mostly something to fight with, not something that helps, just like it was so often the case in plain OCaml/SML. So for the next project, I'll try something with fewer types. And I'll probably give up on OCaml.

Oh, and I also had some problems with OCaml's object system, maybe I'll blog about it sometime later. ^_^

12 comments:

Panda said...

>For example, one might have >thought that the following would >work:
>let extract_used_temps : >any_comp -> any_temp list = >function
>|`DIVISION(a, b) -> [a; b]
>|`DOT_PRODUCT(a, b) -> [a; b]

I don't know depths of polymorphic variants but it seems that this works with Ocaml 3.08.3.
Here is what i've entered:

type f_t = [`F of int];;
type v_t = [`V of int];;
type a_t = [f_t|v_t];;

type a_c = [`DIV of (a_t*a_t)| `DP of (a_t*a_t)];;

let ex : a_c -> a_t list = function
| `DIV (a,b) -> [a;b]
| `DP (a,b) -> [a;b]
;;

ex (`DIV (`F 1,`V 1));;

So it seems that Ocaml is ok with this part, or maybe I don't understand what you mean.

Regards and ...pozdrowienia z Polski ;)

taw said...

Panda: a_c type is missing the point. Different operations accept different arguments

For example dot product only accepts vectors (v_t*v_t), and division only accepts scalars (f_t*f_t).

type a_c = [ `DIV of f_t * f_t | `DP of v_t * v_t ]

let # let ex : a_c -> a_t list = function
| `DIV (a,b) -> [a;b]
| `DP (a,b) -> [a;b]
;;

This expression has type f_t but is here used with type a_t
The first variant type does not allow tag(s) `V

Anonymous said...

A few thoughts on this:

1. Object.magic is a terrible idea. Don't use it until you really really understand the type-system super well. I've been using OCaml for a decade and I avoid Object.magic like the plague.

2. Avoid polymorphic variants at first. You can get really far in OCaml without using subtyping features like polymorphic variants. People coming from other languages to OCaml tend to want to overuse features like the object system and polymorphic variants. It's worth spending a little while doing things the "core ML" way first. (In the long run, polymorphic variants are a great feature that one should use all over the place, and the object system is not terribly useful except in special cases.)

3. If you do use subtyping, you're right that you need lots of explicit upcasts, which is annoying. You can make these somewhat less annoying by defining little helper functions or operators to do the upcasts for you without a lot of syntactic mess.

Anonymous said...

the compiler itself at the moment (...) includes 175 uses of :>

Would you please show them? (there are a couple in ocamldoc and some symbols ":>" in camlp4 because it must parse the grammar!)

Anonymous said...

"So far so good, and a dumb wrappers like F of flt_temp|V of vec_temp wasn't needed to implement any_temp."
This wrapper is not at all dumb, it is the correct way of doing it. It gives you type safety in exchange of very few extra characters. It also facilitates easy casing, which you need, as users of any_type *do* care what type they get, as in any_type_to_string.

arkadir said...

From my experience, only very few explicit casts are ever required in a given program. In your example, a shorter way to achieve this would be simply to use a more abstract representation of your type:

type a_c = [ `DIV of f_t * f_t | `DP of v_t * v_t ]

type b_c = [ `DIV of a_t * a_t | `DP of a_t * a_t ]


Given that f_t and v_t are both subtypes of a_t, it follows that a_c is a subtype of b_c and so :

let ex : a_c -> a_t list = fun a ->
match (a :> b_c) with
| `DIV (a,b) -> |a;b]
| `DP (a,b) -> [a;b]


This will generate no compiler error, and a single coercion is involved. Of course, you still have to define the b_c type, but doing so is extremely easy:

- In a_c -> a_t list type annotation from ex, replace a_c with 'a

- Run the compiler.

- Read the type of ex as determined by the compiler: it's a function that expects an argument of the type that happens to be exactly b_c

So, you can use the compiler to generate or re-generate the source code for b_c every time, while also keeping your ex function readable.

taw said...

arkadir: Your b_c type is simply 100% invalid. If I use types like that, I can as well use Ruby or some other dynamically typed language.

Simmera Jottun said...
This comment has been removed by the author.
arkadir said...

taw : what's your point?

Your 'ex' function intentionally discards type information (that's what those type coercions do : discard type information). My proposed 'b_c' type discards exactly the same amount of information. If using 'b_c' feels dirty, you should feel just as dirty writing 'ex' in the first place.

Obviously, you shouldn't be using 'b_c' everywhere in your code, 'a_c' is fine for that and I even annotated the argument type of 'ex' as 'a_c' in my comment to remind you of that. But coercing from 'a_c' to 'b_c' in the 'ex' function saves you time without making type-safety any worse than it already is. So why not do it?

-ark

taw said...

arkadir: flt_temp/vec_temp/any_temp/any_comp are all semantically meaningful types. Type safety of code which uses these types is perfect. The only type casting which happens here is the safe kind to wider valid type.

But b_c is pure nonsense. b_c values like `DIV of f_t * v_t are always wrong and that's it. Existence of such type anywhere in the program makes baby Turing sad.

bob said...

arkadir is correct. you're totally wrong

Anonymous said...

I really loved the post so I used my Digg account to digg it.. It's hard to find knowledgeable individuals on this matter, but you sound like you already know what you're talking about! Thanks A rise in Amazing A lot more.