The best kittens, technology, and video games blog in the world.

Monday, April 16, 2007

Compiler for RLisp

some cats love sinks by Kevin Steele from flickr (CC-NC)RLisp was a wonderful idea, unfortunately its first implementation was a very slow pure-interpreter, and it was too difficult to retrofit good performance into it.

The first version of RLisp even evaluated macros each time they were encountered, making them in practice "FEXPR"s instead of macros.

I was able to improve RLisp performance a bit. The biggest gain was making sure macros expand just once. Other optimizations resulted in only minor improvement, and the code was quickly turning into a unreadable mess.

The solution was a major rewrite. In the new implementation RLisp code is compiled to something that looks a lot like Lua 5.0's virtual machine code, and then to pure Ruby. Some of Lua virtual machine's ideas like flat closures are simply awesome and greatly simplify the implementation.

The code is much cleaner, and much faster. In "x times slower than Ruby" benchmarks:

Benchmarkaryhelloackermann
Early interpreterx96.1x37.8x1027.0
Optimized interpreterx35.5x58.2x53.1
Compilerx5.3x6.7x7.5

Compiled code

RLisp compiler changed a lot since the teaser.

Now to create functions it uses Object#function_* and Proc.new instead of Function objects. Function objects were more elegant, but they could only be used for free-standing functions, not for methods - inside such function self was a closure.

The new form makes it possible to use RLisp (fn ...) as methods, instance_eval them and so on. Unfortunately as Proc.new{|*args,&blk| ... } is a syntax error in Ruby 1.8, it's impossible to create RLisp functions which take Ruby-compatible block argument. Fortunately you can still call such functions from RLisp code, and RLisp functions can take Proc objects as normal arguments.

A few examples.

(fn (x)
(fn (y) (+ x y))
)
compiles to:
class ::Object; def function_29(globals, x)
Proc.new do |*args|
y, = *args
t0 = x.send(:+, y)
t0
end
end; end
class ::Object; def function_28(globals)
Proc.new do |*args|
x, = *args
t1 = function_29(globals, x)
t1
end
end; end
t2 = function_28(globals)
t2
Proc.new takes |*args| instead of |x| to work around magical behaviour of Ruby blocks called with one Array argument, which is autoexpanded:

Proc.new{|first_arg, *other_args| first_arg}.call("foo") # => "foo"
Proc.new{|first_arg, *other_args| first_arg}.call("foo", "bar") # => "foo"
Proc.new{|first_arg, *other_args| first_arg}.call([1, 2, 3]) # => 1
Proc.new{|first_arg, *other_args| first_arg}.call([1, 2, 3], "bar") # => [1, 2, 3]
Fibonacci function:
(let fib (fn (x)
(if (< x 2)
1
(+ (fib (- x 1)) (fib (- x 2)))
)
))
compiles to:
class ::Object; def function_28(globals)
Proc.new do |*args|
x, = *args
t0 = globals[:<]
t1 = t0.call(x, 2)
if t1
t2 = 1
else
t3 = globals[:fib]
t4 = x.send(:-, 1)
t5 = t3.call(t4)
t6 = globals[:fib]
t7 = x.send(:-, 2)
t8 = t6.call(t7)
t9 = t5.send(:+, t8)
t2 = t9
end
t2
end
end; end
t10 = function_28(globals)
globals[:fib] = t10
t10
If RLisp can determine that a variable is constant or isn't used within nested function, it doesn't create Variable objects for it, and performs various optimizations based on such information. Of course Variable objects are created when needed, like in the following code which creates a counter function. A counter function called with an argument increases the internal counter and returns the old value.

(let make-counter (fn (val)
(fn (incr)
(let old val)
(set! val (+ val incr))
old
)
))
class ::Object; def function_29(globals, val)
Proc.new do |*args|
incr, = *args
t0 = val.get
old = t0
t1 = val.get
t2 = t1.send(:+, incr)
val.set t2
t3 = old
t3
end
end; end
class ::Object; def function_28(globals)
Proc.new do |*args|
a_val, = *args
val = Variable.new
val.set a_val
t4 = function_29(globals, val)
t4
end
end; end
t5 = function_28(globals)
globals[:"make-counter"] = t5
t5

Access to Ruby

RLisp can access Ruby with (ruby-eval "Ruby code") function. All unknown uppercase variables are also assumed to be Ruby constants (nested constants like WEBrick::HTTPServer not supported yet).

This makes using most Ruby libraries from RLisp pretty straightforward.
rlisp> (ruby-eval "require 'complex'")
true
rlisp> (let a [Complex new 1.0 2.0])
1.0+2.0i
rlisp> (let b [Complex new -2.5 3.0])
-2.5+3.0i
rlisp> (let c [a * b])
-8.5-2.0i
Like before, RLisp supports syntactic sugar [receiver method args] for method calls, which expands to (send receiver 'method args). Now it's also possible to specify block argument with &.
rlisp> ['(1 2 3) map & (fn (x) (* x 2))]
(2 4 6)
RLisp can be run in two modes. To execute a standalone program use rlisp.rb program.rl. To run interactive REPL use rlisp.rb -i. It uses readline for input and prints out each expression as it is evaluated. Like every decent REPL, it captures exceptions without exiting.
rlisp> (hello "world")
./rlisp.rb:102:in `initialize': No such global variable: hello
rlisp> (defun hello (obj) (print "Hello, " obj "!"))
#<Proc:0xb7c9e9a8@(eval):2>
rlisp> (hello "world")
Hello, world!
nil
It also supports multiline expressions (without a special prompt yet):
rlisp> (defun hello ()
rlisp> (print "Hello, world!")
rlisp> )
#<Proc:0xb7c241e4@(eval):2>
rlisp> (hello)
Hello, world!
nil
There's also a wrapper script which makes it possible to run RLisp programs with #!, but as it needs to know path to RLisp it's not enabled by default.

Object-orientation and macros

It's possible to create object-oriented code with a bunch of standard macros like.

rlisp> (class String (method welcome () (+ "Hello, " self " !")))
#<Proc:0xb7c4e8a4@(eval):2>
rlisp> ["world" welcome]
Hello, world !
(class ...), (method ...) and so on are just macros, and they expand to:
rlisp> (macroexpand '(class String do_stuff))
(send String (quote instance_eval) & (fn () do_stuff))
rlisp> (macroexpand '(method foo (arg) do_stuff))
(send self (quote define_method) (quote foo) & (fn (arg) do_stuff))
You can explore macroexpansions with (macroexpand 'code) and (macroexpand-1 'code), which are just normal functions. You can look at final generated code with -d command line switch:
$ ./rlisp.rb -d -i
rlisp> (+ 1 2 3)
t0 = 2.send(:+, 3)
t1 = 1.send(:+, t0)
t1
6
rlisp> ((fn (x) (+ x 2)) 5)
class ::Object; def function_30(globals)
Proc.new do |*args|
x, = *args
t2 = x.send(:+, 2)
t2
end
end; end
t3 = function_30(globals)
t4 = t3.call(5)
t4
7
Other options are -n (don't use RLisp standard library), -r (force recompilation of standard library), and -h (display help message).

The future

You can download RLisp from RLisp website. It's licenced under BSD/MIT-like licence, has reasonable performance (CPU-wise 5x slower than Ruby - perfectly reasonable for database-bound or network-bound programs), and a lot of libraries (that is - it can use most Ruby libraries out of the box).

The code is rather clean, and reasonably tested (test code to code ratio 0.96:1). No major program was written in RLisp so far, so a few misdesigns and major bugs are certainly present.

What I'd like to do is create a new backend for RLisp. For now Lua-like VM opcodes are compiled to Ruby, but they're so simple that they could be compiled to C, x86 assembly, or LLVM and still stay fully compatible with Ruby runtime. That would probably make RLisp somewhat faster than Ruby, as Ruby's slowest part is eval. Ruby backend could probably be made faster too without too much work.

It would be awesome to do add at least limited tail-recursion optimization to RLisp.

Have fun.

1 comment:

Brad Ediger said...

Good stuff. I like it.

I hacked together a Rails plugin to compile RLisp files on the fly within a Rails application:

Rails-RLisp