The best kittens, technology, and video games blog in the world.

Tuesday, June 26, 2007

Toy C backend for RLisp compiler

Finette by franziskas garten from flickr (CC-NC-SA)

I wrote a toy RLisp C "backend". It's not yet connected to the real compiler - the only thing it compiles is Ackermann function - of course you can make it compile something else by issuing right opcodes by hand.

First some support code. Symbol#<=> is already in rlisp_support.rb, String#ord is supposed to be a wrapper around different behaviour of Strings in 1.8 and 1.9. Symbol#mangle_c and Symbol#stringify_c convert Ruby symbols (which may contain funny characters like :"==\nblah!@#") to C variable names and strings. For brevity I'm not pasting the tests here.

class Symbol
include Comparable
def <=>(other)
to_s <=> other.to_s
end
protected :<=>
end

class String
# FIXME: 1.8 specific, add support for 1.9
def ord
self[0]
end
end

class Symbol
def mangle_c
to_s.gsub(/([^a-zA-Z0-9])/) { sprintf "_%02x", $1.ord }
end
def stringify_c
'"' + to_s.gsub(/([\\"])/) { "\\#{$1}" } + '"'
end
end


Here's the code which calls the code generator, and then the ack function. It uses Ruby->C trampoline to support closure variables. The interface used by code generator is similar to one used by normal Ruby-backed RLisp code generator, but not identical. I'm going to refactor them both to match, so RLisp frontend can talk with either backend. I think C-compiling at least RLisp stdlib is a good idea. C-compiling REPL less so. Code generators are so simple that I hope to be able to maintain both without too much work.

class Stuff
cg = RLispCodegenC.new(:ack, [:m, :n])
t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15 = cg.tmps(14)

cg.funcall(t2, :m, :"==", 0)

cg.x_if(t2) {
cg.funcall(t4, :n, :+, 1);
cg.asg(t3, t4)
}
cg.x_else {
cg.funcall(t5, :n, :==, 0);

cg.x_if(t5) {
cg.global_get(t7, :ack)
cg.funcall(t8, :m, :-, 1);
cg.funcall(t9, t7, :call, t8, 1);
cg.asg(t6, t9)
}
cg.x_else {
cg.global_get(t10, :ack)
cg.funcall(t11, :m, :-, 1)
cg.global_get(t12, :ack)
cg.funcall(t13, :n, :-, 1)
cg.funcall(t14, t12, :call, :m, t13)
cg.funcall(t15, t10, :call, t11, t14)
cg.asg(t6, t15)
}
cg.asg(t3, t6)
}
cg.ret(t3)

inline do |builder|
result = builder.c cg.build
end
end

ack_m = Stuff.new.method(:ack)
globals = {}
closure = { :globals => globals }
globals[:ack] = lambda{|*args| ack_m.call(closure, args) }

print "retval = ", globals[:ack].call(3, 3), "\n"


RLispCodegenC builds one function at time. To use it you need to create a new RLispCodegenC instance, call some opcode methods, call build on the RLispCodegenC object, and compile it using inline. inline caches .so objects, so C compiler is called only when something changed.

require 'inline'

class RLispCodegenC
def initialize(name, args)
@name = name
@args = args
@syms = {:globals => true}
@ids = {}
@temps = []
@code = ""
end

def build
temps = [:globals] + @args + @temps
src = <<EOF
VALUE #{@name}(VALUE closure, VALUE args) {
VALUE #{temps.join(", ")};
#{static_init}
#{arg_init}
#{@code}
}
EOF
src
end

def arg_init
res = "globals = rb_hash_aref(closure, SYM_globals);\n"
@args.each_with_index{|arg, i|
res << "#{arg.mangle_c} = rb_ary_entry(args, #{i});\n"
}
res
end

# This code should be executed just once in Init_*, not every time the method is called
def static_init
ids = @ids.keys.sort
syms = @syms.keys.sort
res = ""
res << %Q[int #{ids.map{|i| "ID_#{i.mangle_c}"}.join(", ")};\n] unless ids.empty?
res << %Q[VALUE #{syms.map{|s| "SYM_#{s.mangle_c}"}.join(", ")};\n] unless syms.empty?
ids.each{|i| res << "ID_#{i.mangle_c} = rb_intern(#{i.stringify_c});\n" }
syms.each{|s| res << "SYM_#{s.mangle_c} = ID2SYM(rb_intern(#{s.stringify_c}));\n" }
res
end

def tmps(sz)
res = []
sz.times{
t = :"t#{@temps.size}"
res << t
@temps << t
}
res
end

def to_c(x)
if x.is_a? Fixnum
"INT2FIX(#{x})"
# It means a C temporary, ***NOT*** Ruby symbol
# FIXME: Mangling ?
elsif x.is_a? Symbol
x.mangle_c
else
raise "Don't know how to convert #{x.class}: #{x.inspect}"
end
end

def funcall(asg_to, recv, mid, *args)
@ids[mid] = true
mid_s = "ID_#{mid.mangle_c}"
args_m = args.map{|a| ", " + to_c(a)}.join
@code << "#{to_c(asg_to)} = rb_funcall(#{to_c(recv)}, #{mid_s}, #{args.size}#{args_m});\n"
end
def x_if(tmp)
@code << "if(RTEST(#{tmp})) {\n"
yield
end
def x_else
@code << "} else {\n"
yield
@code << "}\n"
end
def asg(to, from)
@code << "#{to_c(to)} = #{to_c(from)};\n"
end
def global_get(to, sym)
@syms[sym] = true
sym_s = "SYM_#{sym.mangle_c}"
@code << "#{to} = rb_hash_aref(globals, #{sym_s});\n"
end
def ret(val)
@code << "return #{to_c(val)};\n"
end
def debug(msg, val=nil)
@code << %Q[rb_funcall(self, rb_intern("print"), 1, rb_str_new2(#{msg.inspect})); /* DEBUG */\n]
@code << %Q[rb_funcall(self, rb_intern("p"), 1, #{val}); /* DEBUG */\n] unless val.nil?
end
end

No comments: