taw's blog: W3C's XML Query Use Cases ported to magic/xml

Saturday, August 05, 2006

Photo from flickr, by annia316, CC-BY.

W3C's XML Query Use Cases list typical usage scenarios for XQuery. Since the plan behind magic/xml is to beat every other XML processing system in terms of expressiveness, I started recoding these use cases in magic/xml. So far the first three use cases (23 queries) have been recoded. A few examples follow. Complete sources are packaged together with the magic/xml library.

Query XMP 1: List books published by Addison-Wesley after 1991, including their year and title. In XQuery:

<bib>
{
 for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
 where $b/publisher = "Addison-Wesley" and $b/@year > 1991
 return
   <book year="{ $b/@year }">
    { $b/title }
   </book>
}
</bib>

In magic/xml:

XML.bib! {
   XML.from_file('bib.xml').children(:book) {|b|
       if b.child(:publisher).text == "Addison-Wesley" and b[:year].to_i > 1991
           book!({:year => b[:year]}, b.children(:title))
       end
   }
}
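
For a rough point of comparison, here is a sketch of the same query written against REXML, the XML library bundled with Ruby. It is not part of magic/xml or of the W3C use cases; the sample data and the names sample_bib_xml and addison_wesley_after_1991 are made up for illustration, and there may well be a terser way to write it.

```ruby
require 'rexml/document'

# Made-up sample data in the general shape of the W3C bib.xml document.
def sample_bib_xml
  <<~'XML'
    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <publisher>Addison-Wesley</publisher>
      </book>
      <book year="1990">
        <title>An Older Book</title>
        <publisher>Addison-Wesley</publisher>
      </book>
      <book year="2000">
        <title>Data on the Web</title>
        <publisher>Morgan Kaufmann Publishers</publisher>
      </book>
    </bib>
  XML
end

# Builds a new <bib> element holding year and title of the matching books.
def addison_wesley_after_1991(xml_source)
  source = REXML::Document.new(xml_source)
  result = REXML::Element.new('bib')
  source.root.each_element('book') do |book|
    publisher = book.elements['publisher']
    next unless publisher && publisher.text == 'Addison-Wesley'
    next unless book.attributes['year'].to_i > 1991

    out = result.add_element('book', { 'year' => book.attributes['year'] })
    # Copy the titles so the source document is left untouched.
    book.each_element('title') { |title| out.add_element(title.deep_clone) }
  end
  result
end

puts addison_wesley_after_1991(sample_bib_xml)
```

The filtering logic is the same as above; most of the extra length goes into building the result tree by hand.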

Query TREE 1: Prepare a (nested) table of contents for Book1, listing all the sections and their titles. Preserve the original attributes of each <section> element, if any. In XQuery:

declare function local:toc($book-or-section as element()) as element()*
{
   for $section in $book-or-section/section
   return
     <section>
        { $section/@* , $section/title , local:toc($section) }
     </section>
};

<toc>
  {
    for $s in doc("book.xml")/book return local:toc($s)
  }
</toc>

In magic/xml:

def local_toc(node)
   node.children(:section).map{|c|
       XML.section(c.attrs, c.child(:title), local_toc(c))
   }
end

XML.toc! {
   add! local_toc(XML.from_file('book.xml'))
}

Query SEQ 1: In the Procedure section of Report1, what Instruments were used in the second Incision? In XQuery:

for $s in doc("report1.xml")//section[section.title = "Procedure"]
return ($s//incision)[2]/instrument

In magic/xml:

XML.from_file('report1.xml').descendants(:section) {|s|
   next unless s.child(:"section.title").text == "Procedure"
   print s.descendants(:incision)[1].child(:instrument)
}
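
One detail worth spelling out: XQuery positions count from 1, while Ruby arrays count from 0, which is why [2] in the XQuery version becomes [1] in the Ruby one. A tiny sketch with plain arrays (the helper name nth_xquery_style and the sample strings are hypothetical, not part of magic/xml):

```ruby
# Hypothetical stand-ins for the <incision> elements, in document order.
incisions = ['first incision', 'second incision', 'third incision']

# Translates a 1-based XQuery position into a 0-based Ruby array index.
def nth_xquery_style(items, position)
  return nil if position < 1 # XQuery has no item at position 0 or below

  items[position - 1]
end

puts nth_xquery_style(incisions, 2)
```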

magic/xml solutions are on average 10% longer than their XQuery counterparts. That's a great result, because:

I don't think any other XML processing library gets results even close to that. So far only special-purpose XML processing languages like XQuery or CDuce have been that expressive.
10% more characters is very little in exchange for the full power of Ruby. And you don't have to learn a new language: working with magic/xml is almost like working with plain Ruby.
The benchmarks come from the XQuery people and show what XQuery is good at, so on a less biased selection magic/xml would probably do even better. For example, magic/xml assumes real XML attributes (<a href="http://www.google.com/">Google</a>) are the common case, not dummy tags with text (<a><href>http://www.google.com/</href><content>Google</content></a>), while XQuery assumes the opposite. Then again, maybe magic/xml should get a few convenience methods for the other case.
If special-purpose XML processing languages aren't significantly more expressive, do we even need them? Performance comes from good profilers, but expressiveness cannot be hacked onto a system afterwards. So even if magic/xml is sometimes slow, it doesn't really matter.
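
If you want to check the length comparison on a single query yourself, one possible way to do it is sketched below. It is only an assumption about how to measure: it counts non-whitespace characters, which may not match how the 10% average was computed, and individual queries can land well above or below the average. The helper names code_size and relative_overhead are made up for this sketch.

```ruby
# Counts characters while ignoring whitespace, so that indentation style
# does not skew the comparison. This measure is an assumption.
def code_size(source)
  source.gsub(/\s+/, '').length
end

# Fraction by which candidate is longer than baseline (0.10 means 10% longer).
def relative_overhead(candidate, baseline)
  (code_size(candidate) - code_size(baseline)).to_f / code_size(baseline)
end

# The XMP 1 query from this post, in XQuery.
def xquery_xmp1
  <<~'CODE'
    <bib>
    {
     for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
     where $b/publisher = "Addison-Wesley" and $b/@year > 1991
     return
       <book year="{ $b/@year }">
        { $b/title }
       </book>
    }
    </bib>
  CODE
end

# The same query in magic/xml.
def magic_xml_xmp1
  <<~'CODE'
    XML.bib! {
       XML.from_file('bib.xml').children(:book) {|b|
           if b.child(:publisher).text == "Addison-Wesley" and b[:year].to_i > 1991
               book!({:year => b[:year]}, b.children(:title))
           end
       }
    }
  CODE
end

overhead = relative_overhead(magic_xml_xmp1, xquery_xmp1)
puts format('XMP 1 overhead: %+.1f%%', overhead * 100)
```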

Anyway, enjoy :-) Official beta release is coming soon.