Photo from Commons by Over Fresh, public domain.
It's high time someone said that. There is a nice and elegant subset of XML that everybody uses. It looks more or less like this:
<foo color="blue"><p>Some text &amp; and a bit more</p><br/></foo>
So we have tags, attributes, text and the standard &-entities for escaping (> < & " '). All in UTF-8. And maybe also HTML-compatible entities and comments, but they are already a bit annoying.
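Handling those five entities takes only a few lines. Here is a minimal sketch in Ruby; the names `xml_escape` and `xml_unescape` are made up for illustration and do not come from any particular library.

```ruby
# Hypothetical helpers for the five standard entities.
XML_ESCAPES = {
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;",
  '"' => "&quot;",
  "'" => "&apos;"
}.freeze

XML_UNESCAPES = XML_ESCAPES.invert.freeze

# Replace each special character with its entity.
def xml_escape(text)
  text.gsub(/[&<>"']/) { |ch| XML_ESCAPES[ch] }
end

# Turn the five standard entities back into characters in a single pass.
# Anything else that looks like an entity is left untouched.
def xml_unescape(text)
  text.gsub(/&(?:amp|lt|gt|quot|apos);/) { |ent| XML_UNESCAPES[ent] }
end

puts xml_escape("AT&T")                                  # prints AT&amp;T
puts xml_unescape("Some text &amp; and a bit more")      # prints Some text & and a bit more
```

Unescaping in one pass matters: "&amp;lt;" has to come out as "&lt;", not as "<".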
Now the XML standard is way fatter and uglier than that. It contains:
- DTDs. DTDs do not follow the XML syntax, and according to the standard, they can be dumped straight into any XML document and the program is supposed to handle all that. And they do more than just validation! They can set attribute default values, define text replacements and do a lot of other useless things. This is the worst thing about XML. Of course nobody actually dumps such things into documents; the most people do is a single (and ugly anyway) DOCTYPE declaration, as if we couldn't use MIME types for that.
- Non-standard entities. What does &foo; mean? Well, it can mean anything. And it really sucks, because the program doesn't want to deal with escaping issues. So the program wants "AT&T", not "AT&amp;T". And what is the parser supposed to return when it gets "&foo; &amp; &bar;"?
- CDATA. Yeah, let's provide a second and completely redundant way of escaping characters to make everyone's life harder.
- XML declarations. These <?xml ... ?> things that can specify version and encoding. As if the standard couldn't simply say "XML documents are encoded in UTF-8".
- Processing instructions. So now every program is supposed to somehow deal with <?mspaint ... ?> randomly splattered through the document. They don't even have to follow the tree structure, so where the heck is the parser supposed to attach them in the parse tree?
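To see how far a given document strays from the nice subset, one could scan it for the offenders listed above. This is only a rough sketch: `outside_simple_subset` is a made-up name, it uses plain regexes instead of a real parser (so text inside comments or CDATA can trigger false alarms), and tolerating numeric character references like &#160; is my own assumption.

```ruby
# Hypothetical checker; the name and the exact rules are assumptions.
ALLOWED_ENTITIES = %w[amp lt gt quot apos].freeze

# Returns a list of complaints. An empty list means the document
# sticks to tags, attributes, text and the five standard entities.
def outside_simple_subset(doc)
  problems = []
  problems << "DTD / DOCTYPE"          if doc =~ /<!DOCTYPE/i
  problems << "CDATA section"          if doc.include?("<![CDATA[")
  problems << "XML declaration"        if doc =~ /<\?xml[\s?]/
  problems << "processing instruction" if doc =~ /<\?(?!xml[\s?])/

  doc.scan(/&([^;\s&<]*);/).flatten.each do |name|
    next if ALLOWED_ENTITIES.include?(name)
    # Assumption: numeric character references are tolerated.
    next if name =~ /\A#(?:\d+|x\h+)\z/
    problems << "non-standard entity &#{name};"
  end

  problems.uniq
end

simple = '<foo color="blue"><p>Some text &amp; and a bit more</p><br/></foo>'
fat    = '<?xml version="1.0"?><!DOCTYPE foo><foo>&foo; <![CDATA[x]]><?mspaint a?></foo>'

p outside_simple_subset(simple)   # prints []
puts outside_simple_subset(fat)
```

A real implementation would do this while parsing rather than with regexes, but it shows how short the list of things to reject actually is.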
node = XML.foo { bar!("Hello"); bar!({:color => "blue"}, "world") }
to get a node equivalent to <foo><bar>Hello</bar><bar color="blue">world</bar></foo>. Enjoy :-)