Robin Berjon

Turtles all the way up

A WebIDL Parser for Javascript

WebIDL is a schema language for APIs that is being used (primarily) as part of W3C specifications in order to define various interfaces. If you've read any recent API specification, you've read WebIDL. It is abstract enough that using it one could generate interfaces for a great number of programming languages, but given its origin it is only normal that the vast majority of the time it is used to produce Javascript bindings.

As such it was a shame that there was no way to parse it from Javascript (there were fragments of such a parser inside the ReSpec codebase but you really don't want to stare at them too closely). This deficiency has now been addressed, and you can go grab WebIDLParser.js off GitHub. This is an alpha, it's barely been tested, use at your own risk, patches welcome, etc.

If you wish to use it in a Web context, simply include web/WebIDLParser.js. Note that this file has not been minified and is quite large. In my tests it goes down to 44K minified and 6.4K gzipped (being generated code, it is rather repetitive and compresses well). Once the file is loaded, a WebIDLParser object becomes available.

Using it from Node.js or a CommonJS environment is equally simple: just put node/WebIDLParser.js in your library path and require it. It exports a Parser object.

The interface is simple: the only method you need to worry about is parse(str, [start]). It takes a string containing WebIDL, and returns an AST expressed as a simple JS structure (utilities to traverse it more easily are forthcoming). Optionally a second argument can specify which rule of the grammar to start with. This is useful when parsing a small WebIDL fragment that may be incomplete. The names of the rules can be found in lib/grammar.peg. Just so that you don't have to look, the most common start parameters are:

The grammar isn't exactly the one in the WebIDL draft. In some places it is slightly more permissive (not by that much though), and its handling of extended attributes is simpler. It also adds support for WebIDL arrays, even though they are not described in the current draft's grammar.
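As a rough sketch of how the parse(str, [start]) call can be wrapped in practice, the snippet below forwards the optional start rule and turns a thrown error into a plain result object. It assumes that the parser throws on invalid input, as PEG.js-generated parsers normally do. The names tryParse and standIn are made up for illustration, and standIn only exists so that the snippet runs without the library installed.

```javascript
// Call parser.parse(idl, [startRule]) and report failure as data rather than
// as an exception. Assumption: the parser throws when the input is invalid.
function tryParse(parser, idl, startRule) {
  try {
    const ast = startRule === undefined
      ? parser.parse(idl)
      : parser.parse(idl, startRule);
    return { ok: true, ast: ast };
  } catch (err) {
    return { ok: false, message: String(err && err.message ? err.message : err) };
  }
}

// Hypothetical stand-in with the same parse(str, [start]) shape as the real
// parser. It does NOT parse WebIDL; it only lets the sketch run on its own.
const standIn = {
  parse: function (str, start) {
    if (str.trim() === "") throw new Error("empty input");
    return { start: start || "(default)", length: str.length };
  }
};

console.log(tryParse(standIn, "interface Foo {};"));
console.log(tryParse(standIn, ""));
```

With the real library you would pass the exported Parser (or the WebIDLParser object in a browser) instead of standIn, and a rule name taken from lib/grammar.peg as the third argument.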

If you wish to modify the grammar, go through the following steps:

  1. read up on PEG.js at http://pegjs.majda.cz/
  2. create a directory at the root of this repository called depends
  3. inside depends, run git clone git://github.com/dmajda/pegjs.git
  4. edit the grammar in lib/grammar.peg
  5. then run node utils/generate.js to regenerate the JS

How It Was Done

I built it atop the PEG.js parser generator. It's a very nice tool, with decent error reporting, and the resulting parser seems to be rather fast. Be warned though that the grammar listed in the "Documentation" section of the site is completely out of date. Instead, read what you can find in its GitHub repository. I found the CSS 2 grammar example to be all that I needed in order to get started. Overall, after stumbling on a couple of gotchas (e.g. that any * or + specifier will cause an array to be passed, which is logical when you think of it) I found it to be intuitive and easy to work with.

If you look inside utils/generate.js you will see that I needed to work around some PEG.js limitations by patching the generated code in the most brutal fashion. If I get a minute to myself I will probably patch PEG.js so that such barbarian treatment is no longer needed, but in the meantime here are the issues.

The first one is that I want to pre-process the input before parsing it: I didn't want to have to deal with comments in the grammar, and just removing them upfront is easier. A similar feature could be used for a number of other things.
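To give an idea of what such a pre-processing step can look like, here is a minimal sketch. It is not the code from utils/generate.js; stripComments is a hypothetical helper. It assumes that comments are the usual // and /* */ kinds and that string literals are double-quoted, and it blanks comments out with spaces (keeping newlines) so that any line and column numbers reported by the parser still match the original input.

```javascript
// Blank out // and /* */ comments while leaving double-quoted strings alone.
// Comment characters become spaces and newlines are kept, so positions in the
// cleaned text line up with positions in the original text.
function stripComments(idl) {
  // Strings are matched first so that "http://example" is not seen as a comment.
  const pattern = /"[^"]*"|\/\*[\s\S]*?\*\/|\/\/[^\n]*/g;
  return idl.replace(pattern, function (match) {
    if (match.charAt(0) === '"') return match; // string literal: keep as-is
    return match.replace(/[^\n]/g, " ");       // comment: overwrite with spaces
  });
}

const sample = [
  "// a contact",
  "interface Contact {",
  "  attribute DOMString name; /* display name */",
  "};"
].join("\n");

console.log(stripComments(sample));
```

Replacing rather than deleting is a design choice: deleting comments is simpler still, but it shifts every error position that comes after a comment.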

The other limitation is that I want it to be possible in my generated parser to specify which grammar rule to use as the starting point, at runtime (PEG.js only has an option to control that when the parser is being built). This is useful when you might need to parse a subcomponent of the grammar without its full context.
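The idea of picking the start rule at runtime can be shown with a toy, hand-written parser. This is only a sketch of the concept: it is not PEG.js output, and the rule names identifier and attribute are invented for the example rather than taken from lib/grammar.peg.

```javascript
// A toy parser whose entry point is chosen at call time via parse(input, start).
function makeParser() {
  // Skip whitespace and return the new position.
  const ws = function (s, p) {
    while (p < s.length && /\s/.test(s[p])) p++;
    return p;
  };
  // Each rule takes (input, position) and returns { value, pos } or null.
  const identifier = function (s, p) {
    p = ws(s, p);
    const m = /^[A-Za-z_][A-Za-z0-9_]*/.exec(s.slice(p));
    return m ? { value: m[0], pos: p + m[0].length } : null;
  };
  const rules = {
    identifier: identifier,
    // attribute := "attribute" identifier identifier ";"
    attribute: function (s, p) {
      const kw = identifier(s, p);
      if (!kw || kw.value !== "attribute") return null;
      const type = identifier(s, kw.pos);
      if (!type) return null;
      const name = identifier(s, type.pos);
      if (!name) return null;
      const end = ws(s, name.pos);
      if (s[end] !== ";") return null;
      return {
        value: { kind: "attribute", type: type.value, name: name.value },
        pos: end + 1
      };
    }
  };
  return {
    parse: function (input, start) {
      const ruleName = start || "attribute"; // default entry point
      if (!Object.prototype.hasOwnProperty.call(rules, ruleName)) {
        throw new Error("Unknown start rule: " + ruleName);
      }
      const result = rules[ruleName](input, 0);
      // The whole input must be consumed, apart from trailing whitespace.
      if (!result || ws(input, result.pos) !== input.length) {
        throw new Error("Syntax error (start rule " + ruleName + ")");
      }
      return result.value;
    }
  };
}

const toy = makeParser();
console.log(toy.parse("attribute DOMString name;"));
console.log(toy.parse("DOMString", "identifier"));
```

The second call parses a bare fragment that would be rejected under the default start rule, which is exactly the use case for a runtime start parameter.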

Before looking at PEG.js I gave Jison a shot. Overall it seemed pretty good, but its LL(1) implementation doesn't actually support generating parsers. Its error messages were also less clear. That being said, if you need to rely on another family of grammars, it is probably worth a shot.

I couldn't find much of a test suite. If you have good test content, I'll happily take it.

Why It Was Done

Of course, I didn't just do that for the fun of it (though I'll admit that playing with the various Javascript parser generators had been on my mind for a while). The endgame here includes several targets.

First, provide ReSpec v2 with decent WebIDL processing. The current code that's used in ReSpec v1 has grown over time from just parsing a few small things to an unwieldy mess that I'm scared to touch lest I break existing content — it's time to change that. Alongside the parser I intend to add a visitor of sorts which will hopefully make it easier to generate documentation from WebIDL.
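A visitor over the AST could be as small as the following sketch. Everything about the node shape here is an assumption made for the example (a type field, a name, an idlType, and a members array); the real AST produced by the parser may well be organised differently, and walk and documentInterface are hypothetical names.

```javascript
// Hypothetical node shape:
//   { type: "interface", name: "Foo", members: [ { type: "attribute", ... } ] }
// Call the handler registered for node.type, then recurse into node.members.
function walk(node, visitor) {
  const handler = visitor[node.type];
  if (typeof handler === "function") handler(node);
  (node.members || []).forEach(function (child) {
    walk(child, visitor);
  });
}

// Produce a few plain-text documentation lines for one interface.
function documentInterface(ast) {
  const lines = [];
  walk(ast, {
    interface: function (n) { lines.push("Interface " + n.name); },
    attribute: function (n) { lines.push("  attribute " + n.name + " (" + n.idlType + ")"); },
    operation: function (n) { lines.push("  method " + n.name + "()"); }
  });
  return lines;
}

const exampleAst = {
  type: "interface",
  name: "Contact",
  members: [
    { type: "attribute", name: "displayName", idlType: "DOMString" },
    { type: "operation", name: "save" }
  ]
};

console.log(documentInterface(exampleAst).join("\n"));
```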

Second, using WebIDL in specifications should make it easy to automatically generate a large number of tests that can see if implementations at least support the interface properly (in terms of form that is, this can't generate behaviour tests obviously). We currently have a tool that does that but we're not entirely happy with it. Hopefully this can help improve that.
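To illustrate the kind of form-only test this could produce, here is a small sketch that checks an object against an interface description. The AST shape is again an assumption (type, name, members), and checkInterface is a hypothetical helper, not part of the existing tool mentioned above.

```javascript
// For each member of a (hypothetical) interface node, check that the given
// object exposes it: operations must be functions, attributes must exist.
// This only tests form; it says nothing about behaviour.
function checkInterface(def, instance) {
  return def.members.map(function (member) {
    let pass;
    if (member.type === "operation") {
      pass = typeof instance[member.name] === "function";
    } else {
      pass = member.name in instance;
    }
    return { name: def.name + "." + member.name, pass: pass };
  });
}

const contactDef = {
  type: "interface",
  name: "Contact",
  members: [
    { type: "attribute", name: "displayName" },
    { type: "operation", name: "save" }
  ]
};

// An implementation that forgot to provide save().
const candidate = { displayName: "Robin" };

checkInterface(contactDef, candidate).forEach(function (result) {
  console.log((result.pass ? "PASS " : "FAIL ") + result.name);
});
```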

Finally, I want to write a WebIDL to JSON Schema converter. Why? Because JSON Schema is good at describing REST+JSON services. Having such a conversion would make it possible to use WebIDL to describe REST interactions. That means that many of the W3C APIs being worked on now could be exposed over the network. For some of course it might not make sense, but for others (e.g. Contacts, Calendar, File System) it could turn out to be quite interesting. I'll keep you posted.
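As a very rough sketch of the direction such a converter could take, the snippet below maps the attributes of one interface onto a JSON Schema object. Both the AST shape (type, name, idlType, members) and the type mapping are assumptions made for illustration, and toJsonSchema is a hypothetical name; a real converter would need to handle far more of WebIDL.

```javascript
// Assumed mapping from a few WebIDL primitive types to JSON Schema types.
const TYPE_MAP = {
  DOMString: "string",
  boolean: "boolean",
  short: "integer",
  long: "integer",
  float: "number",
  double: "number"
};

// Convert a (hypothetical) interface node into a JSON Schema object.
// Only attributes are converted; operations have no obvious JSON counterpart.
function toJsonSchema(def) {
  const properties = {};
  def.members.forEach(function (member) {
    if (member.type !== "attribute") return;
    const known = Object.prototype.hasOwnProperty.call(TYPE_MAP, member.idlType);
    // Unknown types get an empty schema, which accepts any value.
    properties[member.name] = known ? { type: TYPE_MAP[member.idlType] } : {};
  });
  return { title: def.name, type: "object", properties: properties };
}

const contactInterface = {
  type: "interface",
  name: "Contact",
  members: [
    { type: "attribute", name: "displayName", idlType: "DOMString" },
    { type: "attribute", name: "age", idlType: "short" },
    { type: "operation", name: "save" }
  ]
};

console.log(JSON.stringify(toJsonSchema(contactInterface), null, 2));
```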