2008-07-30

BZW Spirit Parser -- Proof of Concept

I've been hacking away at Spirit and some sort of storage structure for parsed BZW data, and this is what I have so far:
  Example test BZW file:
    test.bzw
  Preliminary Parser source file:
    parser.cpp
  Things left out:
  • Most BZW objects and parameters
  • Blocks within Blocks
  • Any sort of sane group management
  • Some method for pulling data out of the Parser object
  • Meaningful errors
  • etc.
I'm aware of the ugliness of having all that junk in one source file, but this was merely a test, and one that took way longer than I would have liked. Still, I learned a lot about Boost, Spirit and C++ in general (templates, functors). Coming from a mostly C background, the magic of templates and functors is quite new and interesting to me. C++'s use of & for references threw me for a loop when I was expecting C-like behavior. All is well now, it would seem. I do wish the people at Boost/Spirit would clean up the error messages (hint: I forgot a const at the end of what is now line 144 of parser.cpp). I guess there's not a lot they can do, though. Silly templates.
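For anyone else arriving from C, here is a minimal sketch of the two things that bit me, assuming the missing const belonged on a functor's operator(). None of these names come from parser.cpp: CollectName and invoke_action are made up for illustration, and invoke_action only stands in for library code that holds an action by const reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical action functor: appends the matched text to a vector.
struct CollectName {
    // '&' in a declaration means "reference", not "address of" as in C.
    std::vector<std::string>& out;

    explicit CollectName(std::vector<std::string>& target) : out(target) {}

    // The trailing const allows a call through a const reference.
    // Remove it and the call inside invoke_action() no longer compiles.
    void operator()(const char* first, const char* last) const {
        assert(first <= last);
        out.push_back(std::string(first, last));
    }
};

// Stand-in for library code that only ever sees the action as const.
template <typename Action>
void invoke_action(const Action& action, const char* first, const char* last) {
    action(first, last);
}

// Collects the first six characters of "Simple World".
inline std::vector<std::string> collect_example() {
    std::vector<std::string> names;
    const char text[] = "Simple World";
    invoke_action(CollectName(names), text, text + 6);
    return names;
}
```

The functor can still modify the vector from a const member function, because the const applies to the reference member itself and not to the vector it refers to.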

2008-07-11

Spirit

Please allow me to contradict myself. In my last post, I spoke of writing a parser from scratch. Things have inevitably changed. Now the plan is to use the Boost::Spirit parsing library to flexibly parse BZW files. It's a good idea to use Spirit because it's as flexible as one can be with a Backus-Naur Form based engine, and the majority of the work is already done for me. In addition, adding Spirit to BZFlag isn't all too deadly compared to other dependencies: Spirit is composed entirely of header files, utilizing all forms of templates I would probably prefer not to know about. The downside is that it's a lot of header files. Fortunately, the valuable bare bones can be extracted from Boost via a supplied tool, but enough about that.

Unfortunately for the Spirit developers, they are restricted to using C++ operators. This brings about ambiguous situations such as
*real_p
which matches (rather, parses and matches) any number of real numbers, including none at all. The issue is obvious: to anyone reading C++, that looks exactly like a pointer dereference. So throwing chunks of Spirit code amongst C++ code can be very messy indeed, not to mention the ugliness of C++ templates.
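To make the clash concrete, here is a toy sketch of how a library can overload unary * to mean "zero or more". This is not Spirit code and does not use Spirit's API: RealParser, KleeneStar, real_number and count_matches are made-up names, and the number matching simply leans on std::strtod.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace toy {

// Matches one real number at the front of the input.
struct RealParser {
    // Returns the number of characters consumed, or 0 if nothing matched.
    // std::strtod skips leading whitespace for us.
    std::size_t parse(const char* s) const {
        assert(s != 0);
        char* end = 0;
        std::strtod(s, &end);
        return static_cast<std::size_t>(end - s);
    }
};

// Wraps another parser and applies it as many times as it will match.
template <typename P>
struct KleeneStar {
    P inner;
    explicit KleeneStar(const P& p) : inner(p) {}

    // Returns how many consecutive matches were found (possibly zero).
    std::size_t count_matches(const char* s) const {
        std::size_t n = 0;
        for (std::size_t used; (used = inner.parse(s)) != 0; s += used) {
            ++n;
        }
        return n;
    }
};

// Unary '*' is hijacked here: it builds a parser, it does not dereference.
template <typename P>
KleeneStar<P> operator*(const P& p) {
    return KleeneStar<P>(p);
}

const RealParser real_number = RealParser();

}  // namespace toy
```

With this in place, (*toy::real_number).count_matches("1.5 2 -3e2") counts three numbers, and the expression reads like a dereference even though no pointer is involved.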

It's a learning experience with Spirit, transposing BNF notation from paper to Spirit's odd-looking syntax. All of this helped resurface some old questions: If objects are to handle their own parsing, how should this be done? Should they all have a Spirit parser within, or should they go through some abstracted Parser, informing it which information the object is interested in and maybe providing a callback for the Parser to use when it finds that information? The latter seems like a good idea, and it could take advantage of Spirit's dynamic parser composition features. Plug-ins that define new Objects could simply document what sort of information they expect, and the Parser could handle them like every other object. No need for Spirit code to spaghetti into other people's source. All we need is a BNF document for the BZW format and we're off to the races.
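A rough sketch of what the callback idea might look like, with the Spirit side left out entirely. Everything here is hypothetical: ParserRegistry, subscribe, dispatch and the World struct are names invented for illustration, and dispatch is what a real grammar would call once it has recognised a property.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical front end: objects register interest, the parser calls back.
class ParserRegistry {
public:
    typedef std::vector<std::string> Values;
    typedef std::function<void(const Values&)> Callback;

    // An object (or plug-in) declares interest in one property of one block.
    void subscribe(const std::string& block, const std::string& property,
                   Callback cb) {
        assert(cb != nullptr);
        callbacks_[block + "." + property] = std::move(cb);
    }

    // Called by the parsing layer for each recognised property.
    // Returns false when nobody asked for it.
    bool dispatch(const std::string& block, const std::string& property,
                  const Values& values) const {
        std::map<std::string, Callback>::const_iterator it =
            callbacks_.find(block + "." + property);
        if (it == callbacks_.end()) return false;
        it->second(values);
        return true;
    }

private:
    std::map<std::string, Callback> callbacks_;
};

// Hypothetical object that only cares about "size" inside a "world" block.
struct World {
    double size = 0.0;

    // The lambda captures this, so the World must outlive the registry's use.
    void attach(ParserRegistry& registry) {
        registry.subscribe("world", "size",
                           [this](const ParserRegistry::Values& v) {
                               if (!v.empty()) size = std::stod(v[0]);
                           });
    }
};
```

The appeal of this shape is that a plug-in only ever touches subscribe, so no parser code leaks into the object's source.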

The only difficulty in writing BNF for the BZW format is that too much generalization breaks the format, while none at all makes the parser overly specific. So, the goal is to figure out which rules can be bent. For example, one such rule is the name field:
world
  name Simple World
  size 100.0
end
How should it be treated? How many "values" are there? Suppose I treat it like a generic property, good examples of which are position and size: properties that have anywhere from one to many values associated with them. Then it is left to the object to deal with all the values. For example, if an object's callback were handed a name like the one above, the object would have to pop all the values off a stack or queue and join them into a string to get the name back. There are alternative options for passing back the data that could solve this problem, but it's easier to solve it up front in the parser. Should the object perhaps inform the parser exactly how many values it expects? Things to be decided. More to come soon.

Edit: I realize I may have been vague in the preceding paragraph. Allow me to clarify: By "values" I refer to space-separated character groups, like "Simple" and "World" and 100.0. The problem is differentiating between [<identifier> <value> <value>] and [<identifier> <magical value that continues until the end of line>]. Simple solution: Two different types of parameters.
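Here is a small sketch of the two parameter types, written with plain string streams instead of Spirit so that it stands on its own. ParamKind, Parameter and parse_parameter are hypothetical names, and which identifiers get which kind is assumed to be decided elsewhere (for instance by the object registering the parameter).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// The two kinds of parameter: separate tokens, or one value to end of line.
enum class ParamKind { TokenList, RestOfLine };

struct Parameter {
    std::string identifier;
    std::vector<std::string> values;
};

// Splits one line of a block into an identifier and its value(s).
Parameter parse_parameter(const std::string& line, ParamKind kind) {
    assert(line.find('\n') == std::string::npos);  // one line at a time

    std::istringstream in(line);
    Parameter p;
    in >> p.identifier;  // skips leading indentation

    if (kind == ParamKind::RestOfLine) {
        // Everything after the identifier is a single value.
        std::string rest;
        std::getline(in >> std::ws, rest);
        rest.erase(rest.find_last_not_of(" \t\r") + 1);  // trim the tail
        if (!rest.empty()) p.values.push_back(rest);
    } else {
        // Each space-separated token is its own value.
        std::string token;
        while (in >> token) p.values.push_back(token);
    }
    return p;
}
```

Parsing "  name Simple World" as RestOfLine gives the single value "Simple World", while the same line as a TokenList gives the two values "Simple" and "World", which is exactly the distinction the name field needs.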