2008-06-26

Parsing (the old-fashioned way)

The trend of my development so far has really been parsing. Normally for a parsing problem this complex I would turn to lex/yacc, but the project doesn't need another dependency, and lex/yacc can be problematic across platforms. The two parser implementations currently in bzfs and bzwb tackle the file very differently. The bzfs implementation does a very hard-coded, procedural run through the file, bloated with ifs and elses, and way too many strcasecmp's for my liking. On the other hand, bzwb parses in a more organized manner, breaking the parsing down into a flexible set of functions driven by a list of supported input, which could be modified fairly easily. This is more along my line of thinking. The only problem I have with it is that it reads the whole file and keeps it in memory until parsing completes. So, in the spirit of a new library, why not write a new parser?
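To make the contrast concrete, here is a minimal sketch of the table-driven style, reading one line at a time so the file never has to sit in memory. It is written in Python purely for illustration and is not code from bzfs or bzwb; the handler names, the `HANDLERS` table and the simplified "keyword, arguments, `end`" layout are all assumptions of mine.

```python
import io

# Hypothetical handlers: each one knows how to store a single keyword.
def handle_position(obj, args):
    obj["position"] = [float(a) for a in args]

def handle_size(obj, args):
    obj["size"] = [float(a) for a in args]

# Hypothetical list of supported input. Supporting a new keyword means
# adding one entry here instead of another if/else branch.
HANDLERS = {"position": handle_position, "size": handle_size}

def parse_params(stream):
    """Read parameter lines one at a time until an 'end' line."""
    obj, unknown = {}, []
    for line in stream:
        words = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words:
            continue
        key = words[0].lower()  # case-insensitive, like strcasecmp
        if key == "end":
            break
        handler = HANDLERS.get(key)
        if handler is None:
            unknown.append(key)  # remember it instead of failing
        else:
            handler(obj, words[1:])
    return obj, unknown

sample = io.StringIO("position 0 0 0\nsize 10 10 5\nend\n")
print(parse_params(sample))
```

The single dictionary lookup replaces the whole chain of string comparisons, and because the loop consumes the stream lazily it stops reading as soon as it sees `end`.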

It's been a bit of a mind-bend so far trying to accommodate some strange features the BZW format currently has, and figuring out a set of rules that abstractly covers all possibilities isn't easy. I originally thought it would be best to tell the parser up front what it needs to know: what sorts of objects it can read, what sorts of parameters those objects contain, etc., before parsing the actual file. Then I thought it might be better to just read everything in, no matter how valid it might be, and then pull out the required information, complaining about the bits that were unnecessary and thus, most likely, invalid. While this approach has the benefit of being simple to implement, it's not very clean. It leaves a big pile of potential warnings to fire off when it's done, instead of while it's reading, and almost welcomes very messy BZW files, which is a bad idea. After much internal debate, I decided to go with my first plan after all, albeit slightly backwards from what I had originally planned:
  1. Create and record all the required object information in Parser Object types.
  2. For each Parser Object, create the required parameters.
  3. Feed these objects into a new Parser instance.
  4. Run the Parser, which will use the provided rules.
  5. Pull out all the required data for each object and do some magic.
I guess we'll see if it works out, shortly!
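
For the curious, the five steps above could be sketched roughly like this. It is Python rather than the library's own language, and none of it is the real API: `ParserObject`, `add_param`, `Parser`, the argument-count rule and the simplified block layout (object name, parameter lines, `end`) are all hypothetical names and assumptions for illustration.

```python
import io

class ParserObject:
    """Step 1: describes one kind of object the parser may read."""
    def __init__(self, name):
        self.name = name
        self.params = {}     # parameter name -> expected argument count
        self.instances = []  # one dict of parsed data per object found

    def add_param(self, name, argc):
        """Step 2: declare a parameter this object accepts."""
        self.params[name] = argc
        return self

class Parser:
    def __init__(self):
        self.objects = {}
        self.warnings = []

    def add_object(self, obj):
        """Step 3: feed a Parser Object into the parser."""
        self.objects[obj.name] = obj

    def parse(self, stream):
        """Step 4: read line by line, warning as soon as a rule is broken."""
        current, data, skipping = None, None, False
        for lineno, line in enumerate(stream, 1):
            words = line.split("#", 1)[0].split()
            if not words:
                continue
            key, args = words[0].lower(), words[1:]
            if key == "end":
                if current is not None:
                    current.instances.append(data)
                elif not skipping:
                    self.warnings.append(f"line {lineno}: stray 'end'")
                current, data, skipping = None, None, False
            elif skipping:
                continue  # inside an unknown object, ignore its body
            elif current is None:
                current = self.objects.get(key)
                if current is None:
                    self.warnings.append(f"line {lineno}: unknown object '{key}'")
                    skipping = True
                else:
                    data = {}
            elif key not in current.params:
                self.warnings.append(f"line {lineno}: unknown parameter '{key}'")
            elif len(args) != current.params[key]:
                self.warnings.append(f"line {lineno}: wrong argument count for '{key}'")
            else:
                data[key] = args
        if current is not None:
            self.warnings.append(f"missing 'end' for '{current.name}'")

# Steps 1-3: describe a box and hand it to the parser.
box = ParserObject("box").add_param("position", 3).add_param("size", 3)
parser = Parser()
parser.add_object(box)
# Step 4: run it. Step 5: pull the data back out of the Parser Object.
parser.parse(io.StringIO("box\n  position 0 0 0\n  size 10 10 5\nend\n"))
print(box.instances, parser.warnings)
```

The point of doing it this way round is that the warnings are raised at the offending line while reading, rather than in one big batch at the end, and nothing beyond the current line and the accepted values is ever held in memory.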
