The New FreeOrion Parser



#1 Post by tzlaine »

What Happened

First off, for those of you who don't know, a "parser" is a bit of code that reads plain text (AKA a "string") and does something with it. FreeOrion has a parser that reads the FO config files (e.g. techs.txt) and builds the appropriate data structures to represent the contents of each file.

I've just rewritten it from scratch, using the old one as a guide. This was a fairly big job, but I feel that the benefits outweigh the development time.

The benefits are:
  • It's easier for the programmers to extend and modify the parser. The old design had some warts that are now removed.
  • It's easier for the scripters to write the FO config files, because the error messages now point to the file, line, and column where the parser got stuck.
That might not sound like much if you have never tried to add something to the parser or change one of the config files. ;)

Now that we have several people writing content and asking for new kinds of content to be available for their use (i.e. asking for changes to the parser), this seemed like a good time to make this change. As the config files get larger, being able to figure out where errors are will become more and more important.

Here is an example of an actual error message produced by the new parser:


/home/tzlaine/FO_parser/default/techs.txt:1990:46: Parse error.  Expected Condition here:
                  OwnedBy TheEmpire Source.Owner
The first line starts with file:line:column: for easy access to the exact location of the error, followed by the thing the parser expected to find (above, it expected a Condition). Next, the parser quotes the line in which the error occurred. Finally, the caret ("^") on the last line indicates the position of the problem in the quoted line.
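To make that layout concrete, here is a minimal Python sketch that assembles a message in the same shape. It is only an illustration, not the FreeOrion code: the function name format_parse_error, the sample config text, and the assumption that line and column are 1-based are all mine.

```python
def format_parse_error(filename, source, line, column, expected):
    """Build a file:line:column message, quote the line, and add a caret.

    Assumption: `line` and `column` are both 1-based.
    """
    quoted = source.splitlines()[line - 1]
    # Pad with spaces so the caret sits under the reported column.
    caret = " " * (column - 1) + "^"
    header = f"{filename}:{line}:{column}: Parse error.  Expected {expected} here:"
    return "\n".join([header, quoted, caret])


# Hypothetical config snippet; only the third line mirrors the example above.
source = 'Tech\n    name = "EXAMPLE"\n    OwnedBy TheEmpire Source.Owner\n'
message = format_parse_error("techs.txt", source, 3, 35, "Condition")
print(message)
```

Most editors can jump straight to a file:line:column reference, which is the main reason for putting it first on the line.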

The new parser follows the old one nearly exactly. However, there is one significant change in what you can put into the config files: non-numeric values can no longer include arithmetic or numbers. For instance, "System.StarType + 1" is no longer allowed. See this thread for an example of why this was a bad idea in the first place. Take special note of this post, which underscores how crazy the config files would have to get if the scripter were responsible for keeping track of all the math.

There are a couple of quirks to the new parser. First, you must have at least one space before a C-style/multiline comment ("/* some comment */"). "Some stuff/* some comment*/" will produce an error, but "Some stuff /* some comment*/" will not. Second, error locations are a bit wonky. The location the parser reports is wherever it last successfully made progress in the parse, which may be before the actual mistake. So, in the example error message above, there is a typo in the name of the Condition following the quoted line. It should read "ProductionCenter", but I mistyped it as "PoductionCenter". It would be better if the parser pointed to "PoductionCenter" itself. I'm currently working to make the error messages even better in this regard. The tl;dr on error messages is that you should start looking at the indicated location and read on from there, since the error might not be exactly where the caret is pointing.
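The "last successful progress" behavior can be sketched in a few lines of Python. This is a toy word matcher, not the real parser: the function find_error_offset and the set of known words are hypothetical, and real grammar rules are far richer than a word list. It only shows why the reported position lands before the typo.

```python
import re


def find_error_offset(text, known_words):
    """Return None if every word is known, else the offset to report.

    The reported offset is the end of the last word that matched,
    not the start of the word that failed.
    """
    last_progress = 0
    for match in re.finditer(r"\S+", text):
        if match.group() not in known_words:
            return last_progress
        last_progress = match.end()
    return None


# Hypothetical vocabulary; "PoductionCenter" below is the typo.
known = {"OwnedBy", "TheEmpire", "Source.Owner", "ProductionCenter"}
text = "OwnedBy TheEmpire Source.Owner\n    PoductionCenter"

offset = find_error_offset(text, known)
reported_line = text.count("\n", 0, offset) + 1
typo_line = text.count("\n", 0, text.index("PoductionCenter")) + 1
print(reported_line, typo_line)
```

Here the reported position is at the end of the first line, while the typo sits on the second, which matches the situation in the error message quoted earlier.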

Future Work

I'd like to make the parser a bit more user-friendly. For instance, I find the parameters passed to GenerateSitRepMessage to be a bit clunky. It would be better IMO if they were passed as Tag = Data instead of Tag = "Tag" Data = "Data". I've kept the old way of doing things for the first pass, so we can just evaluate the reimplemented parser for correctness before making any significant changes. I'm soliciting comments now from scripters who find particular parts of the config file grammar to be hard to use or unclear. So, comments welcome.

Also, I tried my best to make the error message output as user-friendly as possible, but as a programmer I find it hard not to use terms like int, double, and string. I'm not sure how understandable the error messages will be to non-programmers, so again, comments welcome!