parsing experiment notes

Programmers discuss here anything related to FreeOrion programming. Primarily for the developers to discuss.

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

parsing experiment notes

#1 Post by Dilvish »

So, I've been trying to make some more advances with the parser, partly driven by the current specific interest in being able to use a statistic based on JumpsBetween, and partly just because it seems the issues involved should get resolved better no matter what.

Geoff, some of this, particularly this beginning stuff, I believe matches some experiments you've already tried, but you may have tried them a little differently, or I may have concluded something different, so please bear with me; I think I've figured out some useful info, though not enough to solve all the problems. Some of this relates to casting, and some relates more particularly to our use of expectation parsers.

As Geoff had noted in a different thread,
Geoff the Medio wrote:I was playing around with parsing of int and double constants, disabling parsing of one and casting to the other. When doing this, I got an error message from the parser, like so:

Code: Select all

2015-01-26 21:55:30,731 ERROR Server : C:\Users\Geoff\Desktop\FreeOrion_VS2010_SDK\FreeOrion\default\buildings.txt:1577:193: Parse error.  Expected integer function expression here:

        EffectsGroup
            scope = Source
            activation = Or [
                Not ContainedBy Contains And [ Monster Unowned ]
                Turn high =  ( ( 200 + (50 * (5 - GalaxyMaxAIAggression)) + (1000 * (If condition = ValueTest high = 2 testvalue = GalaxyMaxAIAggression))) * ((Max(1, 3 - GalaxyPlanetDensity))^0.5) * ((Max(1, GalaxySize / 200))^0.4))      // always remove lanes before spawn start turn
                                                                                                                                                                                                 ^
            ]
            effects = RemoveStarlanes endpoint = WithinStarlaneJumps jumps = 1 condition = Source

        EffectsGroup
            scope = Source
This makes me wonder if the 0.5 is actually being parsed and used as 0.5, or if it's being cast to int, and thus being truncated to 0. Someone might want to double-check that condition is working as intended...
Following up on that, we had concluded that yes, the 0.5 was being cast to an int and truncated to zero, and Geoff noted,
Geoff the Medio wrote:Problem is that it's expecting a ValueRef<int>, and that probably forces all the subsequent parsing to treat whatever it gets as a ValueRef<int>, which is allowed to be written as a double for constants... so writing "1.5" will be static_cast to int, then stored as a Constant<int> with internal value 1.
Looking into that a little bit more, I'll confirm that yes, the Turn condition parser expects a ValueRef<int> and so uses parse::value_ref_parser<int>(), whose parser for constants is

Code: Select all

            constant
                =   tok.double_ [ _val = new_<ValueRef::Constant<int> >(static_cast_<int>(_1)) ]
                |   tok.int_    [ _val = new_<ValueRef::Constant<int> >(_1) ]
                ;
and so it will automatically convert all constant doubles to ints.

I think it would be a better design if the basic int parser did not do any automatic conversion from double to int, especially not conversion of a plain constant as was happening here, since many people would reasonably expect the use of a <double> constant to force a <double> calculation, and we have other ways to deal with situations where an int is truly required. Instead, I think the basic int constant should only deal with true ints, and we should selectively enable conversion from double to int only for very specific cases. So, the int constant parser would simply be

Code: Select all

            constant
                =   tok.int_    [ _val = new_<ValueRef::Constant<int> >(_1) ]
                ;
To help deal with situations where an int is needed, we could add scriptable RoundUp, RoundDown, and Truncate operations as Geoff has mentioned, but also, for some situations I think we can still reasonably just deal with it at the parser level. I propose we add something like

Code: Select all

        parse::value_ref_parser_rule<int>::type                int_castable_expr;
        int_castable_expr
            = value_ref_parser<double>() [ _val = new_<ValueRef::StaticCast<double, int> >(_1) ]
            ;
(the precise form would depend on where we placed it).

And then a good place to use it would be for parsing conditions like Turn, where we need to provide the actual condition a ValueRef<int> but it would be reasonable for the user to provide us a scripted <double>. So the parser for Turn, instead of being

Code: Select all

            turn
                =  (tok.Turn_
                    >> -(parse::label(Low_token)  >> int_value_ref [ _a = _1 ])
                    >> -(parse::label(High_token) >> int_value_ref [ _b = _1 ]))
                    [ _val = new_<Condition::Turn>(_a, _b) ]
                ;
could (I think) instead be something like

Code: Select all

            turn
                =  (tok.Turn_
                    >> -(parse::label(Low_token)  >> (int_value_ref  [ _a = _1 ] | castable_as_int_value_ref [ _a = _1 ]))
                    >> -(parse::label(High_token) >> (int_value_ref  [ _b = _1 ] | castable_as_int_value_ref [ _b = _1 ])))
                    [ _val = new_<Condition::Turn>(_a, _b) ]
                ;
To let this work, I'm pretty sure some other changes would still be necessary in the int_ref_parser, chiefly greatly reducing its use of expectation parsing, such as in the definition of the expression parsers. Expectation parsers might make parsing more efficient, but I think they also make it too rigid in most of the places they are used. Right now, if we made only the changes mentioned above, then when the Turn parser ran it would first try matching its argument as an int_value_ref, and so when it hit the first '(' in the experimentor outpost example, the int primary parser would be triggered with expectation parsing-- it begins with

Code: Select all

            primary_expr
                =   '(' > expr > ')'
so it gets locked into needing the argument to be successfully parsed as an int expr, and if that fails it doesn't simply try backtracking to an alternative as it would if a ">>" sequence failed. Instead, it throws an exception like the one we've been seeing: "Parse error. Expected integer function expression here:". So, I think we should change most, if not all, of those expectation parsers into plain sequence parsers so they can backtrack upon failure.

Now, I've made a pass at implementing the above, but didn't get it fully functional. At first I tried putting the castable_as_int_value_ref parser into IntValueRefParser.cpp, but then on starting FO I got an initialization deadlock like we've run into before. So I instead moved it into DoubleValueRefParser.cpp, where it can go after there are already other references to the int parser. Then I could get through startup, but when trying to actually start a game it would crash. So, no cigar yet.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#2 Post by Geoff the Medio »

Dilvish wrote:So, I think we should change most, if not all, of those expectation parsers into plain sequence parsers so they can backtrack upon failure.
That's fine, as long as you keep in mind that some parsers have side effects, particularly if they allocate memory using new_ in semantic actions. I'm not sure how the parser handles backtracking in situations such as this, but I would guess it's not going to automatically delete everything that gets allocated with new when a parse failure / backtrack situation occurs...

I don't know if that's why > vs >> have been used... there may not have been much thought put into it, or if there was, it wasn't explained anywhere and the initial intention was ignored by subsequent edits.

Similarly, I'm not clear on the uses of %= vs =, though I think I've seen a few mentions of similar issues with semantic actions.

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#3 Post by Dilvish »

A bit of good news: it appears I have the Turn condition parser working fine with castable_as_int_value_ref. It turns out I had simply overlooked an ampersand, trying

Code: Select all

            const parse::value_ref_parser_rule< int >::type castable_as_int_value_ref = 
                parse::value_ref_parser_castable_as_int();
whereas what I needed was

Code: Select all

            const parse::value_ref_parser_rule< int >::type& castable_as_int_value_ref = 
                parse::value_ref_parser_castable_as_int();
Geoff the Medio wrote:
Dilvish wrote:So, I think we should change most, if not all, of those expectation parsers into plain sequence parsers so they can backtrack upon failure.
That's fine, as long as you keep in mind that some parsers have side effects, particularly if they allocate memory using new_ in semantic actions. I'm not sure how the parser handles backtracking in situations such as this, but I would guess it's not going to automatically delete everything that gets allocated with new when a parse failure / backtrack situation occurs... I don't know if that's why > vs >> have been used... there may not have been much thought put into it, or if there was, it wasn't explained anywhere and the initial intention was ignored by subsequent edits.
That probably is one of the reasons they were used-- it seems like efficiency and the possibility of memory leaks or other side effects from unexpected/uncontrolled backtracking are the major (probably even the only) reasons I've seen posted in favor of using expectation parsing. It does look like there should be a way to deal with cleaning up memory leaks, per the Spirit FAQ. I also saw a number of places recommending using the 'not' lookahead predicate to help minimize backtracking. I've shifted most of the expectation parsers over to plain sequence parsers and haven't noticed any issue with it, though I haven't played any long games with it like this yet. But we don't do much (if any?) repeat parsing, so it doesn't seem that should be an issue.
Geoff the Medio wrote:Similarly, I'm not clear on the uses of %= vs =, though I think I've seen a few mentions of similar issues with semantic actions.
I did some reading and experiments and figured it out a bit. One issue regarding '=' is the Alias issue mentioned in the FAQ linked above, but that's fairly distinct from the decision between "=" and "%=". The key distinction has to do with attribute propagation. "%=" requires that all the RHS alternatives have attributes that meet certain compatibility rules with the LHS attribute, and it is discouraged when there are semantic actions. The plain "=" seems to override the LHS attribute type. There is at least one place (I forget where at the moment) where we use a set of alternate subrules on the RHS where the last one doesn't match the LHS attribute well enough, so changing the "=" to a "%=" triggers a compile error about that. Aside from the alias issue when there is just one rule on the RHS, I never did see any reference to negative complications from using '='.

A patch with my current status on these changes is attached.
Attachments

[The extension patch has been deactivated and can no longer be displayed.]

If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#4 Post by Geoff the Medio »

Dilvish wrote:(int_value_ref [ _a = _1 ] | castable_as_int_value_ref [ _a = _1 ]))
I suggest making a new parser called something like flexible_int that combines these two parsers, so you can just get and use that, rather than having to verbosely list both options every time.

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#5 Post by Dilvish »

Geoff the Medio wrote:
Dilvish wrote:(int_value_ref [ _a = _1 ] | castable_as_int_value_ref [ _a = _1 ]))
I suggest making a new parser called something like flexible_int that combines these two parsers, so you can just get and use that, rather than having to verbosely list both options every time.
Sounds good, I'll rework it and attach a new patch for review.

While testing out parsing changes, I noticed that the parse error messages I was getting wouldn't line up with the actual lines in the file; they would report a line number much greater than the actual line number. An obvious culprit was the include files at the start of the file, so I moved them to the end of the file (this was with buildings.txt, so I moved the includes for shared_macros.txt and col_buildings.txt). Then the reported line number would actually be *less* than the actual line number. At first I thought perhaps it wasn't counting comment lines, but that didn't seem to be it. Any ideas?
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#6 Post by Geoff the Medio »

Dilvish wrote:Then, the reported line number would actually be *less* than the actual line number. At first I thought perhaps it wasn't counting comment lines, but that didn't seem to be it. Any ideas?
Dunno why it would be less, but note that any macros insertions that include a linebreak will also throw off the subsequent lines' numbering.

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#7 Post by Dilvish »

Geoff the Medio wrote:Dunno why it would be less, but note that any macros insertions that include a linebreak will also throw off the subsequent lines' numbering.
Ah, ok. Well then, perhaps it was a combination of not counting comment lines and also getting some extra lines via macro substitution.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#8 Post by Dilvish »

Geoff the Medio wrote:I suggest making a new parser called something like flexible_int that combines these two parsers, so you can just get and use that, rather than having to verbosely list both options every time.
Ok, here is the patch. It replaces the normal int_ref parser with the flexible_int parser in the various other parsers where I felt it made sense, which is many/most of the time, perhaps just about everywhere except places where an ID of some kind was needed.

This also includes a slight bit of expansion/correction on ValueRef Descriptions (driven by me wanting to be able to get a full description for the Experimentor Outpost Effects when testing some aspects of the parser).

Additionally there are two interim Empire Statistics that are just for highlighting/testing some aspects of which statistics are working and which aren't. I could take those out before committing if you prefer, but since Vezzra and maybe others are also testing out parser issues right now I'm inclined to leave them for a while.
Attachments

[The extension patch has been deactivated and can no longer be displayed.]

If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#9 Post by Geoff the Medio »

You could clean up the code a bit... in particular, you don't use most of these:

Code: Select all

+            qi::_1_type _1;
+            qi::_a_type _a;
+            qi::_val_type _val;
+            using phoenix::construct;
+            using phoenix::new_;
+            using phoenix::push_back;
+            using phoenix::static_cast_;
This seems a bit odd:

Code: Select all

-''' ( [[EXPERIMENTOR_SPAWN_CALC_A]] * ((Max(1, 3 - GalaxyPlanetDensity))^0.5) * ((Max(1, GalaxySize / 200))^0.4))'''
+''' ( [[EXPERIMENTOR_SPAWN_CALC_A]] * ((Max(1, 3 - GalaxyPlanetDensity))^0.5) * ((Max(1, GalaxySize / 200.0))^0.4))'''
... Is there a reason to switch 200 to 200.0 if all double valueref expressions include casting from ints anyway, and if so, why not also change 1 and 3 to 1.0 and 3.0?

Otherwise seems reasonable, but I am a bit vague on the purpose (will it actually fix a problem, or is this just to be more restrictive when it is not reasonable to expect any double expression to be useful as an int (eg. for ID numbers))?

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#10 Post by Dilvish »

Geoff the Medio wrote:You could clean up the code a bit... in particular, you don't use most of these:

Code: Select all

+            qi::_1_type _1;
+            qi::_a_type _a;
+            qi::_val_type _val;
+            using phoenix::construct;
+            using phoenix::new_;
+            using phoenix::push_back;
+            using phoenix::static_cast_;
Ah yes, sorry. I had copied a more involved struct and didn't get it fully cleaned up; I'll do so now.

Geoff the Medio wrote:This seems a bit odd:

Code: Select all

-''' ( [[EXPERIMENTOR_SPAWN_CALC_A]] * ((Max(1, 3 - GalaxyPlanetDensity))^0.5) * ((Max(1, GalaxySize / 200))^0.4))'''
+''' ( [[EXPERIMENTOR_SPAWN_CALC_A]] * ((Max(1, 3 - GalaxyPlanetDensity))^0.5) * ((Max(1, GalaxySize / 200.0))^0.4))'''
... Is there a reason to switch 200 to 200.0 if all double valueref expressions include casting from ints anyway, and if so, why not also change 1 and 3 to 1.0 and 3.0?
Well, that all depends on when and where it decides it needs to cast from int to double, but on double-checking I can put it back. I had noticed that the 200 (if not given as 200.0) is put into the Description as an int, and wanted to be careful that "GalaxySize / 200" wouldn't itself get parsed as an int_ref and have its value truncated. But I double-checked, and it's only the constant 200 itself that gets stored as an int_ref; the expression GalaxySize / 200 is a double_ref and works out fine. At least as things are currently set up, if any expression has a double in it then all of the subexpressions will be parsed as doubles if possible (otherwise parsed as int_refs and then cast to double_refs, like the constants and things like GalaxySize). So if you'd rather all the constants in that calc be shown as doubles (1.0, 3.0, etc.), that's fine; I just left them as ints because it is more concise, but showing them as doubles would perhaps make it clearer that the whole calc is done in doubles.
Geoff the Medio wrote:Otherwise seems reasonable, but I am a bit vague on the purpose (will it actually fix a problem, or is this just to be more restrictive when it is not reasonable to expect any double expression to be useful as an int (eg. for ID numbers))?
The one way that this is more restrictive than current is that it won't allow a double constant like 0.5 to be automatically cast to an int constant 0 by the int_ref parser. Since the Turn condition needs an int_ref, if an int_ref has an option to automatically cast a double constant to an int constant then we can wind up with messed up calculations. That's what you had pointed out with the experimentor outpost parsing. If we disable the int_ref parser from casting double constants to int constants but don't add any other flexibility, then the Turn condition in the experimentor outpost activation clause will fail to parse, in turn making the activation clause fail to parse, which gives us the error

Code: Select all

default/buildings.txt:1540:12: Parse error.  Expected Effects = here:
                        Destroy
                    ]

        EffectsGroup
            scope = Source
            activation = Or [
            ^
                Not ContainedBy Contains And [ Monster Unowned ]
                Turn high =  ( ( 200 + (50 * (5 - GalaxyMaxAIAggression)) + (1000 * (If condition = ValueTest high = 2 testvalue = GalaxyMaxAIAggression))) * ((Max(1, 3 - GalaxyPlanetDensity))^0.5) * ((Max(1, GalaxySize / 200))^0.4))      // always remove lanes before spawn start turn
            ]
            effects = RemoveStarlanes endpoint = WithinStarlaneJumps jumps = 1 condition = Source
So altogether I'd say this change to the int-ref parser is really a bug fix, and that since the int-ref parser (rightly) doesn't have any other way to make doubles work as ints, adding the flexible_int parser in reasonable locations such as with the Turn condition is a change to less restriction, not more, and does solve a problem.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#11 Post by Geoff the Medio »

Does this eliminate the need for explicit cast functions then? Anything that expects a (flexible) int, but uses a double in the middle, will get parsed as a double and then flex-casted back to int at the end...
Dilvish wrote:...clause will fail to parse, in turn making the activation clause fail to parse, which gives us the error
The non-locality of this error message is perhaps due to using >> instead of > ? If so, it might be best to use > for connecting the bits of a non-expression parser...

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#12 Post by Dilvish »

Geoff the Medio wrote:Does this eliminate the need for explicit cast functions then? Anything that expects a (flexible) int, but uses a double in the middle, will get parsed as a double and then flex-casted back to int at the end...
Right, we wouldn't need the RoundUp, RoundDown and Truncate script functions, since this would essentially automatically add a last minute truncate where necessary. The RoundUp and RoundDown might add a nice bit of extra control for a scripter, but we can easily continue without them.
Dilvish wrote:...clause will fail to parse, in turn making the activation clause fail to parse, which gives us the error
The non-locality of this error message is perhaps due to using >> instead of > ? If so, it might be best to use > for connecting the bits of a non-expression parser...
I've been looking at that issue, and I'd agree that after explicit unique tokens, like after the Turn token, we should use '>' and not '>>', because we know there isn't a different parser for that Turn token, so if what follows fails it really is an error. Things like the Min and Max tokens, on the other hand, can get parsed in two different ways and so should still be followed by '>>'. Looking at ConditionParser3.cpp, for example, I see that some of the conditions use '>' after their token and others use '>>', probably just because we weren't so aware of the significance of the distinction. I'll do a pass on conforming those along these lines and repost the patch.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#13 Post by Geoff the Medio »

Dilvish wrote:The RoundUp and RoundDown might add a nice bit of extra control for a scripter, but we can easily continue without them.
These may be achieved by adding or subtracting 0.5 from the result before truncation.
Dilvish wrote:Things like the Min and Max tokens, on the other hand, can get parsed in two different ways and so should still be followed by '>>'.
In the statistic parser, for example, it can probably be rewritten as:

Code: Select all

    statistic_3
        =     ( parse::enum_parser<ValueRef::StatisticType>() >> parse::label(Value_token) ) [ _b = _1 ]
            >   value_ref [_c = _1 ]
            >   parse::label(Condition_token) >    parse::detail::condition_parser
                [ _val = new_<ValueRef::Statistic<T> >(_c, _b, _1) ]
        ;
For input to be a statistic, it has to start with a statistic type, which might be "max" followed by "value =", and not something of the form max(...,...). Probably most cases in strictly-structured parsers such as this should use '>' by default, and '>>' only where there can be ambiguity or where assuming from a partial match could lead to mis-parses. In this case I've also attempted to structure the () and [] placement to avoid allocating before confirmation, but it's probably not necessary here, as it's just a constant enum value being parsed and assigned to the attribute for the StatisticType (and not a memory allocation with new_).

Another case where I had issues with > was in the bound variable parser. I switched to using the called form for both int and double parsing, and this caused problems: the parser would see "Source." and immediately commit to an expectation of seeing a bound-int or bound-double postfix, but would then fail in cases where the other type was actually present, even though the other type's bound variable parser came later in the higher-level parser structure. Switching to >> fixed this (after several days of frustration).

Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: parsing experiment notes

#14 Post by Dilvish »

Oof! This latest round wound up being a lot more work than I expected. Attached is a version of the flexible_int patch updated as discussed above.

A few notes:
  • Here's a biggie: it seems that the '>' operator (rather surprisingly to me) has implications beyond the thing immediately following it, extending through what I'll call the local rule grouping (parentheses can make a different rule grouping). To clarify with the example of how I discovered this, the CreateShip rules

    Code: Select all

                create_ship_1
                    =    tok.CreateShip_
                    >>   parse::label(DesignName_token) >> int_value_ref [ _b = _1 ] // TODO: DesignName -> DesignID.
                    >>   parse::label(Empire_token)     >> int_value_ref [ _c = _1 ]
                    >>   parse::label(Species_token)    >> string_value_ref [ _val = new_<Effect::CreateShip>(_b, _c, _1) ]
                    ;
    
                create_ship_2
                    =    tok.CreateShip_
                    >>   parse::label(DesignName_token) >> tok.string [ _a = _1 ]
                    >>   parse::label(Empire_token)     >  int_value_ref [ _b = _1 ]
                    >>   parse::label(Species_token)    >> string_value_ref [ _val = new_<Effect::CreateShip>(_a, _b, _1) ]
                    ;
    
                create_ship_3
                    =    tok.CreateShip_
                    >>   parse::label(DesignName_token) >> tok.string [ _a = _1 ]
                    >>   parse::label(Empire_token)     > int_value_ref [ _val = new_<Effect::CreateShip>(_a, _1) ]
                    ;
    
                create_ship_4
                    =    tok.CreateShip_
                    >>   parse::label(DesignName_token) >> tok.string [ _val = new_<Effect::CreateShip>(_1) ]
                    ;
    would lead to surprising errors like

    Code: Select all

    default/specials.txt:674:63: Parse error.  Expected integer expression here:
                    OwnerHasTech name = "SHP_DOMESTIC_MONSTER"
                    Random probability = 0.05
                ]
                stackinggroup = "KRAKEN_NEST_STACK"
                effects = [
                    CreateShip designname = "SM_KRAKEN_1" empire = Source.Owner
                                                                   ^
    whereas things would work just fine if I changed create_ship_2 so that the empire line either used '>>' or was in parenthesis. To be explicit, both of the following would work:

    Code: Select all

                create_ship_2
                    =    tok.CreateShip_
                    >>   parse::label(DesignName_token) >> tok.string [ _a = _1 ]
                    >>   parse::label(Empire_token)     >>  int_value_ref [ _b = _1 ]
                    >>   parse::label(Species_token)    >> string_value_ref [ _val = new_<Effect::CreateShip>(_a, _b, _1) ]
                    ;
    
    and

    Code: Select all

                create_ship_2
                    =    tok.CreateShip_
                    >>   parse::label(DesignName_token) >> tok.string [ _a = _1 ]
                    >>   (parse::label(Empire_token)     >  int_value_ref [ _b = _1 ])
                    >>   parse::label(Species_token)    >> string_value_ref [ _val = new_<Effect::CreateShip>(_a, _b, _1) ]
                    ;
    So I guess the lesson is that if you need a '>>' someplace past a '>' within the same rule, and you want to allow backtracking within that rule to before the '>', the '>' has to be contained within a parenthesized subrule separated from the later potential failure point. I guess that makes the backtracking go past the whole parenthesized subrule rather than being blocked by the '>'. The last form of create_ship_2 above is the one I went with, since I think it should be more efficient. It would probably be most efficient to combine some of these CreateShip rules, but I'll leave that for later.
  • I used '>' rather than '>>' in front of everything that was optional, i.e., preceded by '-' or '*', or any subrules that had eps as an alternative, since those are all guaranteed to pass. After discovering the above quirk I went back to double-check whether these needed extra parentheses or something, but they were ok.
  • There were a few places where a '>' could currently work, like with owned_by_4, which, because it is tried after owned_by_2, could probably be fine using '>' after parse::label(Affiliation_token); but if it ever got shifted to before owned_by_2 it would break, so I left the '>>'.
  • I made And[] and Or[] so that they could accept simply one or more conditions between the square brackets rather than requiring two or more. When editing things and experimenting it is much more convenient to not have to remove the brackets if one has (perhaps temporarily) gone down to a single condition.
Attachments

[The extension patch has been deactivated and can no longer be displayed.]

If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

Geoff the Medio
Programming, Design, Admin
Posts: 13603
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: parsing experiment notes

#15 Post by Geoff the Medio »

Dilvish wrote:

Code: Select all

            create_ship_2
                =    tok.CreateShip_
                >>   parse::label(DesignName_token) >> tok.string [ _a = _1 ]
                >>   parse::label(Empire_token)     >  int_value_ref [ _b = _1 ]
                >>   parse::label(Species_token)    >> string_value_ref [ _val = new_<Effect::CreateShip>(_a, _b, _1) ]
                ;

            create_ship_3
                =    tok.CreateShip_
                >>   parse::label(DesignName_token) >> tok.string [ _a = _1 ]
                >>   parse::label(Empire_token)     > int_value_ref [ _val = new_<Effect::CreateShip>(_a, _1) ]
                ;
would lead to surprising errors like

Code: Select all

default/specials.txt:674:63: Parse error.  Expected integer expression here:
                OwnerHasTech name = "SHP_DOMESTIC_MONSTER"
                Random probability = 0.05
            ]
            stackinggroup = "KRAKEN_NEST_STACK"
            effects = [
                CreateShip designname = "SM_KRAKEN_1" empire = Source.Owner
                                                               ^
This looks like another example of "ambiguity or assuming from a partial match could lead to mis-parses" that I mentioned... It sees input matching

Code: Select all

(parse::label(DesignName_token) >> tok.string >> parse::label(Empire_token))
which appears in the rule create_ship_2, and then there is an expectation for the following input to match

Code: Select all

(int_value_ref >> parse::label(Species_token) >> string_value_ref)
but the actual input lacks the "species =", so it reports a failed expectation, and never backtracks to try the create_ship_3 rule. I'm a bit surprised that the error message refers only to the int_value_ref part, though, rather than something about an unnamed-rule composite, but I suppose it makes a bit of sense.

In such cases of ambiguity, you'd have to use '>>' everywhere between the effectively-optional sub-parsers until you're past any ambiguous parts, to avoid getting irreversibly committed to the wrong sub-parser.

But, given what you reported works with bracketing, I think you should be able to do

Code: Select all

            create_ship_2
                =    tok.CreateShip_
                >>  (parse::label(DesignName_token) >> tok.string [ _a = _1 ])
                >>  (parse::label(Empire_token)     > int_value_ref [ _b = _1 ])
                >>  (parse::label(Species_token)    > string_value_ref [ _val = new_<Effect::CreateShip>(_a, _b, _1) ] )
                ;

            create_ship_3
                =    tok.CreateShip_
                >>  (parse::label(DesignName_token) >> tok.string [ _a = _1 ])
                >>  (parse::label(Empire_token)     > int_value_ref [ _val = new_<Effect::CreateShip>(_a, _1) ] )
                ;
which should avoid ambiguity, but also give better locality of error messages when parsers don't match things after labels that (locally) are unambiguous. Possibly also the final [] could be moved outside the last () and put on the next line for better readability. If the comment-suggested change of create_ship_1 to have a label containing "id" instead of "DesignName" is made, then the >> after both instances of parse::label(DesignName_token) could be changed to > without worrying about reordering of these rules.
It would probably be most efficient to combine some of these createship rules, but I'll leave that for later.
That might sound tempting, but I think it's actually much worse to combine multiple rules and have OR blocks ( | operator) between the various options in a single big complicated rule. This ends up requiring a lot more RAM to compile, which is why I've split up many of the parsers with variant forms such as create_ship.

And particularly in this case, it's a bit awkward to structure things simply, as you'd want to stop parsing and run the final semantic action if, after matching (parse::label(DesignName_token) >> tok.string), additional text doesn't appear, or then again stop after matching (parse::label(Empire_token) > int_value_ref), meaning you'd end up with several nested else cases. And the possibility at the start to match an int ref instead of a string after matching parse::label(DesignName_token) makes it more complicated as well.
• I used '>' rather than '>>' in front of everything that was optional, i.e., preceded by '-' or '*', or any subrules that had eps as an alternative, since those are all guaranteed to pass. After discovering the above quirk I went back to double-check these to see if they needed extra parentheses or something, but they were ok.
I don't think it matters if you do ">> -(stuff)" or "> -(stuff)". Possibly if there is no subsequent non-optional parser, it might help locality of errors to use > but I'm not sure...
• I made And[] and Or[] so that they could accept simply one or more conditions between the square brackets rather than requiring two or more.
Sounds reasonable.

Otherwise... if it works, it's probably fine to commit.

A further few thoughts: I have worked a bit on more parser separation stuff, and one thing I tried was adding expression parsing to the simple int parser. This would mean you could write "Source.Age + 5" instead of just "Source.Age" when a simple int is expected.

For the non-flexible int, though, since it's mainly intended for things requiring an ID number, do you think there's any need for expression parsing? Presumably it would generally just be used to get ids, which would have no need for mathematical manipulation. If not, things could be structured so the non-flexible int only accepts simple non-expression valuerefs... I'm not sure if this is a useful restriction to add, but if it is, it could probably be set up. Alternatively, an even more restricted set of int valuerefs could be set up to only have access to id-type int valuerefs... so no age or counts of things, but only stuff like Source.ID or LocalCandidate.ProducedByEmpireID or similar. The C++ code hasn't bothered to make id numbers a separate data type, but it perhaps might be useful to do so in the parser? (Though I doubt it's worth the effort...)
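On the idea of treating id numbers as a separate data type, here is a minimal hypothetical sketch of what a strong typedef could look like on the C++ side (the names ObjectID, source_id, source_age, and lookup_age are all invented for illustration; FreeOrion itself keeps ids as plain int):

```cpp
#include <cassert>

// Hypothetical strong-typedef sketch: ids get their own wrapper type so
// arithmetic int valuerefs can't silently be used where an id is expected.
struct ObjectID {
    int value;
    explicit ObjectID(int v) : value(v) {}   // no implicit conversion from int
};

// Only id-producing accessors would return ObjectID...
inline ObjectID source_id() { return ObjectID{42}; }

// ...while age/count style valuerefs keep returning plain int.
inline int source_age() { return 7; }

inline int lookup_age(ObjectID id) {
    // A function taking ObjectID now rejects a bare int at compile time:
    //   lookup_age(source_age() + 5);   // error: no implicit int -> ObjectID
    return id.value;  // stand-in for a real object lookup
}
```

The same idea could be applied at the parser level instead: a separate rule that only produces id-type valuerefs, so expression forms like "Source.Age + 5" simply never match where an id is expected.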

Post Reply