[PATCH] Add UserString to content scripts.

Message

adrian_broher · #1 Post by **adrian_broher** » Mon May 27, 2013 2:23 pm

Hello,

in the spirit of [Patch] Simplify UserString calls. and [PATCH] Added NOP UserString macro the patches provided in this thread will add the UserString syntax to the various content scripts to to mark translation keys as those.

For example take the Species statement:

Code: Select all

Species
    name = "SP_ABADDONI"
    description = "SP_ABADDONI_DESC"
    Native
    CanProduceShips
    CanColonize

    tags = "LITHIC"

name and description and every tag are translatable strings. With the patches added the syntax changes to

Code: Select all

Species
    name = UserString("SP_ABADDONI")
    description = UserString("SP_ABADDONI_DESC")
    Native
    CanProduceShips
    CanColonize

    tags = UserString("LITHIC")

This marks the strings as translatable. No actual substitution is performed, the UserString 'call' is a non operation, the string is passed through. The syntax is not optional, for the translatable entries you are required to wrap them with UserString(). The syntax only applies to entries, that are translatable.

Please commit the patches separately.

#2 Post by **Dilvish** » Mon May 27, 2013 4:53 pm

I have mixed feelings about this proposal. The call will be up to Geoff and eleazar, but I can't refrain from commenting -- I can see that in the c++ code, a structural flag such as the NOP UserString was necessary to enable automatic identification of translation tags to be comprehensive, with no ready alternative, and it seems reasonable to consider the additional syntax burden on the programmer to be fairly trivial. That's a strong benefit with a low cost. On the content scripting side, I'm concerned that the cost benefit balance tips the other way.

The actual benefit is much lower, because there are already structural flags/tokens in place that mark every translatable entry; those structural flags are simply more varied that "NOPUserString". You must have already gone through and identified all those flags in order for the patch to make the parser require the new syntax (or even be compatible with it). So in this case, there is a ready alternative -- rather then putting an additional syntax burden on every piece of scripted content, the translatable-string-identifier-script could be made more complex to check for these additional structures within the content directory. That alternative seems to substantially reduce the benefit of requiring this additional syntax.

On the cost side, it seems to me that there is a greater cost as well. I expect Geoff and others have gone through a lot of effort to make the scripting syntax as streamlined as possible, to facilitate content creation by nonprogrammers. I suppose many of them will create via copy-and-modify, but still, this seems like a potentially greater burden on them than it was for c++ programmers. Additionally, it seems to me that the additional syntax clutters/degrades readability of the content scripts much more than it did the c++ code.

At least a portion of the work you did to make this patch would still be applicable to augmenting the translatable-string-identifier-script, so that work wouldn't be entirely lost if you did take the alternate approach.

p.s.
Geoff and I had previously discusses exposing the "UserString" call to the python AI, and having it use that for all strings visible to the user. I'll bump that up on my priority list, and add NOPUserString as well.

Bigjoe5 · #3 Post by **Bigjoe5** » Mon May 27, 2013 5:24 pm

I have to agree with Dilvish. Our scripters are not just "us". The intention is that users can modify FO freely and easily. Anything that complicates the scripting language, with no benefit to the language itself is not beneficial, IMO, even if it makes things a bit better on the code side. My principle is that in general, messiness should be pushed downward, not upward.

adrian_broher · #4 Post by **adrian_broher** » Mon May 27, 2013 5:42 pm

Dilvish wrote:I have mixed feelings about this proposal.

That is understandable, but I will try to answer to give you a better insight in the idea behind it.

Dilvish wrote:The actual benefit is much lower, because there are already structural flags/tokens in place that mark every translatable entry; those structural flags are simply more varied that "NOPUserString". You must have already gone through and identified all those flags in order for the patch to make the parser require the new syntax (or even be compatible with it). So in this case, there is a ready alternative -- rather then putting an additional syntax burden on every piece of scripted content, the translatable-string-identifier-script could be made more complex to check for these additional structures within the content directory. That alternative seems to substantially reduce the benefit of requiring this additional syntax.

I would agree if the identification of translatable items would be trivial. But the problem actually is that this isn't the case. Given the fact that the parser is token based the source semantics would allow to span simple assignments over several lines, so this case would need be handled. Also there are not so simple assignments of lists (tags for example) that would require a the translatable-string-identifier-script to understand the semantic of those lists. List assignments are optional for tags, so you need to handle this case to. As you see the translatable-string-identifier-script gets more and more complex, it needs to understand the scripts semantic for a simple case, that could be handled by some simple regular expression. And finally as you said there is the need to correctly identify the parameters that need translation. Even that isn't possible. For example: The 'category' parameter for encyclopedia articles are translatable strings, whereas the 'category' parameter of Techs is in fact a reference to (Tech)Category objects. Now we need to evaluate the context where the parameter exists. And with that we are at a point where need a full blown fo-script parser.

Dilvish wrote:On the cost side, it seems to me that there is a greater cost as well. I expect Geoff and others have gone through a lot of effort to make the scripting syntax as streamlined as possible, to facilitate content creation by nonprogrammers.

Bigjoe5 wrote:IThe intention is that users can modify FO freely and easily. Anything that complicates the scripting language, with no benefit to the language itself is not beneficial, IMO, even if it makes things a bit better on the code side.

Geoff had said on serveral occations that this is the case so I think he and the other devs want to keep it that way, but I think the 'make it easy for non-programmers' is pretty moot. Please keep the following in mind:

Contributors, regardless of programming knowledge, need to learn the syntax to realize their idea
When stepping back and looking at the fo-script documentation I realize that the fo-script language is already complex and non intuitive. You need to have or to gain programming knowledge to implement something in fo-script. Adding additional syntax wouldn't make a difference as long as it is documented, behaves deterministic and is easily debuggable.

Dilvish wrote:Additionally, it seems to me that the additional syntax clutters/degrades readability of the content scripts much more than it did the c++ code.

That's maybe a point I could agree to somewhat. But choosing another syntax isn't that easy. [ ] os already used for grouping/lists further overloading it just makes the language harder to understand, using something like _"IDENTIFIER" would reduce cluttering but isn't as bold as UserString("IDENTIFER"). But I'm open for ideas on that. Point is that to prevent implementing a full fledged parser an exclusive for translations token identifier would be needed. I did choose UserString("") because it is already in place anywhere else.

Dilvish wrote:At least a portion of the work you did to make this patch would still be applicable to augmenting the translatable-string-identifier-script, so that work wouldn't be entirely lost if you did take the alternate approach.

It's would not be a problem to abandon the patch, but I can hardly think of a way to implement it in a better way. But of couse I'm open for suggestions.

Dilvish wrote:Geoff and I had previously discusses exposing the "UserString" call to the python AI, and having it use that for all strings visible to the user. I'll bump that up on my priority list, and add NOPUserString as well.

That should be UserStringNop in that case, to keep the variants minimal.

But thanks for your input on that matter.

#5 Post by **Dilvish** » Mon May 27, 2013 6:16 pm

If you like my last idea, then these first couple responses are moot.

adrian_broher wrote:I would agree if the identification of translatable items would be trivial. But the problem actually is that this isn't the case. Given the fact that the parser is token based the source semantics would allow to span simple assignments over several lines, so this case would need be handled. Also there are not so simple assignments of lists (tags for example) that would require a the translatable-string-identifier-script to understand the semantic of those lists. List assignments are optional for tags, so you need to handle this case to. As you see the translatable-string-identifier-script gets more and more complex, it needs to understand the scripts semantic for a simple case, that could be handled by some simple regular expression.

it seems to me that everything you've mentioned above could be fairly readily handled via regular expressions (and because of spanning multiple lines, using sed or pcregrep instead of grep).

And finally as you said there is the need to correctly identify the parameters that need translation. Even that isn't possible. For example: The 'category' parameter for encyclopedia articles are translatable strings, whereas the 'category' parameter of Techs is in fact a reference to (Tech)Category objects. Now we need to evaluate the context where the parameter exists. And with that we are at a point where need a full blown fo-script parser.

I think it likely that everyone would be open to changing one of these 'category' parameters to avoid this translatability conflict.

... but I think the 'make it easy for non-programmers' is pretty moot. Please keep the following in mind:
Contributors, regardless of programming knowledge, need to learn the syntax to realize their idea

When stepping back and looking at the fo-script documentation I realize that the fo-script language is already complex and non intuitive. You need to have or to gain programming knowledge to implement something in fo-script. Adding additional syntax wouldn't make a difference as long as it is documented, behaves deterministic and is easily debuggable.

Just because there is existing burden, I don't think that automatically makes additional burden not make a difference. The question is far from moot.

It's would not be a problem to abandon the patch, but I can hardly think of a way to implement it in a better way. But of couse I'm open for suggestions.

Here's another alternate approach that just came to mind -- instead of augmenting the parser so that it requires this UserString syntax, augment it so that it can optionally generate translation helper files that contain all the UserString lines you would have wanted in the scripts. Adjust the build files so that the parser is also made as a standalone executable that can be run at the end of the build process or by a translator when they need to refresh these helper files.

adrian_broher · #6 Post by **adrian_broher** » Mon May 27, 2013 6:40 pm

Dilvish wrote:it seems to me that everything you've mentioned above could be fairly readily handled via regular expressions

That's a point were I can't agree with. Regular expressions are good a finding (multiline) patterns, but not at interpreting token streams.

Edit: After rethinking this you probably have a point there. I need to test something.

Dilvish wrote:Here's another alternate approach that just came to mind -- instead of augmenting the parser so that it requires this UserString syntax, augment it so that it can optionally generate translation helper files that contain all the UserString lines you would have wanted in the scripts. Adjust the build files so that the parser is also made as a standalone executable that can be run at the end of the build process or by a translator when they need to refresh these helper files.

The is approach is actually a pretty good. I will investigate that.

#7 Post by **eleazar** » Tue May 28, 2013 1:51 am

Not being a coder, i'm only seeing the con side of this -- slightly klunkier scripting. I can't speak to weather the pro side justifies the con.

FO script isn't simple, but it does have a certain degree of consistency. This bit seems out of place.

Also there has to be a less opaque way to indicate a translatable string than the phrase "UserString"

wheals · #8 Post by **wheals** » Tue May 28, 2013 3:13 am

Maybe I'm missing something (I know around nothing about parsing in general), but if the point is just to be able to find the user-translatable strings in a content script file (which seems to be the use of the related patches), couldn't a coder/scripter (or the computer, in a slightly different manner) just search for quotation marks? ALL strings in content files are translatable, unlike code which may have unrelated, strings that wouldn't possibly need to be translated.

#9 Post by **Geoff the Medio** » Tue May 28, 2013 3:34 am

wheals wrote:...couldn't a coder/scripter (or the computer, in a slightly different manner) just search for quotation marks?

Consider:

Code: Select all

graphic = "icons/species/hhhoh.png"

FreeOrion

[PATCH] Add UserString to content scripts.

[PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.

Re: [PATCH] Add UserString to content scripts.