Stringtable / Translation Issues

For what's not in 'Top Priority Game Design'. Post your ideas, visions, suggestions for the game, rules, modifications, etc.

Moderators: Oberlus

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#31 Post by Geoff the Medio » Fri Feb 18, 2005 10:30 am

Ablaze: Ah, so you're worrying about games between two different human players that don't speak the same language?

That's a whole other issue I wasn't considering... but I think it could be covered by one of the systems that's been proposed above... at least internally. I hadn't considered the UI issues for picking diplo messages at all... that's a bit out of the scope of what I was considering... though it probably shouldn't be.
Then, what about formal vs nonformal and other stuff that isn’t in the English language?
I think you misunderstand what I've been proposing. A major goal of the system is to avoid any built-in preference to a particular language. That means that the game code itself has no idea what language it's working in, and all the grammar rules for each language are encoded in the stringtable entries using the markup itself.

To accomplish this, I've suggested using "flags" on strings. The translator can add or remove flags, add or remove checks for them, and make different forms of strings in response to the presence of different flags. Since the translator can set up as many or as few flags and checks as they want / need, we should presumably have full language independence.

I think your XML suggestion is just a differently formatted version of the same thing really...
As far as I can see, a string table based translator would have to have a fragment for "I like" in the native language, one for "is nice", one for "he" and "she" and perhaps one for "name."
No, it wouldn't. Avoiding language synthesis is a major goal of the proposed systems, since it's practically impossible to do it in English, let alone in a language-independent way, let alone without a slew of programmers who speak each language...

From the suggestions above, there would just be stringtable entries for:

a) complete phrases, like: "I like $NAME. He/She is nice." as a single sentence.

b) single nouns, like "Bill"

The noun entries come in two flavours:

b1) The noun has known grammatical properties, such as gender, plurality, (formality?) etc, which are embedded into the noun stringtable entry using some sort of flag or marker, and which alter the sentence into which they are inserted

b2) Generic nouns that can change their form depending on the sentence into which they are inserted, which have different versions of the string for the noun in the stringtable entry for all the possible combinations of gender, plurality, formality, etc, which are properties of the sentence

It's possible we might need to extend the noun-type entries to include other things like adjectives, but these are probably best left in the complete sentence entries, I think. It's really the nouns that are most in need of being treated like parameters, to allow some re-use of sentence stringtable entries...
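To make the a) / b1) split concrete, here is a minimal C++ sketch. It assumes a hypothetical in-memory form for entries (`NounEntry`, `SentenceVariant` and `RenderSentence` are illustrative names, not existing game code, and no stringtable file syntax is implied). The noun carries translator-chosen flags, and the complete sentence picks its wording from them:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of a b1-style noun entry: the noun text plus
// the grammatical flags the translator attached to it. Flag names such as
// "MALE" are chosen by the translator, not by the game code.
struct NounEntry {
    std::string text;
    std::set<std::string> flags;
};

// Hypothetical form of one variant of a complete-sentence entry such as
// "I like $NAME. He/She is nice.". An empty required_flag matches any noun,
// so that variant acts as the fallback and should be listed last.
struct SentenceVariant {
    std::string required_flag;
    std::string text;  // "$NAME" marks where the noun is inserted
};

// Uses the first variant whose required flag is carried by the noun, then
// inserts the noun text at the first "$NAME". Returns "" if nothing matches.
std::string RenderSentence(const std::vector<SentenceVariant>& variants,
                           const NounEntry& noun) {
    const std::string placeholder = "$NAME";
    for (const auto& v : variants) {
        if (!v.required_flag.empty() && noun.flags.count(v.required_flag) == 0)
            continue;
        std::string out = v.text;
        auto pos = out.find(placeholder);
        if (pos != std::string::npos)
            out.replace(pos, placeholder.size(), noun.text);
        return out;
    }
    return "";
}
```

With variants `{"MALE", "I like $NAME. He is nice."}`, `{"FEMALE", "I like $NAME. She is nice."}` and a flagless fallback `"I like $NAME."`, a noun entry `{"Bill", {"MALE"}}` renders as "I like Bill. He is nice.", while a noun carrying none of the checked flags gets the fallback sentence.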

As for canned diplomatic responses, that's probably a good idea to allow players of different languages to play together. I suspect the flags / parameters system will work for this as well... and can probably be encoded in XML for transmission.

However, I'm more concerned about the format the translators will use to indicate the flags / checks they want to use while editing the stringtable to do their translation. These flags probably wouldn't appear in the string sent to players in other languages unless we established a set of flags to be used across languages to mean the same thing (gender, plurality, formality), which could be ignored by languages that don't have similar concepts... This standardization would probably be done outside of the game code though, and it would be the translator's responsibility to conform to these standards when doing their translation of the stringtable.

Ablaze
Creative Contributor
Posts: 314
Joined: Tue Aug 26, 2003 6:10 pm
Location: Amidst the Inferno.

#32 Post by Ablaze » Fri Feb 18, 2005 7:47 pm

I think you should include the flags as separate variables, and not in the string itself.

That way if you find that you need to differentiate between the sex of one type of ship vs another, e.g. if a cruiser is male but a battlestar is female (I don't know why, ask the French), then you can include that as an extra tag in the XML: <Message id="99" name="cruiser" sex="male"/> Everyone who doesn’t care about whether their cruiser is male or female will not have a case for sex in the switch statement of their XML parser.

"My cruiser is bigger than your cruiser, and it can shoot its wad further too!!" is all good, but what if you define the language and then realize half way through that you want to use the same message for slave armies? You would need to add a variable for plural. If you have a string table, maybe the variable would look something like this? "$cruiser (male)". In order to expand the language to include plural you would have to put the plural tag at the end or else change everyone’s parser, even if they don’t have a distinction for plural in their language. In other words, changing the format to “$cruiser (plural) (male)” would invalidate all parsers that came across modifier #1 and read plural into sex.

The tags could also easily get unreadable. For instance if I had the variable “$cruiser (male) (single) (2) (near) (machine) (space_lice) (28)” you would need to consult the documentation to find out what all those variables mean. In XML you might still have to consult the documentation, but it would be a bit more obvious:
<Message id="99" name="cruiser" sex="male" plural="false" guns="2" near_planet="true" ship_design="machine" infested="space_lice" infestation_percent="28"/>

When you get down to it this breaks down to the same thing, but XML makes things so much easier and is so much more readable.
Time flies like the wind, fruit flies like bananas.

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#33 Post by Geoff the Medio » Fri Feb 18, 2005 9:35 pm

Ablaze wrote:I think you should include the flags as separate variables, and not in the string itself.
That would effectively require a complete separate string for all possible combinations of flags... I suppose we could do it that way, but it seems rather redundant...
That way if you find that you need to differentiate between the sex of one type of ship vs another, e.g. if a cruiser is male but a battlestar is female (I don't know why, ask the French), then you can include that as an extra tag in the XML: <Message id="99" name="cruiser" sex="male"/> Everyone who doesn’t care about whether their cruiser is male or female will not have a case for sex in the switch statement of their XML parser.
Ok.. but how do you know what sex flag to include in that message? By this, I mean that "sex=male" is a property of the noun cruiser, so needs to be associated with the string for that noun somehow (by the translator). You've just shown it as a separate parameter, but it's not clear what determines the value of this parameter, or whether to include it at all, or how it's associated with the other parameter if at all...
"My cruiser is bigger than your cruiser, and it can shoot its wad further too!!" is all good, but what if you define the language and then realize half way through that you want to use the same message for slave armies?
I'm not sure who "you" is, but if it's the translator, then you don't have the option to decide to use a message for something other than it was originally intended. All the possible messages are hard-coded into the game. The game looks up a particular stringtable entry based on its hard-coded name, and the translator provides the translated version of the corresponding string, which is displayed.
You would need to add a variable for plural. If you have a string table, maybe the variable would look something like this? "$cruiser (male)". In order to expand the language to include plural you would have to put the plural tag at the end or else change everyone’s parser, even if they don’t have a distinction for plural in their language. In other words, changing the format to “$cruiser (plural) (male)” would invalidate all parsers that came across modifier #1 and read plural into sex.
This makes no sense. The parser is in the code, not the stringtable entry. It would strip off all substrings inside brackets (which are the flags attached to the string), and provide a list of these substrings/flags to the text renderer. The text renderer would parse another string, find a check for a particular flag, and if that flag was present in the list of flags it received, would do something as a result. If someone wants to add another flag, they are free to do so. The flags that are checked for are defined by the translator, and the flags that another translator uses in another language are irrelevant. And even if there were other flags from other languages being checked for, or present in the strings being checked, it wouldn't matter, as nothing would be done with these extra flags if they're not found or checked for.
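A rough C++ sketch of the parsing step described above, assuming flags are written in round brackets after the noun text, as in "$cruiser (male)" (`ParseEntry` and `HasFlag` are hypothetical names, not game code). Because the flags end up in a set rather than in numbered positions, their order is irrelevant, so adding a new flag can't shift the meaning of the existing ones:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical result of parsing an entry such as "cruiser (MASC)": the
// displayable text with the bracketed flags removed, plus the flags.
struct ParsedEntry {
    std::string text;
    std::set<std::string> flags;
};

// Strips every "(...)" substring out of the entry and collects its contents
// as a flag. Assumes the flags follow the noun text; nothing is positional.
ParsedEntry ParseEntry(const std::string& raw) {
    ParsedEntry result;
    std::string::size_type i = 0;
    while (i < raw.size()) {
        if (raw[i] == '(') {
            auto close = raw.find(')', i);
            if (close == std::string::npos) {  // unmatched '(' : keep as text
                result.text += raw.substr(i);
                break;
            }
            result.flags.insert(raw.substr(i + 1, close - i - 1));
            i = close + 1;
        } else {
            result.text += raw[i++];
        }
    }
    // Trim the spaces left behind where trailing flags were removed.
    while (!result.text.empty() && result.text.back() == ' ')
        result.text.pop_back();
    return result;
}

// The renderer only acts on flags it explicitly checks for; any other flags
// (e.g. ones added for another language) are simply ignored.
bool HasFlag(const ParsedEntry& e, const std::string& flag) {
    return e.flags.count(flag) > 0;
}
```

For example, "cruiser (MASC) (MECH)" and "cruiser (MECH) (MASC)" both parse to the text "cruiser" with the same two flags, and a check for a flag nobody added simply comes back false.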
The tags could also easily get unreadable. For instance if I had the variable “$cruiser (male) (single) (2) (near) (machine) (space_lice) (28)” you would need to consult the documentation to find out what all those variables mean. In XML you might still have to consult the documentation, but it would be a bit more obvious:
<Message id="99" name="cruiser" sex="male" plural="false" guns="2" near_planet="true" ship_design="machine" infested="space_lice" infestation_percent="28"/>
I can't very well respond to the above because it's not what would be done in the system Blade Runner or I were suggesting. You don't seem to understand what was being suggested, or I don't understand what you're doing with what was being suggested. Perhaps you're trying to solve a different problem? Please outline specifically and clearly every part of the system you're proposing: how the whole thing fits together, how the game would take stringtable entries (or whatever your system would use) and produce a sentence to show on the screen, how the translator would alter this information, and what the game code would be doing with the information provided to it.

I can say that the number of parameters you're including in your example variable (though I'm not sure what you mean by "variable") is far more than would be included in a stringtable entry in the systems I've suggested. Your variables also don't seem to have any actual sentence text in them... so... I'm not sure if they're stringtable entries, or what... but in the examples given above by myself / Blade Runner, the meaning of parameters and their values would be somewhat more evident from context. As well, the flags could be better named than "male" or "2" to avoid ambiguity. Also, I don't think we'd ever have a stringtable entry of the sort you suggest, in which every part of the sentence other than the basic idea of "ship(s) near monster(s)" is a parameter. Stringtable entries as I've proposed would be mostly complete sentences or larger bits of text, with a few parameters to add some reusability in ways that don't require significant alteration of the structure of the string.

Ablaze
Creative Contributor
Posts: 314
Joined: Tue Aug 26, 2003 6:10 pm
Location: Amidst the Inferno.

#34 Post by Ablaze » Sat Feb 19, 2005 1:09 pm

I really don't know why you are focusing on the strings you are substituting rather than the data. It seems to me like you build the phrases from the data, and in the past I've usually found that if you get the data right everything else just falls into place so much more nicely.

If you like you can use meta tags, so it becomes:

<Message id="96">
	<parameter number="1" name="cruiser" sex="male" plural="false"/>
</Message>
I'm not sure who "you" is
It’s the programmers. No one can be expected to come up with every conceivable message on the first try. There will always be some that have to be added later when it becomes apparent that they are needed.

So in your system something is either defined or not defined? So a variable’s sex could either be M, F or nil? What happens when it’s a ship and you want to define whether it’s biological or mechanical, and you want to use the M for mechanical?

I don’t really have time to defend my opinion in detail at the moment, sorry.

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#35 Post by Geoff the Medio » Sat Feb 19, 2005 1:57 pm

Ablaze wrote:I really don't know why you are focusing on the strings you are substituting rather than the data. It seems to me like you build the phrases from the data, and in the past I've usually found that if you get the data right everything else just falls into place so much more nicely.
Trying to "build the phrases" is exactly what I'm trying to avoid! Writing a language synthesizer is practically impossible. It's only feasible to have essentially complete sentences that have a very limited number of possible substitutions for specific nouns or numbers. It's not possible to have just an empty slate where every word or concept in a sentence or paragraph is determined by a parameter.

For example:

"We are in dire need of supplies! It is essential that you provide us with #AMOUNT of $OBJECT to support our ongoing crusade against the $ENEMY"

is ok.

"$RECIPIENT $STATUS $STATUSMODIFIER $STATUSOBJECT $REQUESTTYPE $ACTION $OBJECT #AMOUNT $REASONTYPE $REASONMODIFER $REASONOBJECT"

is not. You can format the latter as XML if you want, but it doesn't change the fact that it's just a bunch of variables that somehow have to be converted into a sentence from scratch.
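A minimal C++ sketch of the acceptable kind of entry, assuming placeholders are written as `$NAME` or `#NAME` in upper case (the `Substitute` function is hypothetical, not game code). The sentence is fixed text; only the marked slots are filled in, and nothing is assembled from scratch:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Fills the named placeholders ($WORD for strings, #WORD for numbers) in an
// otherwise complete stringtable sentence. The sentence structure stays
// fixed; only the marked slots vary. Unknown placeholders are left as-is.
std::string Substitute(const std::string& sentence,
                       const std::map<std::string, std::string>& params) {
    std::string out;
    std::string::size_type i = 0;
    while (i < sentence.size()) {
        char c = sentence[i];
        if (c == '$' || c == '#') {
            auto j = i + 1;
            // Placeholder names are upper-case letters and underscores.
            while (j < sentence.size() &&
                   (std::isupper(static_cast<unsigned char>(sentence[j])) ||
                    sentence[j] == '_'))
                ++j;
            std::string key = sentence.substr(i, j - i);  // includes the sigil
            auto it = params.find(key);
            out += (it != params.end()) ? it->second : key;
            i = j;
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

For instance, "Provide us with #AMOUNT crates of $OBJECT." with `#AMOUNT` = "12" and `$OBJECT` = "food" gives "Provide us with 12 crates of food."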
So in your system something is either defined or not defined? So a variable’s sex could either be M, F or nil? What happens when it’s a ship and you want to define whether it’s biological or mechanical, and you want to use the M for mechanical?
Assuming "you" is again the programmers, then this question makes no sense or assumes things incorrectly. The programmer does not add flags. The set of flags is entirely determined by the translator, and the flags in one language have no impact on the flags in another, and there are no language independent flags. If some language has differing grammatical constructions for mechanical nouns, then the translator for that language would add in whatever text for a flag they want. They can use "Mechanical" or "Mech" or "DX8245" or whatever makes sense to them in their language if they want to.

kess
Space Floater
Posts: 15
Joined: Thu Feb 10, 2005 1:53 pm
Location: The hemisphere

#36 Post by kess » Sat Feb 19, 2005 4:21 pm

Geoff the Medio wrote:"We are in dire need of supplies! It is essential that you provide us with #AMOUNT of $OBJECT to support our ongoing crusade against the $ENEMY"
Above sentence modified just to include a numerical expression and 'string flag':
"We are in dire need of supplies! It is essential that you provide us with #AMOUNT $OBJECT<if #AMOUNT is greater than 1>s</if> to support our ongoing crusade against <if $ENEMY is a>a</if><if $ENEMY is an>an</if> $ENEMY fleet."

(Not the best example perhaps, but I tried to build upon your previous examples of syntax-in-progress.)

Could you, please, add an example of how the string table could look for the above modified sentence, I believe it will be more or less crystal clear of how you are thinking then. :)

[edit]
It would be handy (if a similar syntax is going to be used) to have constructs like if-else or the like.

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#37 Post by Geoff the Medio » Sat Feb 19, 2005 5:01 pm

kess wrote:(Not the best example perhaps, but I tried to build upon your previous examples of syntax-in-progress.)
Uhm... I'm not sure what your point was there... That particular text was meant as an illustration of the sort of sentences that would be in a stringtable entry, not an illustration of a possible syntax used to deal with variations of that text due to different parameters.
Could you, please, add an example of how the string table could look for the above modified sentence, I believe it will be more or less crystal clear of how you are thinking then. :)
I can't say how it would be done, as Blade Runner has not yet suggested a syntax for dealing with numbers, and I never worked one out myself. I can give a possible suggestion though:

This is different from the previous examples, in that we want to alter the form of the text of a parameter ($OBJECT) based on the string into which it is passed, rather than alter the string based on what is passed into it.

To do this, the stringtable entry passed for $OBJECT would probably need to check for flags in the string into which it is passed.

The main sentence might be something like:

STATEMENT
I want #AMOUNT $OBJECT[#AMOUNT].

Where [] indicates that the flags of one parameter are being "passed" to another parameter. In this case, the flags of #AMOUNT are being passed to the string substituted for $OBJECT. However, #AMOUNT is just a number, so the # would probably be a special character used for numbers that indicates that the value is to be treated as a flag if passed to another string.

The stringtable entry passed as $OBJECT would be something like:

BANANA_NOUN
banana[1]; bananas[else]

where [1] indicates to use the preceding text "banana" if the flag "1" is present (ie. the value of #AMOUNT was 1), and [else], which I just made up, means "if no other check for flags matched", which would cause the preceding text "bananas" to be used if the flag "1" is not found, as would be the case if #AMOUNT was not equal to 1. We'd probably also need some range checks (eg. greater than or less than conditions) or some other way to check the values of number flags in a general way, perhaps with some wildcard characters in case some languages depend on the value of a particular digit, rather than just on whether the number as a whole is above or below a certain value.
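A C++ sketch of how the `banana[1]; bananas[else]` entry might be parsed and resolved, assuming `;` separates alternatives and each alternative ends with one bracketed check (`Variant`, `ParseVariants` and `SelectVariant` are hypothetical names). The number is passed in as a flag by converting it to text, e.g. with `std::to_string`:

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One alternative of a noun entry such as "banana[1]; bananas[else]":
// the text to show and the flag check written in square brackets after it.
struct Variant {
    std::string text;
    std::string check;
};

// Splits an entry on ';' into "text[check]" alternatives. Assumes every
// alternative ends with a bracketed check; malformed ones are skipped.
std::vector<Variant> ParseVariants(const std::string& entry) {
    std::vector<Variant> variants;
    std::stringstream ss(entry);
    std::string part;
    while (std::getline(ss, part, ';')) {
        auto open = part.rfind('[');
        auto close = part.rfind(']');
        if (open == std::string::npos || close == std::string::npos || close < open)
            continue;
        auto start = part.find_first_not_of(' ');  // skip space after ';'
        variants.push_back({part.substr(start, open - start),
                            part.substr(open + 1, close - open - 1)});
    }
    return variants;
}

// Returns the text of the first alternative whose check names a passed-in
// flag; the "[else]" alternative is used only if no other check matched.
std::string SelectVariant(const std::vector<Variant>& variants,
                          const std::set<std::string>& passed_flags) {
    const Variant* fallback = nullptr;
    for (const auto& v : variants) {
        if (v.check == "else") {
            if (!fallback) fallback = &v;
            continue;
        }
        if (passed_flags.count(v.check)) return v.text;
    }
    return fallback ? fallback->text : std::string();
}
```

Passing `{"1"}` selects "banana"; passing `{"3"}` matches no explicit check and falls back to the `[else]` alternative, "bananas".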

Alternatively, we could have some way for the translator to define a special function that takes a numerical value and produces a set of flags for that number. So instead of

$OBJECT[#AMOUNT]

you'd do something like

$OBJECT[numflags(#AMOUNT)]

where numflags(#AMOUNT) returns a set of flags like (PLURAL) or (SINGULAR) or whatever's useful for a language, and then the stringtable entries passed to $OBJECT just check for those flags like they would normally. In some language, the cases for flags of numbers might be rather complicated, so it would be easier to just define the various checks on numbers and the flags produced once in a separate function, and then treat the number flags just like regular passed-in flags to parameter strings of other strings in order to determine what variant of a string to display.

In this case, the stringtable entry passed for $OBJECT would be something like:

BANANA_NOUN
banana[SINGULAR]; bananas[PLURAL]

and we eliminate the "else" thing, which is probably a bad idea anyway (see below).

As for the format of the number-to-flags function, I'm not sure how that would work... some complicated conditions are probably necessary, but at least they'd be restricted to this specific function, rather than being a general case that needs to work in all stringtable entries... (assuming there's any advantage to this...)
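A sketch of the numflags idea in C++, assuming each translator supplies one such function for their language (`NumFlagsEnglish`, `NumFlagsRussianStyle` and `FormFor` are hypothetical names). The second function illustrates the digit-dependent case mentioned earlier, using the one/few/many pattern that Russian plurals follow for whole numbers; it assumes non-negative input:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// English-style rule: exactly one is singular, everything else plural.
std::set<std::string> NumFlagsEnglish(long n) {
    return {n == 1 ? "SINGULAR" : "PLURAL"};
}

// Russian-style rule: the category depends on the last one or two digits
// (1, 21, 101 -> ONE; 2-4, 22-24 -> FEW; 5-20, 25-30 -> MANY).
// Assumes n >= 0.
std::set<std::string> NumFlagsRussianStyle(long n) {
    long last = n % 10, last_two = n % 100;
    if (last == 1 && last_two != 11) return {"ONE"};
    if (last >= 2 && last <= 4 && (last_two < 12 || last_two > 14)) return {"FEW"};
    return {"MANY"};
}

// A noun entry keyed by translator-chosen flag names, such as the English
// "banana[SINGULAR]; bananas[PLURAL]". Returns the form for a flag in the
// set produced by the numflags function, or "" if no form matches.
std::string FormFor(const std::map<std::string, std::string>& forms,
                    const std::set<std::string>& flags) {
    for (const auto& f : flags) {
        auto it = forms.find(f);
        if (it != forms.end()) return it->second;
    }
    return "";
}
```

With a noun entry `{{"SINGULAR", "banana"}, {"PLURAL", "bananas"}}`, `FormFor(forms, NumFlagsEnglish(7))` returns "bananas". The noun entry never needs to know how the flags were computed, which keeps the complicated conditions inside the one function.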

Edit: Of course, a stringtable entry might both alter its form based on a flag, and pass back flags for the string into which it is substituted to use... so we might have

BANANA_NOUN
banana(FLAG1)[SINGULAR]; bananas(FLAG2)[PLURAL]

and then use that string for $OBJECT in

STATEMENT
I want #AMOUNT $OBJECT[numflags(#AMOUNT)]. {?$OBJECT[numflags(#AMOUNT)]:This text if FLAG1 returned[FLAG1];This text if FLAG2 returned[FLAG2]}.

The ? indicates to check the flags returned by doing $OBJECT[numflags(#AMOUNT)], and the [FLAG1] indicates to use the text preceding it if FLAG1 is present... (and similarly for FLAG2 with the other option). If neither FLAG1 nor FLAG2 were returned from $OBJECT[numflags(#AMOUNT)], then no text would be displayed from within the {} brackets.

So, if #AMOUNT was 1, then you'd get:

"I want 1 banana. This text if FLAG1 returned."

and if #AMOUNT was some other number, such as 5, you'd get:

"I want 5 bananas. This text if FLAG2 returned."

and, to be clear, the numflags(#AMOUNT) would be something that says, effectively:

If #AMOUNT = 1, return (SINGULAR)
Else, return (PLURAL)

but the above is not meant to be the actual syntax that would be used...

and I realize that getting FLAG1 and FLAG2 from within the other string, rather than just checking for PLURAL or SINGULAR in the STATEMENT string is redundant, but I just wanted to illustrate the possible functionality.
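A C++ sketch of the two-way flow above, assuming the alternatives have already been parsed (`Alternative`, `RenderParam`, `Option` and `ConditionalBlock` are hypothetical names, and the `{?...}` syntax itself is not parsed here). A parameter is rendered from the flags passed into it, and the flags it hands back drive the conditional block:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One alternative of an entry like "banana(FLAG1)[SINGULAR]": the text, the
// flag it must see passed in (the [...] check), and the flags it hands back
// to the enclosing string (the (...) part).
struct Alternative {
    std::string text;
    std::string check;
    std::set<std::string> returned;
};

// What substituting a parameter produces: its text and its returned flags.
struct Rendered {
    std::string text;
    std::set<std::string> returned;
};

// Picks the first alternative whose check is among the passed-in flags.
Rendered RenderParam(const std::vector<Alternative>& entry,
                     const std::set<std::string>& passed) {
    for (const auto& a : entry)
        if (passed.count(a.check)) return {a.text, a.returned};
    return Rendered{};
}

// One option of a "{?...:text[FLAG1];text[FLAG2]}" block.
struct Option {
    std::string text;
    std::string flag;
};

// Shows the first option whose flag was returned by the parameter; if none
// matches, the block contributes no text at all.
std::string ConditionalBlock(const std::vector<Option>& options,
                             const std::set<std::string>& returned) {
    for (const auto& o : options)
        if (returned.count(o.flag)) return o.text;
    return "";
}
```

Rendering the banana entry with `{"SINGULAR"}` yields "banana" plus `FLAG1`, so the block shows "This text if FLAG1 returned"; if neither flag comes back, the block renders as an empty string.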
[edit]
It would be handy (if a similar syntax is going to be used) to have constructs like if-else or the like.
Given tzlaine's lack of enthusiasm at my original suggestion, any sort of complicated logical structures are best avoided if at all possible, if there's to be any chance of any of this actually being implemented... Sometimes this might not be possible, but in cases where it is, simple syntax is probably how things will be done, even if it makes it cumbersome for translators to actually use the system... (as long as it works, that is)

Ablaze
Creative Contributor
Posts: 314
Joined: Tue Aug 26, 2003 6:10 pm
Location: Amidst the Inferno.

#38 Post by Ablaze » Sun Feb 20, 2005 12:17 am

Geoff the Medio wrote:The programmer does not add flags. The set of flags is entirely determined by the translator, and the flags in one language have no impact on the flags in another, and there are no language independent flags. If some language has differeing grammatical constructions for mechanical nouns, then the translator for that language would add in whatever text for a flag they want. They can use "Mechanical" or "Mech" or "DX8245" or whatever makes sense to them in their language if they want to.
translator? You want every language that's supported by this game to be able to translate to and from every other language? Who is going to do all this work? What if no one volunteers who knows both French and Chinese? Now they can’t talk?

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#39 Post by Geoff the Medio » Sun Feb 20, 2005 12:34 am

Ablaze wrote:translator? You want every language that's supported by this game to be able to translate to and from every other language? Who is going to do all this work? What if no one volunteers who knows both French and Chinese? Now they can’t talk?
You still don't seem to understand what I'm talking about; this has nothing to do with allowing players of different languages to play together. This is about translating the game's stringtable file to other languages, and adding a system to allow the strings in any language, including English, to accept parameters and display them correctly using a language-independent system of flags and checks for flags.

The translator is the person who edits the stringtable. They only need to know the language they are translating to, and the language of a preexisting stringtable file, ideally English. They edit the stringtable file, replacing all the strings with ones in the language they are translating to.

There is nothing to do with the game translating between languages in this. The game just reads and renders as in-game text all of the stringtable entries, translated or otherwise.

Ablaze
Creative Contributor
Posts: 314
Joined: Tue Aug 26, 2003 6:10 pm
Location: Amidst the Inferno.

#40 Post by Ablaze » Sun Feb 20, 2005 4:16 am

So what you're saying is that you are only considering the UI? Fine, string tables seem adequate if you really think they would be easier. Personally, I find substitution strings difficult to deal with, but I hear others like them.

I do think you need to consider how these extra tags are going to be generated. It could get quite complicated, and if everyone who defines a UI is going to have to do it you're going to want to make it as simple as possible.

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#41 Post by Geoff the Medio » Sun Feb 20, 2005 1:19 pm

Ablaze wrote:So what you're saying is that you are only considering the UI? Fine, string tables seem adequate if you really think they would be easier. Personally, I find substitution strings difficult to deal with, but I hear others like them.
I'm just trying to work with what's already been coded. The game is going to have a stringtable system, regardless of what you or I think.

I've also not yet seen a viable alternative. (Language synthesis is not a viable alternative).
I do think you need to consider how these extra tags are going to be generated. It could get quite complicated, and if everyone who defines a UI is going to have to do it you're going to want to make it as simple as possible.
What do you mean by "generated" ? As I've said, the translator would add as many as needed by hand.

The way to keep things not complicated is to make a good set of strings in the table. This is a language-independent activity (or rather, it will be, as the set of string-names recognized by the game won't be alterable by translators). As long as the strings are basically complete sentences and isolated nouns to put into the otherwise complete sentences, then the complications should be minimal, and the tags required in most languages will be manageable.

Also, many of the UI stringtable entries won't need any sort of substitution to work... Most are single words used as labels in the UI. The complicated entries are for the AI diplomatic text, encyclopedia descriptions, sitrep entry text, event text / description / options, etc. that require some degree of mutability through substitutions.

Daveybaby
Small Juggernaut
Posts: 724
Joined: Mon Sep 01, 2003 11:07 am
Location: Hastings, UK

#42 Post by Daveybaby » Tue Feb 22, 2005 8:47 am

You still don't seem to understand what I'm talking about; this has nothing to do with allowing players of different languages to play together.
... although interestingly, the stringtable system could be used to allow players of different languages to play together. Okay, raw text chat wouldn't work (although even so, I guess it might be possible to have the game go off to babelfish and get it transmangled there), but picking standard diplomacy options should work across different languages just as well as it does with the AI.
The COW Project : You have a spy in your midst.

Burgundavia
Space Krill
Posts: 2
Joined: Fri Jun 10, 2005 3:04 pm

#43 Post by Burgundavia » Fri Jun 10, 2005 3:07 pm

The best string-based system to use is gettext:
http://www.gnu.org/software/gettext/

It has the advantage of being able to hook into existing translation communities and get them to help you.

Corey

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#44 Post by Geoff the Medio » Fri Jun 10, 2005 4:01 pm

Burgundavia wrote:The best string-based system to use is gettext:
http://www.gnu.org/software/gettext/
I'm having a hard time finding a simple and clear summary of what gettext does and how, but it looks like it just provides a way to replace marked strings in the source code with new translated strings. This is no better, and probably more complicated, than the external stringtable we're using now, and doesn't address the other issues discussed in this thread: being able to modify a string in response to the strings inserted into it, or to the string into which it is inserted.

Yoghurt
Programmer
Posts: 376
Joined: Sat Jun 28, 2003 8:17 pm
Location: Heidelberg, Germany

#45 Post by Yoghurt » Fri Jun 10, 2005 7:11 pm

Geoff the Medio wrote:I'm having a hard time finding a simple and clear summary of how and what gettext does, but it looks like it just provides a way to replace marked strings in the sourcecode with new translated strings.
Well, gettext is essentially the default l10n system for Linux; for example, if you set your locale to de_DE, all the system tools that support it, as well as all other gettext programs, will be in German. It has some features that our current system does not provide, and editors exist to easily create translations. It is also more programmer-friendly, as you can write

printf(_("Game over"));
and let gettext create a translation file template for you from that, instead of our system

printf(translate(GAME_OVER_STRING));

I always preferred gettext over our system, but right now we have other things to cope with.
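A C++ sketch contrasting the two lookup styles, with tiny in-memory tables standing in for real translation files (`Translate` and `Gettext` are hypothetical stand-ins, not the game's function or the libintl API). The visible difference is the fallback: gettext returns the original string when no translation exists, while a symbolic key has no readable text to fall back on:

```cpp
#include <cassert>
#include <map>
#include <string>

// Key-based style: code refers to a symbolic name, and every language
// (English included) supplies the text for it. Contents are illustrative.
const std::map<std::string, std::string> kStringTable = {
    {"GAME_OVER_STRING", "Spiel vorbei"},
};

// Source-string style, as with gettext's _(): the English text is the key.
const std::map<std::string, std::string> kCatalog = {
    {"Game over", "Spiel vorbei"},
};

// Hypothetical key-based lookup; this sketch shows the key itself when an
// entry is missing, since there is no natural fallback text.
std::string Translate(const std::string& key) {
    auto it = kStringTable.find(key);
    return it != kStringTable.end() ? it->second : key;
}

// Hypothetical gettext-style lookup; an untranslated string falls back to
// the original English text.
std::string Gettext(const std::string& msgid) {
    auto it = kCatalog.find(msgid);
    return it != kCatalog.end() ? it->second : msgid;
}
```

The trade-off runs the other way too: with the English text as the key, rewording an English string changes the key, so existing translations of it stop matching until they are updated.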
