Font support for Central European languages...

stride
Space Floater
Posts: 31
Joined: Sun Sep 28, 2008 3:05 pm
Location: Horten, Norway

Re: Font support for Central European languages...

#16 Post by stride »

I've taken a quick look at adding UTF-8 support.

One approach seems to be adding it to the stringtable and GG elements only, which might be sufficient.


I spent some time googling around. Here is what I found:

The FreeType library we use supports Unicode, and once glyphs have been rendered into bitmaps, what remains is "getting to them". The current code loads only the ASCII range of characters from the font. The font distributed with FreeOrion has Unicode characters in it, so I guess the loading and "getting to them" part is the challenge.

A UTF-8 implementation needs to be able to load a more extensive set of glyphs. Loading all characters from the font might not be advisable, as the set can be quite huge. One approach could be to render glyph bitmaps "on demand" from the font file, with the font file residing on disk, or in memory if reading it from disk is way too slow. Newly rendered bitmaps could then be added to the initial set, which would end up containing only the characters actually needed.
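A minimal sketch of the "initial set plus on-demand additions" idea, in C++ since that is what FreeOrion and GG are written in. `RenderedGlyphSet` is a hypothetical name, not existing GG code; it only tracks which code points already have a glyph and does not call FreeType. It assumes the text has already been decoded into code points (`std::u32string`).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical tracker of which glyphs have been rendered so far. It starts
// with the printable ASCII range the current code already loads; other
// characters are added only when some displayed text actually uses them.
class RenderedGlyphSet {
public:
    RenderedGlyphSet() {
        for (char32_t cp = 0x20; cp <= 0x7E; ++cp)  // printable ASCII
            rendered_.insert(cp);
    }

    // Returns the distinct code points in `text` that have no glyph yet,
    // in ascending order; these are the ones to render from the font file.
    std::vector<char32_t> Missing(const std::u32string& text) const {
        std::set<char32_t> missing;
        for (char32_t cp : text)
            if (rendered_.count(cp) == 0)
                missing.insert(cp);
        return std::vector<char32_t>(missing.begin(), missing.end());
    }

    // Records that glyphs for these code points have now been rendered.
    void Add(const std::vector<char32_t>& cps) {
        rendered_.insert(cps.begin(), cps.end());
    }

    std::size_t Size() const { return rendered_.size(); }

private:
    std::set<char32_t> rendered_;
};
```

For the Polish text "Zażółć ł", only the four distinct non-ASCII letters would be reported as missing; after they are added, the same text needs nothing new.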

Also, it looks like some sort of "character map" is needed to decode the UTF-8 bytes representing a character and find the correct bitmap where that character is rendered.
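The "character map" step boils down to decoding UTF-8 byte sequences into Unicode code points, which is what a font's Unicode charmap is indexed by. Below is a hand-rolled sketch based on the bit layout in RFC 3629 (linked below); `DecodeUtf8` is a hypothetical helper, and a library such as utfcpp could do the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes (as held in a std::string) into Unicode code points,
// following the bit layout in RFC 3629:
//   0xxxxxxx                            -> U+0000..U+007F
//   110xxxxx 10xxxxxx                   -> U+0080..U+07FF
//   1110xxxx 10xxxxxx 10xxxxxx          -> U+0800..U+FFFF
//   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> U+10000..U+10FFFF
// Malformed input throws here; substituting U+FFFD would be another option.
std::u32string DecodeUtf8(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;  // number of continuation bytes that follow
        char32_t cp = 0;
        if (lead < 0x80)              { extra = 0; cp = lead; }
        else if ((lead >> 5) == 0x06) { extra = 1; cp = lead & 0x1F; }
        else if ((lead >> 4) == 0x0E) { extra = 2; cp = lead & 0x0F; }
        else if ((lead >> 3) == 0x1E) { extra = 3; cp = lead & 0x07; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (bytes.size() - i < extra + 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont >> 6) != 0x02)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append 6 payload bits
        }
        // Reject overlong encodings, UTF-16 surrogates and values past U+10FFFF.
        static const char32_t kMinForLength[] = {0x0, 0x80, 0x800, 0x10000};
        if (cp < kMinForLength[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the Polish word "Zażółć" is 10 bytes in UTF-8 but decodes to 6 code points, with ż, ó, ł and ć becoming U+017C, U+00F3, U+0142 and U+0107.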


If you can add some insight or contribute to this post in any way, please do so. We might be able to get this started.

I found these helpful:
ftp://ftp.isi.edu/in-notes/rfc3629.txt
http://www.joelonsoftware.com/articles/Unicode.html
http://www.research.att.com/~bs/3rd_loc.pdf
http://www.tru64unix.compaq.com/cplus/intzln.pdf


Best regards,

M.

Cyber Killer
Space Floater
Posts: 20
Joined: Sun Oct 05, 2008 7:53 am
Location: Koszalin, Poland

Re: Font support for Central European languages...

#17 Post by Cyber Killer »

After a quick look at the code, I found that all the strings are displayed through the UserString() function (correct me if I'm wrong). I still think that adding UTF-8 support is easier than it looks (it should be doable at the library level), so now I'm looking for any documentation (or at least a definition or a prototype) of this function or class. I guess this is not a standard C++ function, so if anyone can point me to the right place to learn something about this function, I would be grateful.

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Font support for Central European languages...

#18 Post by Geoff the Medio »

I'm not sure what you're asking exactly... Finding the declaration and definition of a function shouldn't be difficult if you're using a decent IDE, but in case that's what you want, UserString is declared in MultiplayerCommon.h and defined in MultiplayerCommon.cpp which are in the util directory. The definition just wraps the StringTable class's String lookup method, though... The StringTable class is in StringTable.h and StringTable.cpp which are in the UI directory.

The StringTable class is mostly a map from internal string code names to user-readable strings, both stored as std::string, plus various bits of helper code to load the string table from disk.

Cyber Killer
Space Floater
Posts: 20
Joined: Sun Oct 05, 2008 7:53 am
Location: Koszalin, Poland

Re: Font support for Central European languages...

#19 Post by Cyber Killer »

I'm not using any IDE ;-P. I usually write in Kate (a Linux text editor, which has a really nice syntax highlighting and easy access to the terminal ;-) ) which is enough for my usual programming needs ;-). (On the other hand I'm not writing any large programs. ;-) )

Anyway thx, I'll look into this further.

tzlaine
Programming Lead Emeritus
Posts: 1092
Joined: Thu Jun 26, 2003 1:33 pm

Re: Font support for Central European languages...

#20 Post by tzlaine »

@ stride : Sorry about the long response time. I became a father on Oct. 2, so haven't had much (or any) free time.

Yes, you're on the right track. The real stumbling block is GG's (our GUI library) representation of font glyphs. As you note, loading every Unicode glyph in the font is not possible, nor really necessary. The solution is probably some on-demand system like you suggest, or perhaps some hint as to the required glyphs (e.g. a hint that says "Just give me Cyrillic."). If you fix this problem, the rest is really pretty straightforward with the aid of a UTF-8 library like utfcpp ( http://utfcpp.sourceforge.net ).
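A rough sketch of the hint idea: a named hint expands to the Unicode blocks whose glyphs could be pre-rendered up front, while anything else would still be rendered on demand. The hint names and `CodePointsForHint` are made up for illustration; the code point ranges are the standard Unicode blocks (Latin-1 Supplement U+0080..U+00FF, of which only the printable part U+00A0..U+00FF is used here, Latin Extended-A U+0100..U+017F, and Cyrillic U+0400..U+04FF).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using CodePointRange = std::pair<char32_t, char32_t>;  // inclusive [first, last]

// Hypothetical hint table: each hint names the Unicode blocks a language needs.
const std::map<std::string, std::vector<CodePointRange>>& ScriptHints() {
    static const std::map<std::string, std::vector<CodePointRange>> hints = {
        {"basic-latin", {{0x0020, 0x007E}}},       // printable ASCII
        {"central-european", {{0x0020, 0x007E},    // printable ASCII
                              {0x00A0, 0x00FF},    // Latin-1 Supplement (printable part)
                              {0x0100, 0x017F}}},  // Latin Extended-A
        {"cyrillic", {{0x0020, 0x007E},            // printable ASCII
                      {0x0400, 0x04FF}}},          // Cyrillic
    };
    return hints;
}

// Expands a hint into the code points to pre-render. An unknown hint yields
// an empty list, i.e. everything would be rendered on demand.
std::vector<char32_t> CodePointsForHint(const std::string& hint) {
    std::vector<char32_t> result;
    const auto& hints = ScriptHints();
    auto it = hints.find(hint);
    if (it == hints.end())
        return result;
    for (const CodePointRange& r : it->second) {
        assert(r.first <= r.second);
        for (char32_t cp = r.first; cp <= r.second; ++cp)
            result.push_back(cp);
    }
    return result;
}
```

With these ranges, "just give me Cyrillic" means pre-rendering 351 glyphs (95 ASCII plus 256 Cyrillic) instead of the whole font, and the "central-european" hint covers all Polish letters (ą, ć, ę, ł, ń, ó, ś, ź, ż).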

stride
Space Floater
Posts: 31
Joined: Sun Sep 28, 2008 3:05 pm
Location: Horten, Norway

Re: Font support for Central European languages...

#21 Post by stride »

Congratulations! Kids are great :)

I spent some more time looking into the FreeOrion and FreeType code, and it appears we can leave the StringTable code as is. There is little to gain from making the stringtable keys UTF-8-safe, and keeping std::string as the datatype still lets it hold the UTF-8 byte sequences for the more "complex" locale character representations.
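To illustrate why std::string can stay: it stores bytes without interpreting them, so UTF-8 text survives a string-table lookup unchanged. The only catch is that `size()` counts bytes, not characters. `MiniStringTable` and `CountCodePoints` are hypothetical stand-ins, not the real StringTable class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical miniature string table: ASCII keys, UTF-8 values, both kept
// as plain std::string. std::string does not interpret its bytes, so
// multi-byte UTF-8 sequences pass through lookups unchanged.
using MiniStringTable = std::map<std::string, std::string>;

// Counts characters (code points) rather than bytes by skipping UTF-8
// continuation bytes, i.e. bytes of the form 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (char c : utf8)
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

So a Polish value like "Zażółć" comes back from the table byte-for-byte intact, reporting `size() == 10` but 6 characters; anything that measures or splits text for rendering has to work on characters, not bytes.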

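It also appears that the font file holds 2-3 different character-mapping tables (charmaps) for different platform IDs etc., from which FreeType selects an appropriate one by default. A charcode in FreeType is represented as FT_ULong (an unsigned long); with a Unicode charmap selected, it is simply the character's Unicode code point. That is wider than UCS-2, the fixed 16-bit encoding that only covers U+0000..U+FFFF; UTF-16 extends UCS-2 with surrogate pairs, and both come in little- and big-endian variants: http://unicode.org/faq/utf_bom.html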
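The difference between UCS-2 and UTF-16 fits in a few lines: both use one 16-bit unit for characters up to U+FFFF, but only UTF-16 can reach U+10000..U+10FFFF, via surrogate pairs. A code point on its own always fits in an FT_ULong, so no 16-bit encoding step is needed to talk to FreeType. The function names below are hypothetical; the surrogate arithmetic is the standard UTF-16 algorithm.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Encodes one code point as UTF-16. Characters below U+10000 take a single
// 16-bit unit (identical to UCS-2); higher ones need a surrogate pair,
// which UCS-2 cannot express at all.
std::u16string EncodeUtf16(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));  // same as UCS-2
    const char32_t v = cp - 0x10000;                           // 20 bits remain
    return {static_cast<char16_t>(0xD800 + (v >> 10)),         // high surrogate
            static_cast<char16_t>(0xDC00 + (v & 0x3FF))};      // low surrogate
}

// True if the code point can be stored in a single UCS-2 unit.
bool FitsInUcs2(char32_t cp) {
    return cp < 0x10000 && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

For example, ł (U+0142) is one unit either way, while U+1F600 becomes the pair D83D DE00 in UTF-16 and has no UCS-2 form.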

With luck, the platform ID and the default character mapping might resolve all of the above problems, and we might get away with leaving it "as is". The font has close to 1000 different character entries, by the way; possibly it's aimed at Western European locales. Other locales might just need another font.


Here are my proposed changes:
- The "range" code in Font::Init goes out, if it is no longer useful.
- We introduce a Character object, held in a std::map<FT_ULong, Character*> table (keyed by charcode) inside the Font object.
- The Character object is made responsible for loading and prerendering its glyph, and also for keeping relevant data like height, width etc.

- CharToFT_Ulong:
This is where we can fix things. Currently it converts from, well..., an ASCII char to FT_ULong with some nasty casts. As of now, each std::string that is to be rendered is pulled apart char by char, converted to FT_ULong by this function, and then rendered.
We need to rewrite all calls to this function: decode the relevant std::strings from UTF-8 into Unicode code points, and then render using those code points as charcodes instead.
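A sketch of how the proposed Character map and the replacement for the char-by-char loop could fit together, under stated assumptions: `SketchFont`, `Character` and `LoadCharacter` are hypothetical, glyph loading is a placeholder with fixed made-up metrics rather than a FreeType call, and the text is assumed to have been decoded from UTF-8 into code points beforehand. Characters are stored by value in the map rather than as `Character*`, which avoids manual cleanup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical per-character record along the lines of the proposal: it
// holds the glyph's metrics (a real version would also own the pre-rendered
// bitmap). Keys are unsigned long because FreeType's FT_ULong is an
// unsigned long; FreeType itself is not called anywhere in this sketch.
struct Character {
    unsigned long charcode = 0;
    int width = 0;
    int height = 0;
    int advance = 0;  // horizontal pen advance in pixels
};

class SketchFont {
public:
    // Measures text that has already been decoded from UTF-8 into code
    // points, instead of walking a std::string one char at a time.
    int TextWidth(const std::u32string& text) {
        int width = 0;
        for (char32_t cp : text)
            width += Get(static_cast<unsigned long>(cp)).advance;
        return width;
    }

    std::size_t LoadedCount() const { return characters_.size(); }

private:
    // Placeholder for loading and pre-rendering one glyph from the font
    // file; here every glyph simply gets the same made-up metrics.
    static Character LoadCharacter(unsigned long charcode) {
        Character c;
        c.charcode = charcode;
        c.width = 7;
        c.height = 12;
        c.advance = 8;
        return c;
    }

    // Looks the charcode up, loading the Character the first time it is seen.
    const Character& Get(unsigned long charcode) {
        auto it = characters_.find(charcode);
        if (it == characters_.end())
            it = characters_.emplace(charcode, LoadCharacter(charcode)).first;
        return it->second;
    }

    // Stored by value, so the map owns each Character and nothing needs deleting.
    std::map<unsigned long, Character> characters_;
};
```

Measuring "Zażółć" loads six Character entries; measuring "łł" afterwards reuses the already loaded ł and loads nothing new.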


That's about it. Not a whole lot of rewriting. I'm out on thin ice and making a lot of assumptions, so this has to be tested first. I'm also assuming we leave keyboard input etc. in dialogs as ASCII :)

Cyber Killer
Space Floater
Posts: 20
Joined: Sun Oct 05, 2008 7:53 am
Location: Koszalin, Poland

Re: Font support for Central European languages...

#22 Post by Cyber Killer »

Two things: the currently used font (DejaVu_etcetera) has support for Eastern European languages and many, many more; and the keyboard input could be kept in ASCII (that would help a lot with filenames etc.), but I think it's safe to give UTF-8 support for names and in-game chat.

Cyber Killer
Space Floater
Posts: 20
Joined: Sun Oct 05, 2008 7:53 am
Location: Koszalin, Poland

Re: Font support for Central European languages...

#23 Post by Cyber Killer »

I've come to see that this issue is too much for my programming skills and my spare time ;-(. I'm going back to my translation...

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Font support for Central European languages...

#24 Post by Geoff the Medio »

There are plenty of other programming tasks available, some of which should be less involved. We have a much greater need for programmers than translators at the moment... (although the latter is also appreciated).

Cyber Killer
Space Floater
Posts: 20
Joined: Sun Oct 05, 2008 7:53 am
Location: Koszalin, Poland

Re: Font support for Central European languages...

#25 Post by Cyber Killer »

OK, I'll take a look at the todo list, but I'm not promising anything ;-).
