Review save game Format

Programmers discuss here anything related to FreeOrion programming. Primarily for the developers to discuss.

Moderators: Committer, Committer

Peter524
Space Krill
Posts: 6
Joined: Thu Jan 03, 2019 12:27 am

Review save game Format

#1 Post by Peter524 » Sat Jan 05, 2019 6:00 pm

I would like to reopen the discussions on the save game format held here (https://www.freeorion.org/forum/viewtop ... lib#p66548) and here (https://www.freeorion.org/forum/viewtop ... f=9&t=4747), and possibly in other places as well.

As it stands, I believe the current xml+zlib+base64 compression method is suboptimal for various reasons. Most importantly, it fails when it is needed most: in large games.
In my case, I play with universes of 400 to 800 stars. With xml+zlib compression enabled, the save game code (running on the server) attempts to save as xml+zlib. Because of how the format works, the inner xml serialization first has to be done into memory, so that it can be compressed and embedded as a base64 string in the outer "wrapper" save game xml. At some point in the game, this in-memory serialization starts to fail every time, probably because the server cannot allocate a contiguous memory block of the required size. After this failed attempt, which typically takes several seconds, the code silently falls back to binary serialization, which takes several more seconds. There is no warning about this. When I started compiling my own version of FreeOrion, I was not able to load these binary save games.

In my view, the whole xml+zlib+base64 approach is flawed, and should be replaced by the ideas discussed 8 years ago.
I would recommend saving both the original (uncompressed) xml format and the binary format as gzip-compressed streams. For xml, we could keep the current user option of saving uncompressed files.
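A minimal sketch of the streaming idea, written in Python for brevity (it is not the proposed C++/Boost implementation): the document is written piece by piece into a gzip stream, so no complete in-memory copy of the serialized xml is ever needed. The `save_xml_gz` name and the `<save>`/`<header>`/`<objects>` layout are hypothetical.

```python
import gzip
from typing import Iterable
from xml.sax.saxutils import quoteattr


def save_xml_gz(path: str, header: dict[str, str], objects: Iterable[str]) -> None:
    """Stream an xml save game straight into a gzip file.

    Hypothetical layout: <save><header .../><objects>...</objects></save>.
    Each piece is handed to the compressor as soon as it is produced (with
    only small internal buffering), so the whole document never exists as
    one contiguous string in memory.
    """
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write('<?xml version="1.0" encoding="utf-8"?>\n<save>\n')
        # quoteattr escapes &, <, > and quotes, and adds the surrounding quotes.
        attrs = " ".join(f"{key}={quoteattr(value)}" for key, value in header.items())
        out.write(f"<header {attrs}/>\n<objects>\n")
        for obj in objects:  # e.g. a generator serializing one object at a time
            out.write(obj)
        out.write("</objects>\n</save>\n")
```

For example, `save_xml_gz(path, {"turn": "42"}, (f'<obj id="{i}"/>\n' for i in range(100_000)))` writes 100,000 objects while holding only one of them, plus small buffers, in memory at a time.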

This would have the following benefits:
- XML savegames would always work, regardless of the configuration setting. There would be no need to silently and unexpectedly fall back to binary serialization.
- There would be no out-of-memory condition on the server, which can lead to other unexpected side effects depending on what other threads are doing at the time.
- Memory pressure on the server would be reduced, allowing larger games to be played.
- There would be performance improvements, as binary saves would no longer be preceded by a failed xml serialization attempt.
- GZIP compression would make xml files smaller than is currently possible with xml+zlib+base64 (base64 alone inflates the compressed data by 33.3%).
- GZIP compression would reduce binary-serialized files to about 1/6 of their current size, e.g. 17 MB instead of 100 MB per save game.
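To put the 33.3% figure in perspective, here is a small Python sketch (standard library only; FreeOrion itself is C++, so this is purely illustrative) comparing zlib+base64 with plain gzip on a synthetic, hypothetical xml payload. Absolute sizes will differ from real save games. What is fixed is base64's ratio of 4 output bytes per 3 input bytes, while gzip's extra framing over zlib is only a few bytes (18 vs. 6 bytes of header and trailer).

```python
import base64
import gzip
import zlib

# Hypothetical stand-in for a save game body: repetitive xml with varying numbers.
xml = "".join(
    f'<star id="{i}" x="{i * 37 % 1000}" y="{i * 91 % 1000}"/>\n' for i in range(5000)
).encode("utf-8")

zlib_raw = zlib.compress(xml, 9)               # current approach, step 1: zlib
zlib_b64 = base64.b64encode(zlib_raw)          # step 2: base64 so it can sit inside xml
gzipped = gzip.compress(xml, compresslevel=9)  # proposal: gzip the xml directly

# base64 emits 4 output bytes per 3 input bytes (rounded up): about 33.3% overhead.
overhead = len(zlib_b64) / len(zlib_raw) - 1

print(f"plain xml:   {len(xml):>8} bytes")
print(f"zlib:        {len(zlib_raw):>8} bytes")
print(f"zlib+base64: {len(zlib_b64):>8} bytes  (+{overhead:.1%})")
print(f"gzip:        {len(gzipped):>8} bytes")
```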

This would have the following drawbacks:
- In one of the older discussions, the question of external dependencies when using GZIP was raised. On my system (Windows), everything worked out of the box.

I have an implementation based on the current code available at https://github.com/OlafPettersson/freeo ... pSaveGames

On user configuration:
- There is an issue with user configuration: the save game options only take effect when FO is restarted. I believe this is because the options are changed on the client but have to take effect on the server. I think this should either be changed, so that the options take effect immediately, or be communicated clearly to the user.

Vezzra
Release Manager, Design
Posts: 5002
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: Review save game Format

#2 Post by Vezzra » Sat Jan 05, 2019 6:36 pm

Peter524 wrote:
Sat Jan 05, 2019 6:00 pm
GZIP-compression would reduce xml files beyond what is currently possible with xml+zlib+base64 (due to base64 blowing up the whole thing by 33.3%)
I could be wrong, as I don't remember the discussion very well, but IIRC the base64 thing had been added to prevent issues with cross-platform compatibility of the zlib compressed serialized data.

So, if we remove the base64 encoding, the gzip compressed data needs to be cross-platform compatible.

Geoff, you might know better why we have that base64 encoding...?

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Review save game Format

#3 Post by Geoff the Medio » Mon Jan 07, 2019 6:53 am

Part of the utility of XML wrapped compressed data save files is that the header information is immediately parseable / readable, without need to decompress the whole XML string. This means the metadata about a save can be read without reading in the whole save. If the whole thing is gzipped, then it would need to be unzipped to read that data. Alternatively, there would need to be a separate header file, which is also problematic.
Vezzra wrote:
Sat Jan 05, 2019 6:36 pm
I could be wrong, as I don't remember the discussion very well, but IIRC the base64 thing had been added to prevent issues with cross-platform compatibility of the zlib compressed serialized data.
The issue was that zipped data, when wrapped in XML tags, didn't produce valid XML files. By restricting the characters used with base64, the resulting zipped data can be wrapped in XML tags and produce a valid XML string / file. Any cross-platform issues were probably just that some implementations handled this better than others.
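A sketch of the wrapper idea described above, using Python's standard library and a hypothetical `<save>`/`<header>`/`<data>` layout (not FreeOrion's real element names): the header stays plain xml, while the compressed body is base64-encoded so that only xml-safe characters end up between the tags.

```python
import base64
import xml.etree.ElementTree as ET
import zlib
from xml.sax.saxutils import quoteattr


def wrap(header: dict[str, str], body: bytes) -> str:
    """Plain-text header plus a zlib-compressed, base64-encoded body."""
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in header.items())
    # Raw zlib output is arbitrary binary, and xml 1.0 forbids NUL and most other
    # control bytes in text; base64 maps it onto A-Z, a-z, 0-9, '+', '/' and '='.
    payload = base64.b64encode(zlib.compress(body)).decode("ascii")
    return f"<save><header {attrs}/><data>{payload}</data></save>"


def read_header(doc: str) -> dict[str, str]:
    """Header attributes are readable without decompressing the payload."""
    return dict(ET.fromstring(doc).find("header").attrib)


def unwrap(doc: str) -> bytes:
    """Decode and decompress the body."""
    return zlib.decompress(base64.b64decode(ET.fromstring(doc).findtext("data")))
```

Note that `read_header` still parses the whole string in this sketch; the point is only that no decompression is needed to get at the metadata.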

o01eg
Programmer
Posts: 481
Joined: Sat Dec 10, 2011 5:46 am

Re: Review save game Format

#4 Post by o01eg » Mon Jan 07, 2019 12:16 pm

Geoff the Medio wrote:
Mon Jan 07, 2019 6:53 am
Part of the utility of XML wrapped compressed data save files is that the header information is immediately parseable / readable, without need to decompress the whole XML string. This means the metadata about a save can be read without reading in the whole save. If the whole thing is gzipped, then it would need to be unzipped to read that data. Alternatively, there would need to be a separate header file, which is also problematic.
I suppose gzip doesn't require full decompression. What if we read the decompressed stream only until the header ends?
Gentoo Linux x64, gcc-8.3, boost-1.65.0
Ubuntu Server 18.04 x64, gcc-7.4, boost-1.65.1
Welcome to slow multiplayer game at freeorion-lt.dedyn.io. Version 2019-10-13.49f7896.
Donates are welcome: BTC:14XLekD9ifwqLtZX4iteepvbLQNYVG87zK

Peter524
Space Krill
Posts: 6
Joined: Thu Jan 03, 2019 12:27 am

Re: Review save game Format

#5 Post by Peter524 » Mon Jan 07, 2019 9:02 pm

o01eg wrote:
Mon Jan 07, 2019 12:16 pm
I suppose gzip doesn't require full decompression. What if we read the decompressed stream only until the header ends?
I second that. The test implementation I am proposing does not decompress the whole file when only reading metadata. It only decompresses as many compressed blocks (on the order of 32 KB each, see https://www.gnu.org/software/gzip/manual/gzip.html) as required.
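To illustrate that reading metadata does not require inflating the whole file, here is a Python sketch using `zlib.decompressobj` (the helper name and the "stop at a marker" convention are assumptions for illustration, not the proposed implementation). It feeds compressed data in fixed-size chunks and stops as soon as the marker that ends the header has been decompressed.

```python
import zlib
from typing import BinaryIO


def read_gzip_prefix(stream: BinaryIO, marker: bytes, chunk_size: int = 4096) -> tuple[bytes, int]:
    """Decompress a gzip stream only until `marker` has been seen.

    Returns (decompressed data up to and including `marker`, compressed bytes read).
    Nothing after the chunk containing the marker is read or inflated.
    """
    inflater = zlib.decompressobj(wbits=31)  # 31 = 16 + 15: expect a gzip header/trailer
    out = bytearray()
    consumed = 0
    while marker not in out:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of file without finding the marker
            return bytes(out), consumed
        consumed += len(chunk)
        out += inflater.decompress(chunk)
    end = out.index(marker) + len(marker)
    return bytes(out[:end]), consumed
```

With a short header at the start of a multi-megabyte save, the loop stops after the first chunk, so the rest of the file is never touched.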

From a code standpoint, this is all very transparent. When loading, we detect whether the file is gzip-encoded by checking for the gzip magic header. If we find it, we just wrap the file stream in a gzip decoder. The decoder only decodes as many bytes as the deserialization code requests.

There is no overlap between the gzip magic header (the bytes 0x1f 0x8b), the Boost binary serialization format and the "<?xml" marker, so detecting a gzip file is pretty safe.
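The detection step might look like this in Python (hypothetical helper names; `0x1f 0x8b` is the gzip magic number from RFC 1952). Anything that does not start with `<?xml` after optional decompression is reported as "other", which in FreeOrion's case would be the binary format; that mapping is an assumption of this sketch.

```python
import gzip
from typing import BinaryIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def open_save(path: str) -> BinaryIO:
    """Open a save file, transparently decompressing it if it is gzipped."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")  # inflates lazily, only as much as is read
    return open(path, "rb")


def sniff_save(path: str) -> tuple[bool, str]:
    """Return (is_gzipped, "xml" or "other") by peeking at the first bytes."""
    with open(path, "rb") as f:
        gzipped = f.read(2) == GZIP_MAGIC
    with open_save(path) as f:
        kind = "xml" if f.read(5) == b"<?xml" else "other"
    return gzipped, kind
```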

Peter524
Space Krill
Posts: 6
Joined: Thu Jan 03, 2019 12:27 am

Re: Review save game Format

#6 Post by Peter524 » Mon Jan 07, 2019 9:34 pm

Geoff the Medio wrote:
Mon Jan 07, 2019 6:53 am
The issue was that zipped data, when wrapped in XML tags, didn't produce valid XML files. By restricting the characters used with base64, the resulting zipped data can be wrapped in XML tags and produce a valid XML string / file. Any cross-platform issues were probably just that some implementations handled this better than others.
Just to be clear: I am not proposing an uncompressed header with a compressed sub-section. I'm proposing to just compress the whole XML as-is, in an open, standardized format. Compression and decompression happen on the fly: the file is not compressed or decompressed all at once, but a small chunk, a so-called "block", at a time.

This has the slight drawback that you cannot open the file in a text editor and make sense of it. But I believe anybody computer-literate enough to want to edit the XML file to change a savegame will also be able to decompress the gzipped XML (or disable compression, if we keep that option).

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Review save game Format

#7 Post by Geoff the Medio » Tue Jan 08, 2019 8:35 am

If it works and is still fast when loading header info for a directory full of saves, sounds good.

o01eg
Programmer
Posts: 481
Joined: Sat Dec 10, 2011 5:46 am

Re: Review save game Format

#8 Post by o01eg » Tue Jan 08, 2019 8:41 am

I suppose the main issue is how to make Boost serialization read only the header part of the full XML, and how to ensure the header is always written first.
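On the writing side, the header comes first as long as it is serialized first. On the reading side, a streaming XML parser can stop as soon as the header is complete. A Python sketch of the reading side, assuming a hypothetical `<header>` element that the writer always emits before the game state (the element names are made up; Boost's actual XML archive layout would need its own handling):

```python
import gzip
import xml.etree.ElementTree as ET


def read_header(path: str) -> dict[str, str]:
    """Return the attributes of <header> from a gzipped XML save.

    Assumes (hypothetically) that <header> is serialized before the bulk of
    the game state. iterparse pulls decompressed data in small chunks, so
    parsing, and decompression with it, stops shortly after the header element
    is complete.
    """
    with gzip.open(path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == "header":
                return dict(elem.attrib)
    raise ValueError(f"{path}: no <header> element found")
```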

Peter524
Space Krill
Posts: 6
Joined: Thu Jan 03, 2019 12:27 am

Re: Review save game Format

#9 Post by Peter524 » Tue Jan 08, 2019 7:49 pm

Geoff the Medio wrote:
Tue Jan 08, 2019 8:35 am
If it works and is still fast when loading header info for a directory full of saves, sounds good.
Yep, it works and is still fast. I'm not sure what the best way to prepare a test case is. Maybe I could upload a large savegame somewhere (where?), and whoever approves the PR could copy the file a hundred times and check that loading the headers is about as fast as with uncompressed xml files.

For me, the only remaining question is whether we want to keep the option of not compressing xml saves. I believe I read somewhere that this option was mainly introduced because of problems with the previous compressed xml save game format. If that is true, we might want to drop the option.

I myself do not have a strong opinion either way, but will prepare the PR accordingly.

Geoff the Medio
Programming, Design, Admin
Posts: 12456
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Review save game Format

#10 Post by Geoff the Medio » Wed Jan 09, 2019 1:49 pm

Peter524 wrote:
Tue Jan 08, 2019 7:49 pm
For me, the only remaining question is whether we want to keep the option of not compressing xml saves. I believe I read somewhere that this option was mainly introduced because of problems with the previous compressed xml save game format. If that is true, we might want to drop the option.
It's also useful to be able to inspect the contents of a save and to edit its contents. If the gzipped file can be unzipped in an external program, edited, rezipped, and then read by FreeOrion, that's sufficient.
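That workflow only needs standard gzip on both ends. By hand it would be `gunzip save.xml.gz`, editing `save.xml`, then `gzip save.xml`. The Python sketch below (hypothetical helper name) does the same steps programmatically, and the result is again an ordinary gzip file starting with the magic bytes `0x1f 0x8b`.

```python
import gzip


def edit_gzipped_save(path: str, old: str, new: str) -> None:
    """Mimic a manual edit: decompress, change the XML text, recompress.

    The whole text is loaded into memory here, which is fine for a one-off edit.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text.replace(old, new))
```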
