Server crash on exception in r3703

Questions, problems and discussion about compiling FreeOrion.

Moderator: Oberlus

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#16 Post by q1w2e3r4 »

There is a good chunk of special case code for System / StarSystem serialization that was written to resolve other serialization problems
I removed that in my patch. I had tested removing it without the patch first, and it still didn't work.
Also, can you have a try at removing some of the System members
Could you please guide me a bit here? I am still getting used to C++; I come from a mostly scripting and some Java background. What do you consider 'members' in the class?

Thanks,
q

P.S. was StarSystem an ok name to use? (not important...)

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Server crash on exception in r3703

#17 Post by Geoff the Medio »

q1w2e3r4 wrote:What do you consider 'members' in the class?
"Members" in this case refers to member variables. In the case of System, stripping comments:

Code: Select all

StarType m_star;
int m_orbits;
ObjectMultimap m_objects;
StarlaneMap m_starlanes_wormholes;
as well as the various members inherited from UniverseObject... (But UniverseObject and its various derived classes other than System don't have this serialization problem, so I'm inclined to focus on what about System specifically makes the serialization fail.)

If it's unclear, StarType is an enum, and ObjectMultimap and StarlaneMap are typedefs made earlier in System.h. You could also just replace the typedefs with their underlying types... they're not so complicated that the typedefs are really necessary.
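To make "members" concrete, here is a minimal sketch of a class shaped like the list above. ExampleSystem, the accessor functions and the underlying container types are assumptions for illustration and are not copied from System.h; the point is only where member variables and typedefs sit inside a class, and what replacing a typedef with its underlying type looks like.

```cpp
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for the real StarType enum.
enum StarType { STAR_BLUE, STAR_WHITE, STAR_YELLOW };

class ExampleSystem {
public:
    // Typedefs in the public section can be named from outside the class
    // as ExampleSystem::ObjectMultimap. The underlying types are assumed.
    typedef std::multimap<int, int> ObjectMultimap;
    typedef std::map<int, bool>     StarlaneMap;

    ExampleSystem() : m_star(STAR_YELLOW), m_orbits(0) {}

    StarType    Star() const       { return m_star; }
    int         Orbits() const     { return m_orbits; }
    std::size_t NumObjects() const { return m_objects.size(); }
    std::size_t NumLanes() const   { return m_starlanes_wormholes.size(); }

private:
    // These four are the "members" (member variables) of the class.
    StarType            m_star;
    int                 m_orbits;
    ObjectMultimap      m_objects;
    std::map<int, bool> m_starlanes_wormholes;  // typedef replaced by its underlying type
};
```

Removing a member for a test means deleting its declaration here and every use of it, including its line in the serialize function.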

The fix might be as simple as making ObjectMultimap's definition public earlier in the file...
P.S. was StarSystem an ok name to use? (not important...)
Should be fine. I don't imagine there's much chance of that conflicting with something else using the same name (whereas it's plausible with "System").

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#18 Post by q1w2e3r4 »

Well I added the debugging code mentioned in earlier posts, but it isn't printing out anything. Also after adding:

Code: Select all

if (const System* system = universe_object_cast<const System*>(it->second))
            continue;
to ObjectMap::CompleteCopyVisible in Universe.cpp, I get this additional compiler warning:

Code: Select all

./freeorion-0.3.15/universe/Universe.cpp: In member function ‘void ObjectMap::CompleteCopyVisible(const ObjectMap&, int)’:                               
./freeorion-0.3.15/universe/Universe.cpp:321: warning: unused variable ‘system’
I don't know why the extra logging isn't working.

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Server crash on exception in r3703

#19 Post by Geoff the Medio »

q1w2e3r4 wrote:Well I added the debugging code mentioned in earlier posts, but it isn't printing out anything.
Could you be more specific?
Also after adding:

Code: Select all

if (const System* system = universe_object_cast<const System*>(it->second))
            continue;
to ObjectMap::CompleteCopyVisible in Universe.cpp, I get this additional compiler warning:

Code: Select all

./freeorion-0.3.15/universe/Universe.cpp: In member function ‘void ObjectMap::CompleteCopyVisible(const ObjectMap&, int)’:                               
./freeorion-0.3.15/universe/Universe.cpp:321: warning: unused variable ‘system’
That's not surprising. It's creating a variable system in the if statement and assigning it a value, but not using it. It could be rewritten to avoid the warning:

Code: Select all

if (universe_object_cast<const System*>(it->second))
    continue;
Have you tried making public the typedef of ObjectMultimap near the top of class System / StarSystem?
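For anyone following along, the warning-free form can be sketched in a self-contained way. This is an illustration only: Object, System and Planet are hypothetical stand-in types, and dynamic_cast stands in for FreeOrion's universe_object_cast.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for UniverseObject and two derived classes.
struct Object { virtual ~Object() {} };
struct System : Object {};
struct Planet : Object {};

// Counts the objects that are not systems.
int CountNonSystems(const std::vector<const Object*>& objects) {
    int count = 0;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        // The cast result is tested directly; no named variable is
        // declared, so there is nothing for the compiler to flag as unused.
        if (dynamic_cast<const System*>(objects[i]))
            continue;
        ++count;
    }
    return count;
}
```

The pointer returned by the cast converts to true when the cast succeeds and to false when it yields a null pointer, which is all the if statement needs.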

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#20 Post by q1w2e3r4 »

Have you tried making public the typedef of ObjectMultimap near the top of class System / StarSystem?
yes, to no effect.

I am currently removing the class members as you suggest. I'm struggling a bit because of my unfamiliarity with the language, but I'll get there eventually.

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#21 Post by q1w2e3r4 »

Could you be more specific?
Yes, sorry. The only output I get is:

Code: Select all

main() caught exception(std::exception): unregistered class
which leads me to believe debugging is not on somehow. And that is why I am not seeing the other serializing messages from the code posted earlier in this thread.

Is debugging not enabled in the CMAKE scripts? I have been using:

Code: Select all

-DBUILD_DEBUG=1
q

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#22 Post by q1w2e3r4 »

Huh. Something odd just happened. I decided to compile with no extra flags (like CXX flags -Dno-strict-aliasing), my steps:

Code: Select all

patch -p0 < ../debug_serialization.patch
cmake -DBUILD_DEBUG=1 .
make -j8
sed -i 's#=\.#=/usr/lib64/OGRE#' ogre_plugins.cfg
(Attached is the basic patch I used. The 'sed' step was just to point to the proper OGRE plugins.)

and the universe was serialized just fine. Though it promptly kicked me off the server saying the AIs disconnected.

I must try and duplicate now.
Attachments

[The extension patch has been deactivated and can no longer be displayed.]


Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Server crash on exception in r3703

#23 Post by Geoff the Medio »

q1w2e3r4 wrote:
Could you be more specific?
Yes, sorry. The only output I get is:

Code: Select all

main() caught exception(std::exception): unregistered class
which leads me to believe debugging is not on somehow.
I meant could you be more specific about what debugging code you added. Why are you expecting extra output?
And that is why I am not seeing the other serializing messages from the code posted earlier in this thread.
What code specifically?
Though it promptly kicked me off the server saying the AIs disconnected.
You might consider starting a game with 0 AI players.

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#24 Post by q1w2e3r4 »

The debugging code I used is the code you suggested that toadicus apply in the 2nd, 4th, and 6th posts of this thread. You can see the exact code I applied from the patch I attached in my last post, debug_serialization.patch.

I just did about 30 - 40 freeorion compiles to see what breaks it. Here are my results:

I got freeorion to continue past serialization and into the game with the bare minimum cmake options of:

Code: Select all

cmake . -DBUILD_DEBUG=1
The following cmake adjustments will make freeorion crash with the 'unregistered class' error with respect to the base, above:

1. Using -DBUILD_DEBUG=0 instead. I notice that all this effectively does is add -DNDEBUG to the compiler flags.
2. Removing the boost hack code from SerializeUniverseExports.ipp (and putting BOOST_CLASS_EXPORT(System) in its place).
3. Refactoring System to StarSystem and removing the boost hack code in SerializeUniverseExports.ipp.
4. Adding gcc optimization flags (-O1, -O2, -O3, -Os) with -DCMAKE_CXX_FLAGS:STRING="<someflagshere>".
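As a side illustration of what -DNDEBUG changes at the language level, here is a minimal sketch. The function names are made up for this example, and nothing here is meant to suggest that FreeOrion or Boost contains code like it; it only shows that defining NDEBUG removes assert() expressions, side effects included.

```cpp
#include <cassert>

// Returns true when assert() is active, i.e. NDEBUG was not defined.
bool AssertsEnabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// An expression placed inside assert() is evaluated only when NDEBUG is
// not defined, so its side effect disappears in an NDEBUG build.
int CallsSeen() {
    int calls = 0;
    assert(++calls == 1);
    return calls;  // 1 with asserts active, 0 when built with -DNDEBUG
}
```

So a build with -DNDEBUG can run different code from one without it, even at the same optimization level.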

More information on my optimization tests. Might be useful. Might not be.

Most of my time compiling was spent narrowing down the gcc flags. I started with a base set, set by my distribution for RPM packaging:

Code: Select all

-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -fno-strict-aliasing
I narrowed it down to -O2, and then found the other optimization levels would fail too. -O0 worked just fine.

I decided to do a bisect test of all the -O1 optimizations returned by:

Code: Select all

c++ -Q -O1 --help=optimizers |grep enabled
The following flags were returned:

Code: Select all

-falign-loops -fargument-alias -fasynchronous-unwind-tables -fbranch-count-reg -fcommon -fcprop-registers -fdce -fdefer-pop -fdse -fearly-inlining -fgcse-lm -fguess-branch-probability -fif-conversion -fif-conversion2 -finline-functions-called-once -fipa-pure-const -fipa-reference -fivopts -fjump-tables -fmath-errno -fmerge-constants -fmove-loop-invariants -fomit-frame-pointer -fpeephole -frename-registers -fsched-interblock -fsched-spec -fsched-stalled-insns-dep -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types -ftoplevel-reorder -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop -ftree-copyrename -ftree-cselim -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-sra -ftree-switch-conversion -ftree-ter -ftree-vect-loop-version -funit-at-a-time -fvar-tracking -fvect-cost-model -fweb
I then added all of these to cmake with -DCMAKE_CXX_FLAGS:STRING= and compiled. The funny thing is: freeorion ran without crashing. Which I don't understand. Is there some difference between the addition of these flags and "-O1"?

These are my findings so far. I was hoping to narrow it down to a few flags, but my last test left me confused.
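For reference, the bisect idea can be sketched as a small routine. This is only an illustration under strong assumptions: exactly one flag triggers the failure, the result is deterministic, and `fails` is a hypothetical callback that would rebuild and run the game with the given flags.

```cpp
#include <functional>
#include <string>
#include <vector>

typedef std::vector<std::string> FlagList;

// Narrows a list of flags down to a single culprit by repeated halving.
// Returns an empty string if the input list is empty.
std::string BisectFlags(FlagList flags,
                        const std::function<bool(const FlagList&)>& fails)
{
    while (flags.size() > 1) {
        const FlagList::size_type half = flags.size() / 2;
        FlagList first(flags.begin(), flags.begin() + half);
        FlagList second(flags.begin() + half, flags.end());
        // Keep whichever half still reproduces the failure.
        flags = fails(first) ? first : second;
    }
    return flags.empty() ? std::string() : flags.front();
}
```

The single-culprit assumption may not hold here. As far as I know, GCC's documentation states that most optimizations are completely disabled at -O0, or when no -O level is given, even if individual optimization flags are specified. That would be consistent with the full -f list behaving like -O0 in the test above.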

I hope some of this is useful.

q

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#25 Post by q1w2e3r4 »

Further success: I decided to be a bit more tenacious. I found that if you compile with the flags:

Code: Select all

-S -fverbose-asm
you can see the preassembled code after just the compilation. This will tell you all the gcc options that were evaluated and finally used. I found one difference between -O1 and -O0 with tons of other flags:

Code: Select all

-finline
This flag seems to only be turned on with -O1, I couldn't get it to turn on with -O0. So I decided to compile with:

Code: Select all

cmake . -DBUILD_DEBUG=1 -DCMAKE_CXX_FLAGS:STRING="-fno-inline -O1"
Freeorion didn't crash!
Compiling with -O2 worked as well. So optimization works only if you disable inlining.

As far as why inlining may be causing problems with boost serialization, I came across this invalid gcc bug:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38828

Some of the responses to the reporter of the bug suggested code changes that I am not quite sure how to translate into C++. It looked like it might be relevant.

q

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Server crash on exception in r3703

#26 Post by Geoff the Medio »

I don't know if the issues discussed in that bug thread are relevant to the "unregistered class" problem... they appear to be issues with compiling or linking, which isn't the problem for FreeOrion in this case.

That said, I suppose it's worth a try. You could try adding this before or after the BOOST_CLASS_EXPORT(System) line:

Code: Select all

namespace boost { namespace serialization {
    template void
    serialize<FREEORION_IARCHIVE_TYPE>(FREEORION_IARCHIVE_TYPE&, System&, unsigned int);

    template void
    serialize<FREEORION_OARCHIVE_TYPE>(FREEORION_OARCHIVE_TYPE&, System&, unsigned int);
}}
I added it to the top of SerializeUniverseExports.ipp (just after the #include lines) and SerializeUniverse.cpp (which includes that .ipp) still compiled, so there's a chance it could still compile in Linux as well...
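The lines suggested above are explicit template instantiations. A stripped-down sketch of the same construct, with no Boost involved, might look like this; CountingArchive and Widget are hypothetical types invented for the example.

```cpp
// Hypothetical archive type that just counts how often it is used.
struct CountingArchive {
    int fields_seen;
    CountingArchive() : fields_seen(0) {}
};

// Hypothetical class to be "serialized".
struct Widget { int value; };

// Function template, normally instantiated only where it is called.
template <class Archive>
void serialize(Archive& ar, Widget& w, unsigned int /*version*/) {
    (void)w;            // the sketch does not actually store anything
    ++ar.fields_seen;
}

// Explicit instantiation definition: asks the compiler to emit
// serialize<CountingArchive> in this translation unit even if no code
// here calls it.
template void serialize<CountingArchive>(CountingArchive&, Widget&, unsigned int);
```

Whether forcing the instantiation has any effect on the "unregistered class" error is a separate question; the sketch only shows what the syntax does.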

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#27 Post by q1w2e3r4 »

Adding the code in your last post while keeping compiler optimizations fully on did not work. It did compile fine, but the unregistered class error still came up. Oh well. It was just a shot in the dark.

On a side note, would you know why that serialization debugging code mentioned earlier isn't printing out?

I am using -DBUILD_DEBUG=1 with cmake and I installed the debugging symbols. I use 'gdb ./freeorion' but I still get no logging on which objects are being serialized.

q

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Server crash on exception in r3703

#28 Post by Geoff the Medio »

q1w2e3r4 wrote:On a side note, would you know why that serialization debugging code mentioned earlier isn't printing out?
The first objects created during universe generation are systems, so a system has object id 0. When serializing, objects are encoded in order of object id, so the first object serialized is a system. If the serialization crashes before executing the first object's serialize function, none of the debug output code in serialize functions would be executed.

Have you tried removing all the data members of System, and changing System::serialize to just be:

Code: Select all

ar  & BOOST_SERIALIZATION_BASE_OBJECT_NVP(UniverseObject);
?
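The ordering point can be illustrated with a small sketch. It assumes the objects live in a container ordered by id, modelled here as a std::map from id to a type name; that model and the function name are assumptions for the example, not the actual ObjectMap internals.

```cpp
#include <map>
#include <string>

// Returns the type name of the object that is visited first when the
// container is walked in id order, or an empty string if there is none.
std::string FirstVisited(const std::map<int, std::string>& objects) {
    // std::map iterates in ascending key order, whatever the insertion order.
    return objects.empty() ? std::string() : objects.begin()->second;
}
```

If id 0 belongs to a system, the system is reached first, so a failure there happens before any other object's serialize function gets a chance to log anything.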

q1w2e3r4
Space Floater
Posts: 22
Joined: Sun Sep 26, 2010 11:05 pm

Re: Server crash on exception in r3703

#29 Post by q1w2e3r4 »

I was able to get these two removed:

Code: Select all

m_star
m_orbits
The game compiled and still crashed, though. The other two I am still struggling with.

I also added printf statements at the start of each serialize() method in SerializeUniverse.cpp:

Code: Select all

void System::serialize(Archive& ar, const unsigned int version)
 {
    printf("Serialize System\n");
    ar  & BOOST_SERIALIZATION_BASE_OBJECT_NVP(UniverseObject)
        & BOOST_SERIALIZATION_NVP(m_star)
        & BOOST_SERIALIZATION_NVP(m_orbits)
The game crashes before even hitting this print statement - wouldn't this mean that the crash is before the actual serializing of System starts? (My ignorance of C++ is probably showing here.)

q

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Server crash on exception in r3703

#30 Post by Geoff the Medio »

q1w2e3r4 wrote:I was able to get these two removed:

Code: Select all

m_star
m_orbits
The game compiled and still crashed, though. The other two I am still struggling with.
Just start with removing them from the serialize function, as suggested above. There being a few extra non-serialized data members in the class shouldn't matter until after serialization is completed (where it might crash because the data makes no sense with its default values, but that should look different from the unregistered class error).
I also added printf statements at the start of each serialize() method in SerializeUniverse.cpp:

Code: Select all

void System::serialize(Archive& ar, const unsigned int version)
 {
    printf("Serialize System\n");
    ar  & BOOST_SERIALIZATION_BASE_OBJECT_NVP(UniverseObject)
        & BOOST_SERIALIZATION_NVP(m_star)
        & BOOST_SERIALIZATION_NVP(m_orbits)
The game crashes before even hitting this print statement - wouldn't this mean that the crash is before the actual serializing of System starts? (My ignorance of C++ is probably showing here.)
Printf (or std::cout which you should use instead in C++ code) will never output anything when executed on the server, unless you're running the server in a console window, and not just letting the server be spawned by the client when starting a game. Logger().debugStream() outputs to the log file, which will always be written.
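If the FreeOrion logger is not convenient for a quick test, a file-based fallback can be sketched as below. LogLine and the idea of a separate log path are assumptions for illustration; this is not the FreeOrion Logger API.

```cpp
#include <fstream>
#include <string>

// Appends one line to the given file. Returns false if the file could
// not be opened or the write failed.
bool LogLine(const std::string& path, const std::string& message) {
    std::ofstream log(path.c_str(), std::ios::app);
    if (!log)
        return false;
    // std::endl flushes the stream, so the line reaches the file even if
    // the process throws or crashes shortly afterwards.
    log << message << std::endl;
    return log.good();
}
```

Unlike output sent to stdout, a line written this way does not depend on the server having a console window attached.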

However, the conclusion is right: the System::serialize isn't being executed. That's the whole problem. For some reason the boost serialization code is deciding that the system, or something in it, isn't "registered" for serializing, and thus doesn't bother to call System::serialize, and instead throws an exception. For some reason this is happening despite the calls to BOOST_CLASS_EXPORT(System) just like for all the other UniverseObject-derived classes that don't have this problem.

By suggesting removing bits of System::serialize or the System class itself, I'm trying to figure out what bit of System is causing System to be not registered when all the other UniverseObject-derived classes are registered by seemingly equivalent code.
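As a rough mental model of what "registered" means, here is a toy registry. It is emphatically not how Boost implements BOOST_CLASS_EXPORT; Registry, Registrar and planet_registrar are invented names, and the sketch only shows the general pattern of a static object registering a class name before main() runs, with a lookup that throws when the name is missing.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Toy registry of class names that are allowed to be saved.
class Registry {
public:
    static Registry& Instance() { static Registry r; return r; }

    void Register(const std::string& name) { m_known.insert(name); }

    // Throws if the class was never registered, mirroring the symptom
    // seen in this thread.
    void Save(const std::string& name) const {
        if (m_known.find(name) == m_known.end())
            throw std::runtime_error("unregistered class");
    }

private:
    std::set<std::string> m_known;
};

// Registrar object: its constructor runs during static initialization.
struct Registrar {
    explicit Registrar(const char* name) { Registry::Instance().Register(name); }
};

// If an object like this is never created for a class, that class stays
// unregistered and Save() throws before any serialize code runs.
static Registrar planet_registrar("Planet");
```

In this model, Save("Planet") succeeds while Save("System") throws, because no registrar object was ever created for "System".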
