Late game (after turn 200) server disconnects

Describe your experience with the latest version of FreeOrion to help us improve it.

Moderator: Oberlus

Forum rules
Always mention the exact version of FreeOrion you are testing.

When reporting an issue regarding the AI, if possible provide the relevant AI log file and a save game file that demonstrates the issue.
Message
Author
User avatar
mem359
Dyson Forest
Posts: 214
Joined: Sun Jun 08, 2014 1:18 am

Late game (after turn 200) server disconnects

#1 Post by mem359 »

I've been noticing that most of my games have crashed with a server disconnect (exit to main menu). It has been happening late, after Experimenters have shown themselves. I was having problems replicating it, but I can finally repeat it.

For the particular save game I am using, it is turn 208.
After loading the game, it takes a while before the screen refreshes, the galaxy is visible, and the turn button shows up.

If I click the "turn" button in the next 6 seconds or less, the game stops with a server disconnect.
If I wait 10 seconds or more, the turn processes normally and the game doesn't crash. (!!!)
(Is there some process running in the background that is still active even after the new turn starts?)

I compared the log files for the "normal" turn and the "bad" server disconnect.

"bad" freeorion.log (what seems to be the appropriate part)

Code: Select all

2017-03-15 21:51:34.664377 [debug] Client : ScopedTimer.cpp:28 : MapWnd::MidTurnUpdate time:       80 ms
2017-03-15 21:51:34.804525 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:51:34.848995 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:51:36.499556 [debug] Client : ClientNetworking.cpp:315 : ClientNetworking::NetworkingThread() : Networking thread will be terminated due to disconnect exception "End of file"
2017-03-15 21:51:36.505796 [debug] Client : HumanClientApp.cpp:1212 : HumanClientApp::DisconnectedFromServer
"bad" freeoriond.log (these are the last lines of the file)

Code: Select all

2017-03-15 21:51:35.060228 [debug] Server : LoggingWrapper.cpp:23 : Executing turn events for turn 208
2017-03-15 21:51:35.274204 [debug] Server : Universe.cpp:1380 : Issue times: planet species: 0.612 ship species: 0.262 specials: 2.634 techs: 14.417 buildings: 2.621 hulls/parts: 7.715 fields: 0.773
2017-03-15 21:51:35.274310 [debug] Server : Universe.cpp:1387 : Evaluation time: 356.029 reorder time: 1.95
2017-03-15 21:51:35.299018 [debug] Server : ScopedTimer.cpp:28 : Universe::ExecuteEffects time:       18 ms
"normal" freeorion.log (at a similar progress state)

Code: Select all

2017-03-15 21:48:30.896341 [debug] Client : ScopedTimer.cpp:28 : MapWnd::MidTurnUpdate time:       65 ms
2017-03-15 21:48:31.027754 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:48:31.075563 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:48:31.613306 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:48:31.910730 [debug] Client : HumanClientFSM.cpp:599 : (HumanClientFSM) PlayingGame.TurnUpdate
2017-03-15 21:48:31.969479 [debug] Client : SerializeUniverse.cpp:70 : Universe::serialize : deserializing universe width: 1063.81
"normal" freeoriond.log (at a similar progress state)

Code: Select all

2017-03-15 21:48:31.263431 [debug] Server : LoggingWrapper.cpp:23 : Executing turn events for turn 208
2017-03-15 21:48:31.479152 [debug] Server : Universe.cpp:1380 : Issue times: planet species: 0.49 ship species: 0.214 specials: 3.257 techs: 13.436 buildings: 2.08 hulls/parts: 8.121 fields: 0.711
2017-03-15 21:48:31.479255 [debug] Server : Universe.cpp:1387 : Evaluation time: 362.136 reorder time: 2.252
2017-03-15 21:48:31.504574 [debug] Server : ScopedTimer.cpp:28 : Universe::ExecuteEffects time:       19 ms
2017-03-15 21:48:31.552417 [debug] Server : ServerApp.cpp:3060 : ServerApp::PostCombatProcessTurns Turn number incremented to 209
I'm hoping someone more knowledgeable has ideas of what would cause the server to stop like that (not getting to the PostCombatProcessTurns) or cause an "End of File" exception. Or why it would depend on how quickly the turn button is pressed.

User avatar
Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: Late game (after turn 200) server disconnects

#2 Post by Dilvish »

could you post your savegame file someplace, so that we could test it out?
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
mem359
Dyson Forest
Posts: 214
Joined: Sun Jun 08, 2014 1:18 am

Re: Late game (after turn 200) server disconnects

#3 Post by mem359 »

Checking if I can attach a 2.3 MB gzip-tar save file
(unzips to 2.7 MB)

I'm using MacOSX. If I need the save file to use the FO binary format instead of the standard option, please let me know.
Attachments
turn208sav.tar.gz
(2.22 MiB) Downloaded 94 times

User avatar
mem359
Dyson Forest
Posts: 214
Joined: Sun Jun 08, 2014 1:18 am

Re: Late game (after turn 200) server disconnects

#4 Post by mem359 »

Another save file, this one using the most recent official test build (2017-03-20.60b8488).
It has the virtue (sin?) of being consistent, causing an EOF termination no matter how long I wait before pressing the next turn button.

freeorion.log (just the part when the game abruptly ends)

Code: Select all

2017-03-23 09:13:18.810890 [debug] Client : ScopedTimer.cpp:28 : MapWnd::InitTurnRendering time:       84 ms
2017-03-23 09:13:18.811199 [debug] Client : ScopedTimer.cpp:28 : MapWnd::MidTurnUpdate time:       85 ms
2017-03-23 09:13:19.788711 [debug] Client : ClientNetworking.cpp:354 : Client connection disconnected by EOF from server.
2017-03-23 09:13:19.796627 [debug] Client : HumanClientApp.cpp:1167 : HumanClientApp::DisconnectedFromServer
freeoriond.log (last lines of the file)

Code: Select all

2017-03-23 09:13:19.345703 [debug] Server : LoggingWrapper.cpp:23 : Executing turn events for turn 210
2017-03-23 09:13:19.550479 [debug] Server : Universe.cpp:1380 : Issue times: planet species: 0.2 ship species: 0.273 specials: 3.168 techs: 13.468 buildings: 2.005 hulls/parts: 8.138 fields: 1.013
2017-03-23 09:13:19.550584 [debug] Server : Universe.cpp:1387 : Evaluation time: 480.403 reorder time: 2.093
2017-03-23 09:13:19.582132 [debug] Server : ScopedTimer.cpp:28 : Universe::ExecuteEffects time:       25 ms
An earlier turn save (where the EOF does not happen) shows that the freeorion.log normally has more lines, like:
  • HumanClientFSM.cpp:670 : (HumanClientFSM) PlayingGame.TurnUpdate
    Client : SerializeUniverse.cpp:70 : Universe::serialize : deserializing universe width: 1100.55
    Client : SerializeUniverse.cpp:140 : Universe::serialize done
    Client : ScopedTimer.cpp:28 : Turn Update Unpacking time: 181 ms
    Client : HumanClientFSM.cpp:683 : Extracted TurnUpdate message for turn: 207
...while the freeoriond.log normally proceeds to something like:
Server : ServerApp.cpp:3056 : ServerApp::PostCombatProcessTurns Turn number incremented to 207
Attachments
turn210sav.tar.gz
(2.24 MiB) Downloaded 93 times

LGM-Doyle
Programmer
Posts: 219
Joined: Mon Feb 29, 2016 8:37 pm

Re: Late game (after turn 200) server disconnects

#5 Post by LGM-Doyle »

mem359, I was unable to duplicate the crash with either the old save or the new save.

Could you post the complete logs (freeorion.log, freeoriond.log and the 6 AI_x.log) from a game where you load the reliable crash savegame and let it crash?

User avatar
mem359
Dyson Forest
Posts: 214
Joined: Sun Jun 08, 2014 1:18 am

Re: Late game (after turn 200) server disconnects

#6 Post by mem359 »

LGM-Doyle wrote:mem359, I was unable to duplicate the crash with either the old save or the new save.

Could you post the complete logs (freeorion.log, freeoriond.log and the 6 AI_x.log) from a game where you load the reliable crash savegame and let it crash?
Thanks for testing it.

I was worried about that. The problem might be with the MacOSX build only (subset of users) or some interaction with other software on my computer (me only).

Unless a better suggestion comes up, guess the next step is to try a game that doesn't have Experimenters, and see if the turn-210ish problem is related to them (or happens anyways).
Attachments
logs210.tar.gz
(170.25 KiB) Downloaded 105 times

User avatar
Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: Late game (after turn 200) server disconnects

#7 Post by Vezzra »

I've been able to reproduce this on my Mac with the latest provided savegame. Log files attached.
Attachments
logfiles.zip
(174.62 KiB) Downloaded 108 times

User avatar
Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: Late game (after turn 200) server disconnects

#8 Post by Vezzra »

I cannot reproduce this on Windows and Linux. Which means that this is a Mac specific issue.

Another observation I made: When clicking "Load Game", the load game dialog box does not pop up immediately, there is a noticeable lag, which apparently gets longer to more savegames it has to read in, and the bigger those savegames are. I don't get this lag on Windows and Linux. Maybe related...?

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Late game (after turn 200) server disconnects

#9 Post by Geoff the Medio »

Vezzra wrote:Another observation I made: When clicking "Load Game", the load game dialog box does not pop up immediately, there is a noticeable lag, which apparently gets longer to more savegames it has to read in, and the bigger those savegames are. I don't get this lag on Windows and Linux. Maybe related...?
Are the save files in these cases of similar sizes? Are they saved in the same format?

LGM-Doyle
Programmer
Posts: 219
Joined: Mon Feb 29, 2016 8:37 pm

Re: Late game (after turn 200) server disconnects

#10 Post by LGM-Doyle »

Vezzra thanks for the log files.

The problem appears to be in the server but I can not tell what the problem is.

Could you run whatever debugger that you use and post a backtrace of the server process crashing?

User avatar
mem359
Dyson Forest
Posts: 214
Joined: Sun Jun 08, 2014 1:18 am

Re: Late game (after turn 200) server disconnects

#11 Post by mem359 »

I suspect that this has something to do with the Experimentors.
I've tried 3 games, with the same random number seed and Galaxy setup.
What changed were the "scripting" files that affected the Experimenters.

1) Modified monster_fleets.inf so that spawnrate=10.0 and Not WithinStarlaneJumps jumps = 4
For the game that had Experimentors in the galaxy, the EOF server failure happened on turn 210, a short while after the Experimenters were spawning. (This was the 2nd or 3rd time that the Sitrep warned about Black Krackens being seen.)

2) Modified monster_fleets.inf so that spawnrate=10000.0 and Not WithinStarlaneJumps jumps = 16
For the galaxy that did not have Experimentors, I have reached turn 270 without the EOF problem (and still going).

3) Brought back the Experimentors, and changed the EXPERIMENTOR_OUTPOST.focs.txt file, so that the EXPERIMENTOR_SPAWN_BASE_TURN is now 100 instead of 200
Got the EOF problem on turn 107. (I'm far away from the Experimentor outpost, but they are probably active.)

EDIT:
4) Changed the Exp base turn to 20.
The game actually "quit unexpectedly" on turn 22, with a EXC_BAD_ACCESS (SIGSEGV).
Restarted from the turn 20 save game, and just hit next turn (instead of "playing"). Got the server disconnect on turn 40.
Last edited by mem359 on Fri Mar 24, 2017 12:51 pm, edited 2 times in total.

User avatar
Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: Late game (after turn 200) server disconnects

#12 Post by Vezzra »

LGM-Doyle wrote:Could you run whatever debugger that you use and post a backtrace of the server process crashing?
Here we go. In the debugger I got these lines the moment the server crashed:

Code: Select all

Process 1298 stopped
* thread #1: tid = 0xdd4e, 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x622d3233)
    frame #0: 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828
freeorioncommon.dylib`System::Copy:
->  0xbe64ca <+1828>: cmpl   %eax, (%ebx)
    0xbe64cc <+1830>: movl   %ebx, %eax
    0xbe64ce <+1832>: jne    0xbe64c7                  ; <+1825>
    0xbe64d0 <+1834>: cmpl   %esi, %ebx
Backtrace:

Code: Select all

* thread #1: tid = 0xdd4e, 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x622d3233)
  * frame #0: 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828
    frame #1: 0x00bfcc95 freeorioncommon.dylib`Universe::UpdateEmpireLatestKnownObjectsAndVisibilityTurns() + 533
    frame #2: 0x0009bfe0 freeoriond`ServerApp::PostCombatProcessTurns() + 8780
    frame #3: 0x00153179 freeoriond`ProcessingTurn::react(ProcessTurn const&) + 473
    frame #4: 0x00157928 freeoriond`boost::statechart::detail::reaction_result boost::statechart::custom_reaction<ProcessTurn>::react<ProcessingTurn, boost::statechart::event_base, void const*>(ProcessingTurn&, boost::statechart::event_base const&, void const* const&) + 90
    frame #5: 0x00157891 freeoriond`boost::statechart::detail::reaction_result boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<ProcessTurn>, boost::statechart::deferral<SaveGameRequest>, boost::statechart::deferral<TurnOrders>, boost::statechart::deferral<Diplomacy>, boost::statechart::custom_reaction<CheckTurnEndConditions>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*) + 85
    frame #6: 0x001567d1 freeoriond`boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) + 43
    frame #7: 0x00163677 freeoriond`boost::statechart::state_machine<ServerFSM, Idle, std::__1::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&) + 99
    frame #8: 0x000a3b47 freeoriond`boost::statechart::state_machine<ServerFSM, Idle, std::__1::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events() + 119
    frame #9: 0x0009db16 freeoriond`boost::statechart::state_machine<ServerFSM, Idle, std::__1::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&) + 140
    frame #10: 0x0007ae35 freeoriond`ServerApp::HandleMessage(Message const&, std::__1::shared_ptr<PlayerConnection>) + 2141
    frame #11: 0x000a71d8 freeoriond`boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >::operator()(ServerApp*, Message const&, std::__1::shared_ptr<PlayerConnection>) const + 76
    frame #12: 0x000a7161 freeoriond`void boost::_bi::list3<boost::_bi::value<ServerApp*>, boost::arg<1>, boost::arg<2> >::operator()<boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >, boost::_bi::list2<Message const&, std::__1::shared_ptr<PlayerConnection> const&> >(boost::_bi::type<void>, boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >&, boost::_bi::list2<Message const&, std::__1::shared_ptr<PlayerConnection> const&>&, int) + 85
    frame #13: 0x000a7107 freeoriond`boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >, boost::_bi::list3<boost::_bi::value<ServerApp*>, boost::arg<1>, boost::arg<2> > >, void, Message, std::__1::shared_ptr<PlayerConnection> >::invoke(boost::detail::function::function_buffer&, Message, std::__1::shared_ptr<PlayerConnection>) + 51
    frame #14: 0x000c3c1c freeoriond`boost::function2<void, Message, std::__1::shared_ptr<PlayerConnection> >::operator()(Message, std::__1::shared_ptr<PlayerConnection>) const + 136
    frame #15: 0x000c3b50 freeoriond`void boost::_bi::list2<boost::_bi::value<Message>, boost::_bi::value<std::__1::shared_ptr<PlayerConnection> > >::operator()<boost::function<void (Message, std::__1::shared_ptr<PlayerConnection>)>, boost::_bi::list0>(boost::_bi::type<void>, boost::function<void (Message, std::__1::shared_ptr<PlayerConnection>)>&, boost::_bi::list0&, int) + 104
    frame #16: 0x000c3815 freeoriond`boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<boost::_bi::unspecified, boost::function<void (Message, std::__1::shared_ptr<PlayerConnection>)>, boost::_bi::list2<boost::_bi::value<Message>, boost::_bi::value<std::__1::shared_ptr<PlayerConnection> > > >, void>::invoke(boost::detail::function::function_buffer&) + 41
    frame #17: 0x000bbcc3 freeoriond`boost::function0<void>::operator()() const + 35
    frame #18: 0x000baf80 freeoriond`ServerNetworking::HandleNextEvent() + 128
    frame #19: 0x0007d35a freeoriond`ServerApp::Run() + 446
    frame #20: 0x00072aca freeoriond`main + 1484
    frame #21: 0x000724f5 freeoriond`start + 53

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: Late game (after turn 200) server disconnects

#13 Post by Geoff the Medio »

One thing that should probably change is this code:

Code: Select all

        int object_id = full_object->ID();
        if (!full_object) {
            ErrorLogger() << "UpdateEmpireLatestKnownObjectsAndVisibilityTurns found null object in m_objects with id " << object_id;
            continue;
        }
Testing if a pointer is null after dereferencing it doesn't help much...

Edit: done

User avatar
Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: Late game (after turn 200) server disconnects

#14 Post by Vezzra »

Geoff the Medio wrote:Are the save files in these cases of similar sizes? Are they saved in the same format?
Just reran my tests on all three platforms, on Mac and Windows with the contents of the save game folder were identical (just copied the save games from my Mac to my Windows system), and on the Linux systems there were a few additional save games in the save game folder. Format of all savegames should be compressed XML. Size of savegames ranging from 275KB to 2.5MB.

On Windows and Linux the load game dialog pops up without noticeable delay, on Mac it took 8 sec.

User avatar
Vezzra
Release Manager, Design
Posts: 6095
Joined: Wed Nov 16, 2011 12:56 pm
Location: Sol III

Re: Late game (after turn 200) server disconnects

#15 Post by Vezzra »

@Geoff: rebuild FO with your fix (b9f084e), didn't change anything. Same crash at the same location, same backtrace.

Post Reply