Page 1 of 2

Late game (after turn 200) server disconnects

Posted: Thu Mar 16, 2017 2:21 am
by mem359
I've been noticing that most of my games have crashed with a server disconnect (exit to main menu). It has been happening late, after Experimenters have shown themselves. I was having problems replicating it, but I can finally repeat it.

For the particular save game I am using, it is turn 208.
After loading the game, it takes a while before the screen refreshes, the galaxy is visible, and the turn button shows up.

If I click the "turn" button in the next 6 seconds or less, the game stops with a server disconnect.
If I wait 10 seconds or more, the turn processes normally and the game doesn't crash. (!!!)
(Is there some process running in the background that is still active even after the new turn starts?)

I compared the log files for the "normal" turn and the "bad" server disconnect.

"bad" freeorion.log (what seems to be the appropriate part)

Code: Select all

2017-03-15 21:51:34.664377 [debug] Client : ScopedTimer.cpp:28 : MapWnd::MidTurnUpdate time:       80 ms
2017-03-15 21:51:34.804525 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:51:34.848995 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:51:36.499556 [debug] Client : ClientNetworking.cpp:315 : ClientNetworking::NetworkingThread() : Networking thread will be terminated due to disconnect exception "End of file"
2017-03-15 21:51:36.505796 [debug] Client : HumanClientApp.cpp:1212 : HumanClientApp::DisconnectedFromServer
"bad" freeoriond.log (these are the last lines of the file)

Code: Select all

2017-03-15 21:51:35.060228 [debug] Server : LoggingWrapper.cpp:23 : Executing turn events for turn 208
2017-03-15 21:51:35.274204 [debug] Server : Universe.cpp:1380 : Issue times: planet species: 0.612 ship species: 0.262 specials: 2.634 techs: 14.417 buildings: 2.621 hulls/parts: 7.715 fields: 0.773
2017-03-15 21:51:35.274310 [debug] Server : Universe.cpp:1387 : Evaluation time: 356.029 reorder time: 1.95
2017-03-15 21:51:35.299018 [debug] Server : ScopedTimer.cpp:28 : Universe::ExecuteEffects time:       18 ms
"normal" freeorion.log (at a similar progress state)

Code: Select all

2017-03-15 21:48:30.896341 [debug] Client : ScopedTimer.cpp:28 : MapWnd::MidTurnUpdate time:       65 ms
2017-03-15 21:48:31.027754 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:48:31.075563 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:48:31.613306 [debug] Client : HumanClientFSM.cpp:483 : (HumanClientFSM) PlayingGame.TurnProgress
2017-03-15 21:48:31.910730 [debug] Client : HumanClientFSM.cpp:599 : (HumanClientFSM) PlayingGame.TurnUpdate
2017-03-15 21:48:31.969479 [debug] Client : SerializeUniverse.cpp:70 : Universe::serialize : deserializing universe width: 1063.81
"normal" freeoriond.log (at a similar progress state)

Code: Select all

2017-03-15 21:48:31.263431 [debug] Server : LoggingWrapper.cpp:23 : Executing turn events for turn 208
2017-03-15 21:48:31.479152 [debug] Server : Universe.cpp:1380 : Issue times: planet species: 0.49 ship species: 0.214 specials: 3.257 techs: 13.436 buildings: 2.08 hulls/parts: 8.121 fields: 0.711
2017-03-15 21:48:31.479255 [debug] Server : Universe.cpp:1387 : Evaluation time: 362.136 reorder time: 2.252
2017-03-15 21:48:31.504574 [debug] Server : ScopedTimer.cpp:28 : Universe::ExecuteEffects time:       19 ms
2017-03-15 21:48:31.552417 [debug] Server : ServerApp.cpp:3060 : ServerApp::PostCombatProcessTurns Turn number incremented to 209
I'm hoping someone more knowledgeable has ideas of what would cause the server to stop like that (not getting to the PostCombatProcessTurns) or cause an "End of File" exception. Or why it would depend on how quickly the turn button is pressed.

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 16, 2017 3:09 am
by Dilvish
could you post your savegame file someplace, so that we could test it out?

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 16, 2017 11:42 am
by mem359
Checking if I can attach a 2.3 MB gzip-tar save file
(unzips to 2.7 MB)

I'm using MacOSX. If I need the save file to use the FO binary format instead of the standard option, please let me know.

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 1:42 pm
by mem359
Another save file, this one using the most recent official test build (2017-03-20.60b8488).
It has the virtue (sin?) of being consistent, causing an EOF termination no matter how long I wait before pressing the next turn button.

freeorion.log (just the part when the game abruptly ends)

Code: Select all

2017-03-23 09:13:18.810890 [debug] Client : ScopedTimer.cpp:28 : MapWnd::InitTurnRendering time:       84 ms
2017-03-23 09:13:18.811199 [debug] Client : ScopedTimer.cpp:28 : MapWnd::MidTurnUpdate time:       85 ms
2017-03-23 09:13:19.788711 [debug] Client : ClientNetworking.cpp:354 : Client connection disconnected by EOF from server.
2017-03-23 09:13:19.796627 [debug] Client : HumanClientApp.cpp:1167 : HumanClientApp::DisconnectedFromServer
freeoriond.log (last lines of the file)

Code: Select all

2017-03-23 09:13:19.345703 [debug] Server : LoggingWrapper.cpp:23 : Executing turn events for turn 210
2017-03-23 09:13:19.550479 [debug] Server : Universe.cpp:1380 : Issue times: planet species: 0.2 ship species: 0.273 specials: 3.168 techs: 13.468 buildings: 2.005 hulls/parts: 8.138 fields: 1.013
2017-03-23 09:13:19.550584 [debug] Server : Universe.cpp:1387 : Evaluation time: 480.403 reorder time: 2.093
2017-03-23 09:13:19.582132 [debug] Server : ScopedTimer.cpp:28 : Universe::ExecuteEffects time:       25 ms
An earlier turn save (where the EOF does not happen) shows that the freeorion.log normally has more lines, like:
  • HumanClientFSM.cpp:670 : (HumanClientFSM) PlayingGame.TurnUpdate
    Client : SerializeUniverse.cpp:70 : Universe::serialize : deserializing universe width: 1100.55
    Client : SerializeUniverse.cpp:140 : Universe::serialize done
    Client : ScopedTimer.cpp:28 : Turn Update Unpacking time: 181 ms
    Client : HumanClientFSM.cpp:683 : Extracted TurnUpdate message for turn: 207
...while the freeoriond.log normally proceeds to something like:
Server : ServerApp.cpp:3056 : ServerApp::PostCombatProcessTurns Turn number incremented to 207

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 2:50 pm
by LGM-Doyle
mem359, I was unable to duplicate the crash with either the old save or the new save.

Could you post the complete logs (freeorion.log, freeoriond.log and the 6 AI_x.log) from a game where you load the reliable crash savegame and let it crash?

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 3:02 pm
by mem359
LGM-Doyle wrote:mem359, I was unable to duplicate the crash with either the old save or the new save.

Could you post the complete logs (freeorion.log, freeoriond.log and the 6 AI_x.log) from a game where you load the reliable crash savegame and let it crash?
Thanks for testing it.

I was worried about that. The problem might be with the MacOSX build only (subset of users) or some interaction with other software on my computer (me only).

Unless a better suggestion comes up, guess the next step is to try a game that doesn't have Experimenters, and see if the turn-210ish problem is related to them (or happens anyways).

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 3:14 pm
by Vezzra
I've been able to reproduce this on my Mac with the latest provided savegame. Log files attached.

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 3:46 pm
by Vezzra
I cannot reproduce this on Windows and Linux. Which means that this is a Mac specific issue.

Another observation I made: When clicking "Load Game", the load game dialog box does not pop up immediately, there is a noticeable lag, which apparently gets longer to more savegames it has to read in, and the bigger those savegames are. I don't get this lag on Windows and Linux. Maybe related...?

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 4:02 pm
by Geoff the Medio
Vezzra wrote:Another observation I made: When clicking "Load Game", the load game dialog box does not pop up immediately, there is a noticeable lag, which apparently gets longer to more savegames it has to read in, and the bigger those savegames are. I don't get this lag on Windows and Linux. Maybe related...?
Are the save files in these cases of similar sizes? Are they saved in the same format?

Re: Late game (after turn 200) server disconnects

Posted: Thu Mar 23, 2017 5:37 pm
by LGM-Doyle
Vezzra thanks for the log files.

The problem appears to be in the server but I can not tell what the problem is.

Could you run whatever debugger that you use and post a backtrace of the server process crashing?

Re: Late game (after turn 200) server disconnects

Posted: Fri Mar 24, 2017 12:29 pm
by mem359
I suspect that this has something to do with the Experimentors.
I've tried 3 games, with the same random number seed and Galaxy setup.
What changed were the "scripting" files that affected the Experimenters.

1) Modified monster_fleets.inf so that spawnrate=10.0 and Not WithinStarlaneJumps jumps = 4
For the game that had Experimentors in the galaxy, the EOF server failure happened on turn 210, a short while after the Experimenters were spawning. (This was the 2nd or 3rd time that the Sitrep warned about Black Krackens being seen.)

2) Modified monster_fleets.inf so that spawnrate=10000.0 and Not WithinStarlaneJumps jumps = 16
For the galaxy that did not have Experimentors, I have reached turn 270 without the EOF problem (and still going).

3) Brought back the Experimentors, and changed the EXPERIMENTOR_OUTPOST.focs.txt file, so that the EXPERIMENTOR_SPAWN_BASE_TURN is now 100 instead of 200
Got the EOF problem on turn 107. (I'm far away from the Experimentor outpost, but they are probably active.)

EDIT:
4) Changed the Exp base turn to 20.
The game actually "quit unexpectedly" on turn 22, with a EXC_BAD_ACCESS (SIGSEGV).
Restarted from the turn 20 save game, and just hit next turn (instead of "playing"). Got the server disconnect on turn 40.

Re: Late game (after turn 200) server disconnects

Posted: Fri Mar 24, 2017 12:42 pm
by Vezzra
LGM-Doyle wrote:Could you run whatever debugger that you use and post a backtrace of the server process crashing?
Here we go. In the debugger I got these lines the moment the server crashed:

Code: Select all

Process 1298 stopped
* thread #1: tid = 0xdd4e, 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x622d3233)
    frame #0: 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828
freeorioncommon.dylib`System::Copy:
->  0xbe64ca <+1828>: cmpl   %eax, (%ebx)
    0xbe64cc <+1830>: movl   %ebx, %eax
    0xbe64ce <+1832>: jne    0xbe64c7                  ; <+1825>
    0xbe64d0 <+1834>: cmpl   %esi, %ebx
Backtrace:

Code: Select all

* thread #1: tid = 0xdd4e, 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x622d3233)
  * frame #0: 0x00be64ca freeorioncommon.dylib`System::Copy(std::__1::shared_ptr<UniverseObject const>, int) + 1828
    frame #1: 0x00bfcc95 freeorioncommon.dylib`Universe::UpdateEmpireLatestKnownObjectsAndVisibilityTurns() + 533
    frame #2: 0x0009bfe0 freeoriond`ServerApp::PostCombatProcessTurns() + 8780
    frame #3: 0x00153179 freeoriond`ProcessingTurn::react(ProcessTurn const&) + 473
    frame #4: 0x00157928 freeoriond`boost::statechart::detail::reaction_result boost::statechart::custom_reaction<ProcessTurn>::react<ProcessingTurn, boost::statechart::event_base, void const*>(ProcessingTurn&, boost::statechart::event_base const&, void const* const&) + 90
    frame #5: 0x00157891 freeoriond`boost::statechart::detail::reaction_result boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::local_react_impl_non_empty::local_react_impl<boost::mpl::list<boost::statechart::custom_reaction<ProcessTurn>, boost::statechart::deferral<SaveGameRequest>, boost::statechart::deferral<TurnOrders>, boost::statechart::deferral<Diplomacy>, boost::statechart::custom_reaction<CheckTurnEndConditions>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0> >(boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>&, boost::statechart::event_base const&, void const*) + 85
    frame #6: 0x001567d1 freeoriond`boost::statechart::simple_state<ProcessingTurn, PlayingGame, boost::mpl::list<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, (boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base const&, void const*) + 43
    frame #7: 0x00163677 freeoriond`boost::statechart::state_machine<ServerFSM, Idle, std::__1::allocator<void>, boost::statechart::null_exception_translator>::send_event(boost::statechart::event_base const&) + 99
    frame #8: 0x000a3b47 freeoriond`boost::statechart::state_machine<ServerFSM, Idle, std::__1::allocator<void>, boost::statechart::null_exception_translator>::process_queued_events() + 119
    frame #9: 0x0009db16 freeoriond`boost::statechart::state_machine<ServerFSM, Idle, std::__1::allocator<void>, boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base const&) + 140
    frame #10: 0x0007ae35 freeoriond`ServerApp::HandleMessage(Message const&, std::__1::shared_ptr<PlayerConnection>) + 2141
    frame #11: 0x000a71d8 freeoriond`boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >::operator()(ServerApp*, Message const&, std::__1::shared_ptr<PlayerConnection>) const + 76
    frame #12: 0x000a7161 freeoriond`void boost::_bi::list3<boost::_bi::value<ServerApp*>, boost::arg<1>, boost::arg<2> >::operator()<boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >, boost::_bi::list2<Message const&, std::__1::shared_ptr<PlayerConnection> const&> >(boost::_bi::type<void>, boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >&, boost::_bi::list2<Message const&, std::__1::shared_ptr<PlayerConnection> const&>&, int) + 85
    frame #13: 0x000a7107 freeoriond`boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf2<void, ServerApp, Message const&, std::__1::shared_ptr<PlayerConnection> >, boost::_bi::list3<boost::_bi::value<ServerApp*>, boost::arg<1>, boost::arg<2> > >, void, Message, std::__1::shared_ptr<PlayerConnection> >::invoke(boost::detail::function::function_buffer&, Message, std::__1::shared_ptr<PlayerConnection>) + 51
    frame #14: 0x000c3c1c freeoriond`boost::function2<void, Message, std::__1::shared_ptr<PlayerConnection> >::operator()(Message, std::__1::shared_ptr<PlayerConnection>) const + 136
    frame #15: 0x000c3b50 freeoriond`void boost::_bi::list2<boost::_bi::value<Message>, boost::_bi::value<std::__1::shared_ptr<PlayerConnection> > >::operator()<boost::function<void (Message, std::__1::shared_ptr<PlayerConnection>)>, boost::_bi::list0>(boost::_bi::type<void>, boost::function<void (Message, std::__1::shared_ptr<PlayerConnection>)>&, boost::_bi::list0&, int) + 104
    frame #16: 0x000c3815 freeoriond`boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<boost::_bi::unspecified, boost::function<void (Message, std::__1::shared_ptr<PlayerConnection>)>, boost::_bi::list2<boost::_bi::value<Message>, boost::_bi::value<std::__1::shared_ptr<PlayerConnection> > > >, void>::invoke(boost::detail::function::function_buffer&) + 41
    frame #17: 0x000bbcc3 freeoriond`boost::function0<void>::operator()() const + 35
    frame #18: 0x000baf80 freeoriond`ServerNetworking::HandleNextEvent() + 128
    frame #19: 0x0007d35a freeoriond`ServerApp::Run() + 446
    frame #20: 0x00072aca freeoriond`main + 1484
    frame #21: 0x000724f5 freeoriond`start + 53

Re: Late game (after turn 200) server disconnects

Posted: Fri Mar 24, 2017 1:00 pm
by Geoff the Medio
One thing that should probably change is this code:

Code: Select all

        int object_id = full_object->ID();
        if (!full_object) {
            ErrorLogger() << "UpdateEmpireLatestKnownObjectsAndVisibilityTurns found null object in m_objects with id " << object_id;
            continue;
        }
Testing if a pointer is null after dereferencing it doesn't help much...

Edit: done

Re: Late game (after turn 200) server disconnects

Posted: Fri Mar 24, 2017 1:06 pm
by Vezzra
Geoff the Medio wrote:Are the save files in these cases of similar sizes? Are they saved in the same format?
Just reran my tests on all three platforms, on Mac and Windows with the contents of the save game folder were identical (just copied the save games from my Mac to my Windows system), and on the Linux systems there were a few additional save games in the save game folder. Format of all savegames should be compressed XML. Size of savegames ranging from 275KB to 2.5MB.

On Windows and Linux the load game dialog pops up without noticeable delay, on Mac it took 8 sec.

Re: Late game (after turn 200) server disconnects

Posted: Fri Mar 24, 2017 1:15 pm
by Vezzra
@Geoff: rebuild FO with your fix (b9f084e), didn't change anything. Same crash at the same location, same backtrace.