FreeOrion

Forums for the FreeOrion project
It is currently Sun May 19, 2013 12:32 pm

All times are UTC




Post new topic Reply to topic  [ 6 posts ] 
Author Message
 Post subject: [BUG] Weird sequence of messages/events in order submission
PostPosted: Fri Jul 06, 2012 2:59 pm 
Offline
Programmer and Packager
User avatar

Joined: Wed Nov 16, 2011 12:56 pm
Posts: 761
Location: Sol III
During my implementation of AI process priority management recently I came across something that looks like a weird sequence of messages/events during order submission processing. These strange behaviour occurs only if the human player hits "End turn" before the AI clients have submitted their orders. I already reported that briefly here, but now want to provide a more detailed bug report.

Important note: To test this scenario properly (AI clients which are busy doing their turns for a sufficiently long time) I modified generateOrders() in PythonAI.py to do a busy loop for 30 sec before it continues with actual order generation.

I'll try to illustrate what I found by walking through the respective log files. The first entries after hitting "End turn" are these in freeorion.log:
Code:
...
2012-07-06 15:29:01,002 DEBUG Client : MapWnd::EndTurn
2012-07-06 15:29:01,002 DEBUG Client : ClientNetworking::SendMessage() : sending message Message: TURN_ORDERS 1 --> -1(server/unknown)...
2012-07-06 15:29:01,002 DEBUG Client : (HumanClientFSM) PlayingTurn.TurnEnded
2012-07-06 15:29:01,003 DEBUG Client : (HumanClientFSM) ~PlayingTurn
2012-07-06 15:29:01,003 DEBUG Client : (HumanClientFSM) WaitingForTurnData
2012-07-06 15:29:01,003 DEBUG Client : (HumanClientFSM) WaitingForTurnDataIdle
...
So far everything is as it should be. The turn submission network message is sent to the server, the human client FSM does the transition from PlayingTurn to WaitingForTurnData state.

Now lets look at freeoriond.log to see how order submission processing proceeds server-side:
Code:
...
2012-07-06 15:29:01,003 DEBUG Server : ServerApp::HandleMessage type 16
2012-07-06 15:29:26,624 DEBUG Server : ServerApp::HandleMessage type 16
2012-07-06 15:29:26,625 DEBUG Server : ServerApp::HandleMessage type 19
2012-07-06 15:29:26,625 DEBUG Server : (ServerFSM) WaitingForSaveData.ClientSaveData
2012-07-06 15:29:26,640 DEBUG Server : ServerApp::HandleMessage type 16
2012-07-06 15:29:26,642 DEBUG Server : ServerApp::HandleMessage type 19
2012-07-06 15:29:26,642 DEBUG Server : (ServerFSM) WaitingForSaveData.ClientSaveData
2012-07-06 15:29:26,661 DEBUG Server : ServerApp::HandleMessage type 16
2012-07-06 15:29:26,664 DEBUG Server : ServerApp::HandleMessage type 19
2012-07-06 15:29:26,664 DEBUG Server : (ServerFSM) WaitingForSaveData.ClientSaveData
2012-07-06 15:29:26,664 DEBUG Server : UniverseObject::Copy Berkano has special DERELICT_SPECIAL3
2012-07-06 15:29:26,664 DEBUG Server :  ... which is copied.
...
"End turn" and subsequent sending of the order submission network message was triggered at 15:29:01,002. At 15:29:01,003 ServerApp::HandleMessage reports a "type 16" message, whatever that is. Then nothing else happens for ~25 sec! At 15:29:26,624 another "type 16" is handled, and subsequently a series of messages/events is logged that look like they belong to the save game process.

Following these messages a bunch of UniverseObject::Copy log entries like the last one quoted above appear, no need quote them in their entirety here. After these entries freeoriond.log continues with:
Code:
...
2012-07-06 15:29:26,666 DEBUG Server : GetEmpireKnownObjectsToSerialize
2012-07-06 15:29:26,666 DEBUG Server : UniverseObject::Copy  has special DERELICT_SPECIAL2
2012-07-06 15:29:26,666 DEBUG Server :  ... which is copied.
2012-07-06 15:29:26,666 DEBUG Server : UniverseObject::Copy  has special DERELICT_SPECIAL3
2012-07-06 15:29:26,666 DEBUG Server :  ... which is copied.
2012-07-06 15:29:26,666 DEBUG Server : UniverseObject::Copy  has special DERELICT_SPECIAL2
2012-07-06 15:29:26,666 DEBUG Server :  ... which is copied.
2012-07-06 15:29:26,667 DEBUG Server : UniverseObject::Copy  has special DERELICT_SPECIAL3
2012-07-06 15:29:26,667 DEBUG Server :  ... which is copied.
2012-07-06 15:29:26,667 DEBUG Server : UniverseObject::Copy  has special DERELICT_SPECIAL3
2012-07-06 15:29:26,667 DEBUG Server :  ... which is copied.
2012-07-06 15:29:26,673 DEBUG Server : (ServerFSM) ~WaitingForSaveData
...
At 15:29:26,673 the save game process seems to have completed. Order submission processing continues:
Code:
...
2012-07-06 15:29:26,673 DEBUG Server : (ServerFSM) WaitingForTurnEndIdle
2012-07-06 15:29:26,673 DEBUG Server : (ServerFSM) WaitingForTurnEnd.TurnOrders
2012-07-06 15:29:26,673 DEBUG Server : WaitingForTurnEnd.TurnOrders : Received orders from player 1
2012-07-06 15:29:26,673 DEBUG Server : (ServerFSM) WaitingForTurnEnd.TurnOrders
2012-07-06 15:29:26,673 DEBUG Server : WaitingForTurnEnd.TurnOrders : Received orders from player 2
2012-07-06 15:29:26,673 DEBUG Server : (ServerFSM) WaitingForTurnEnd.TurnOrders
2012-07-06 15:29:26,673 DEBUG Server : WaitingForTurnEnd.TurnOrders : Received orders from player 4
2012-07-06 15:29:26,674 DEBUG Server : (ServerFSM) WaitingForTurnEnd.TurnOrders
2012-07-06 15:29:26,674 DEBUG Server : WaitingForTurnEnd.TurnOrders : Received orders from player 3
2012-07-06 15:29:26,674 DEBUG Server : ServerApp::AllOrdersReceived()
2012-07-06 15:29:26,674 DEBUG Server : WaitingForTurnEnd.TurnOrders : All orders received.
2012-07-06 15:29:26,674 DEBUG Server : (ServerFSM) ~WaitingForTurnEndIdle
2012-07-06 15:29:26,674 DEBUG Server : (ServerFSM) ~WaitingForTurnEnd
2012-07-06 15:29:26,674 DEBUG Server : (ServerFSM) ProcessingTurn
2012-07-06 15:29:26,674 DEBUG Server : (ServerFSM) ProcessingTurnIdle
2012-07-06 15:29:26,674 ERROR Server : ServerFSM : A [ERROR: Unknown Event] event was passed to the ServerFSM.  This event is illegal in the FSM's current state.  It is being ignored.
2012-07-06 15:29:26,674 ERROR Server : ServerFSM : A [ERROR: Unknown Event] event was passed to the ServerFSM.  This event is illegal in the FSM's current state.  It is being ignored.
2012-07-06 15:29:26,674 ERROR Server : ServerFSM : A [ERROR: Unknown Event] event was passed to the ServerFSM.  This event is illegal in the FSM's current state.  It is being ignored.
...
At 2012-07-06 15:29:26,673 finally the server reports the reception of the turn orders from player 1 (which is the human player). That's ~25 sec after I'd hit "Turn End". The orders of the AI clients arrive effectively at the same time. As this 25 secs plus the few secs it took me to press "End turn" after turn 1 started sum up to the 30 sec busy loop i built into generateOrders(), this looks very much like my (the human player's) turn submission message somehow "hang" until all the AI clients were finished, and then all turn submission messages where handled at once.

The next thing that's conspicuous here is the fact that ServerApp::AllOrdersReceived() is only called once. It should be called every time a client submits his turn orders. Every time a client submits his turn orders, the WaitingForTurnEnd.TurnOrders event handler in the server FSM is triggered. At the end of the event handler routine a WaitingForTurnEnd.CheckTurnEndConditions event is posted for further processing:
Code:
    // check conditions for ending this turn
    post_event(CheckTurnEndConditions());
That means we have 4 of these events waiting in the event queue of the server FSM. In this event handler (WaitingForTurnEnd.CheckTurnEndConditions) ServerApp::AllOrdersReceived() is called (to check if all clients have submitted their orders and the server FSM should transit to the next state, ProcessingTurn). However, as the log entries clearly indicate, only one of the WaitingForTurnEnd.CheckTurnEndConditions is handled properly. The other 3 events apparently only get processed after the server FSM has made the transition into the ProcessingTurn state, so they end up handled by the unconsumed_event handler, which results in the 3 error messages at the end of the log file section quoted above. I've checked this assumption by running this test with 12 AI's, resulting in 12 of these error messages (as expected).

Now lets see what freeoriond.log tells us after that:
Code:
...
2012-07-06 15:29:26,674 DEBUG Server : ServerApp::ProcessTurns executing orders
2012-07-06 15:29:26,677 DEBUG Server : ProductionQueue::Update: Simulating future turns of production queue
2012-07-06 15:29:26,677 DEBUG Server : ProductionQueue::Update time: 0.133
2012-07-06 15:29:26,678 DEBUG Server : ========= Production Update for empire: 2 ========
...
Obviously turn processing kicks in (at 15:29:26,674) and continues from here. Lets look over to freeorion.log (client side):
Code:
...
2012-07-06 15:29:26,677 DEBUG Client : ClientNetworking::GetMessage() : received message Message: PLAYER_STATUS -1(server/unknown)...
2012-07-06 15:29:26,677 DEBUG Client : (HumanClientFSM) PlayingGame.PlayerStatus
2012-07-06 15:29:26,691 DEBUG Client : ClientNetworking::GetMessage() : received message Message: TURN_PROGRESS -1(server/unknown)...
...
I haven't omitted any log entries on the client side, so this is where freeorion.log continues. Nothing has happened here until our little chaos on the server side has cleared up, the next messages we see here are the ones to be expected to show up during server side turn processing. There's just one log entry of interest left, in freeorion.log:
Code:
...
2012-07-06 15:29:26,821 DEBUG Client : Autosaving to: /Users/.../Library/Application Support/FreeOrion/save/FreeOrion_Human_Player_Terran_0002.sav
...
Here the auto saving mechanism kicks in, as it should. Which means, it can't have interfered before. The odd thing is, if I turn off auto save, all the above doesn't happen. Turn orders arrive at the server and are handled immediately, no 25 sec wait, WaitingForTurnEnd.CheckTurnEndConditions is handled properly, no unconsumed_event errors - everything is as it should be.

Has anyone any idea what is going on here???

I've zipped the log files and attached them to this post.


Attachments:
Logs.zip [57.24 KiB]
Not downloaded yet
Top
 Profile  
 
 Post subject: Re: [BUG] Weird sequence of messages/events in order submiss
PostPosted: Sat Jul 07, 2012 6:27 am 
Offline
Programming, Design, and De Facto Lead
User avatar

Joined: Wed Oct 08, 2003 1:33 am
Posts: 7887
Location: Vancouver, BC
Vezzra wrote:
...ServerApp::HandleMessage reports a "type 16" message, whatever that is.
Type 16 should be a TURN_ORDERS message, sent from players to the server. Type 19 is CLIENT_SAVE_DATA, also sent from players to server.

I think what's happening is that when a turn is finished processing, the server sends the turn updates to all the players. When the AIs receive their turn updates, they start turn processing. When the human gets a turn update, it immediately initiates an autosave by sending a save game message to the server. This causes the server to go into the state where it is waiting for save data, and to request save state information from all players. While in this state, any turn orders messages (type 16) are deferred until after the next state transition. The AIs are busy processing their turns for another 25 seconds, and then after each finishes generating orders they send turn orders messages (type 16), and then responds to the save game data request and sends a save game data message (type 19). After all the save game data is received, the server does the save, and then returns to the idle state of waiting for turn orders from players. Then all the deferred turn orders messages received earlier are processed, during which a CheckTurnEndConditions event is posted for each. After the last turn orders message is processed, the first CheckTurnEndConditions event is processed, and the turn end conditions are met, so the server posts a ProcessTurn event and transitions to the ProcessingTurn state. Then the additional CheckTurnEndConditions events are processed, but the sever is in the ProcessingTurn so these are discarded. Then the ProcessTurn event is reached, and turn processing actually starts.


Top
 Profile  
 
 Post subject: Re: [BUG] Weird sequence of messages/events in order submiss
PostPosted: Sun Jul 08, 2012 10:23 am 
Offline
Programmer and Packager
User avatar

Joined: Wed Nov 16, 2011 12:56 pm
Posts: 761
Location: Sol III
That sounds like a very conclusive explanation, definitely makes sense.

However, assuming this analysis being correct, I think we have a serious problem here (besides the fact that because of the order things are processed the AI process priority management can't work as intended). Correct me if I got anything wrong:
Geoff the Medio wrote:
...While in this state, any turn orders messages (type 16) are deferred until after the next state transition...
So essentially, all the orders being turned in aren't registered and processed server-side until after the save has finished. Meaning, server-side the orders turned in are not part of the save game data.
Quote:
...The AIs are busy processing their turns for another 25 seconds, and then after each finishes generating orders they send turn orders messages (type 16), and then responds to the save game data request and sends a save game data message (type 19).
So when the AI clients start to process the save game data request they have already turned in their orders and of course updated their AI state data structure accordingly. Meaning client-side the save game data sent to the server includes the turn orders, so to speak.
Quote:
...After all the save game data is received, the server does the save, and then returns to the idle state of waiting for turn orders from players. Then all the deferred turn orders messages received earlier are processed...
If my analysis of this process so far is correct, we finally end up with a save game where the server-side save game data reflects the game state before the AI clients orders have been turned in, whereas the client-side save game data reflects the game state after the AIs have turned in their orders. That's some serious inconsistency, I see the potential for disaster here :?

I guess this hasn't caused noticable problems so far because of the current state of the AI code. But it most certainly will once we get some decent AI code done, and I also foresee a source for troubles und frustration for the AI programmers, because of the unpredictable results and behaviour their AI code most probably will show. So unless I'm missing something here, we need to fix that.

My first thought would be to try to address this issue in the AI client code. Maybe the client could wait a few seconds if a save game data request is sent, and process it before it starts to process the request for turn order submission? The human client should be save enough, as a human player won't be able to do anything before his client triggers the save game process and subsequently receives and processes the save game data request from the server.


Top
 Profile  
 
 Post subject: Re: [BUG] Weird sequence of messages/events in order submiss
PostPosted: Sun Jul 08, 2012 2:49 pm 
Offline
Programmer and Packager
User avatar

Joined: Wed Nov 16, 2011 12:56 pm
Posts: 761
Location: Sol III
Vezzra wrote:
...My first thought would be to try to address this issue in the AI client code. Maybe the client could wait a few seconds if a save game data request is sent, and process it before it starts to process the request for turn order submission?...
On a second thought, scrap that. After examining the relevant code sections a bit I think this would only eliminate one specific scenario where the save game process can go wrong. There needs to be a mechanism that ensures save game data is consistent between server and clients.

This would be my next suggestion to fix the problem: Introduce a "save game init phase" into the save game process. Instead of immediately sending out save game data requests upon receiving a save game request from the host player client, the server should first send out "init save game" messages to the clients. When a client processes this message, it transits into "saving game" state and responds to the server with a "init save game ack" message to indicate this transition. When in this state, a clients does nothing than waiting for and processing save game data requests.

So once the server has received the "init save game ack" messages from every client it can be sure that nothing is going on client-side that may interfere with and mess up the save game process. Then the server can safely send out the save game data requests and the save game process can continue as it does now. The clients can return to the state they were in before they received the "init save game" message after they have finished processing the save game data request.

I don't know if I'm up to the task to implementing that (I've to make myself sufficiently familiar with the client/server FSM and the networking code first, and I've to confess that looks quite daunting to me :shock:). Nevertheless I'd want to give it a try (unless someone else wants to do it), but before embarking on an adventure like that I want to put up my ideas for "public review" ;)

So what do you think? Do my ideas make any sense, do you have any corrections, or did I get it completely wrong? VWFF - Vezzra Waiting For Feedback :mrgreen:


Top
 Profile  
 
 Post subject: Re: [BUG] Weird sequence of messages/events in order submiss
PostPosted: Sun Jul 08, 2012 4:58 pm 
Offline
Programming, Design, and De Facto Lead
User avatar

Joined: Wed Oct 08, 2003 1:33 am
Posts: 7887
Location: Vancouver, BC
Vezzra wrote:
Introduce a "save game init phase" into the save game process. Instead of immediately sending out save game data requests upon receiving a save game request from the host player client, the server should first send out "init save game" messages to the clients. When a client processes this message, it transits into "saving game" state and responds to the server with a "init save game ack" message to indicate this transition. When in this state, a clients does nothing than waiting for and processing save game data requests.
How does this fix anything? What would a client do in a separate save game ready state that it couldn't do in response to a save game data request message?
Vezzra wrote:
There needs to be a mechanism that ensures save game data is consistent between server and clients.
What save game data specifically are you worried will be inconsistent? The only things I can think of that happen between turns are issuing orders and diplomacy state changes. AI responses to save game data requests currently happen after the AI has issued orders, so all that seems necessary to do is to include already-issued orders for the current turn in the saved gamestate, and to not ask AIs for more orders upon loading a game turn in which they have already submitted (until after the next turn is processed). My guess would be that this already happens, though I'm not sure.


Top
 Profile  
 
 Post subject: Re: [BUG] Weird sequence of messages/events in order submiss
PostPosted: Wed Jul 11, 2012 6:13 pm 
Offline
Programmer and Packager
User avatar

Joined: Wed Nov 16, 2011 12:56 pm
Posts: 761
Location: Sol III
Geoff the Medio wrote:
What save game data specifically are you worried will be inconsistent? The only things I can think of that happen between turns are issuing orders and diplomacy state changes.
That, and AI state data. The save game data the AI sends to the server upon receiving a save game data request is the "AIstate" dict (which of course changes during AI order generation).
Geoff the Medio wrote:
AI responses to save game data requests currently happen after the AI has issued orders, so all that seems necessary to do is to include already-issued orders for the current turn in the saved gamestate,...
Where? Server-side? How? The messages carrying the orders the AI clients have submitted are still in the queue, waiting for being processed by the server, which does not happen until after the save game process has been finished. So the server does not know anything about this orders when it saves game state data.

That is, unless the clients include any data that might have changed in the save game data message they send to the server, and let the server sort out how to incorporate this data into the game state data it's going to save. That's fairly easy for turn orders, as it's all-or-nothing. Either the server has already received turn data from a client, in which case it can safely ignore turn order data sent with save game data, or it has not received any turn orders from a client, in which case it just incorporates any turn order data included in the save game data sent by the client. However, in the latter case, on loading this saved game the server has to make sure that orders saved that way undergo normal order submission processing, exactly as they would have if they had been received and processed prior to the save game process.

Until recently that would have been sufficient (at least as far as I can tell), but with the introduction on diplomatic messages and diplomatic status updates things got more complicated. As I've seen so far (though I haven't taken a closer look, so I might be wrong here), the whole diplomatic stuff is handled independent of turn orders. That is, there can be several diplomatic messages and diplomatic status updates going back and forth while the players (human and AI) do their turns. When save game is triggered, these kind of messages might end up as well in queues waiting for further processing until after save game has finished.

But unlike orders it's not all-or-nothing, there might have been several message exchanges that already have taken place, and some that are still going on. You'll have to determine client-side: send the whole history of messages related to the diplomatic stuff? Just the last ones? How can the client know which messages have been received and processed by the server? Server-side: when the client sends the whole history of messages, how to sort out which to incorporate in save game data?

You see, maybe I'm blowing this totally out of proportions and seeing trouble where there is none. You know the code far better than I do, and consequently you can assess the potential problems far better. I'm just voicing my concerns here and want to make sure we don't overlook something that might cause us some serious amount of headache later if not addressed. I hope I have been able to explain what I mean somewhat clearly...?
Geoff the Medio wrote:
... and to not ask AIs for more orders upon loading a game turn in which they have already submitted (until after the next turn is processed)...
I've to admit I didn't think of that, but that's of course crucial. Although I assume that this is already taken care of.
Geoff the Medio wrote:
... My guess would be that this already happens, though I'm not sure.
Well, shouldn't we then make sure? I mean, these error messages on unconsumed events seem to indicate that the way autosave interfered with order submission wasn't something anticipated by whoever coded the save game procedures. I'm wondering if that's enough uncertainty to at least take a look at the whole thing.
Geoff the Medio wrote:
Vezzra wrote:
Introduce a "save game init phase" into the save game process...
How does this fix anything? What would a client do in a separate save game ready state that it couldn't do in response to a save game data request message?
You are right, just inserting a additional "init" step doesn't solve anything, this approach has to be "fleshed out". The main problem here are the messages left pending in queues during the save game process, because they contain essential information. If process A sends a message to process B and you save important "state" data of both processes while the message is in transit, you loose any information that might be relevant for the "state" data contained in this message. So you basically have two options: save all messages in transit too or make sure all of them are processed first, and only proceed with the save when all message queues are empty.

What I've suggested would be the latter option: Upon receiving the "save game init" message the clients would transit into "save game ready state" (and send the "save game init ack" to the server). In this state they continue processing incoming messages, but do not send out new ones. The server also continues processing incoming messages, but does not send out new ones, once it has received the "ack" from all clients, it sends out the save game data request messages. When a client receives this message, it can be sure that all other clients too have entered "save game ready state", and therefore can, as soon as it has processed all incoming messages left in it's queue, be sure that there will be no new ones until the save game process has finished. At that point the client can send it's save game data to the server.

The server likewise can safely process all incoming messages in it's queue, and finally save the gamestate without any messages (and the information they carry) stuck in the message queue.

Instead you could of course just save the message queues, but either way you need these two steps, because you need to reach a state where all participating processes can safely assume that while they emtpy their message queues (or save them) no new messages will arrive. The "save game init" step acts as some kind of a "freeze", if you want to put it that way. I don't see how that can be achieved in one step (by just sending out the save game data request message).

I hope I was able to explain properly how my approach fixes the problem I see?


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 6 posts ] 

All times are UTC


Who is online

Users browsing this forum: Bing [Bot], Google [Bot] and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group