[PATCH] WithinStarlaneJumps implementation

Message

cami · #1 Post by **cami** » Wed Sep 21, 2011 9:42 pm

I was wondering why the game turned out so slow with just a few dozen colonized systems and only 500 in total (game attached). Just one step took several minutes to compute (Freeorion r4279). So I ran callgrind on freeoriond, it took more than three hours to complete one game step, but it was worth it.

I would like to throw in at this time that Freeorion is one of the best-written code examples I have ever read, but at the same time (unfortunately) it features some of the poorest algorithms I have ever seen.

The real evildoer is the WithinStarlaneJumps condition implementation.

Edit 2011-09-22 13:40: updated OP to my current knowledge of the code
WithinStarlaneJumps::Eval(context, matches, non_matches, search_domain)

WithinStarlaneJumps has two relevant members, m_subcondition and m_jumps, that represent what we expect to find in proximity and what maximum distance we allow. If search_domain is MATCHES, it is supposed to scan through matches and move all objects to non_matches that are not within m_jumps of an object matching subcondition. If search_domain is NON_MATCHES, instead it is supposed to scan through non_matches and move all objects to matches that are within m_jumps of an object matching subcondition.

For simplicity, I will assume that both all objects are star systems and seach_domain is MATCHES. Then, the current implementation results to this:

Code: Select all

subcondition_matches = eval(m_subcondition)

for candidate in matches:
    for it in subcondition_matches:
        distance = Universe::LeastJumpsPath(it, candidate)
        if distance <= m_jumps:
            match = true
            break
    if !match:
        move candidate from matches to non_matches

The complexity of this implementation obviously is #matches * #subcondition_matches * #starlanes_in_universe, the latter because Universe::LeastJumpsPath does a BFS to find the shortest path, scanning a significant portion of the universe when candidate and it are further apart. The callgrind file (gzipped 3.1MB) reveals 200 million node visits for just one game step in the test game attached (you can view the callgrind file after decompression with e.g. KCachegrind or WinCachegrind).

I claim that it is very easy to code a variation of Universe::LeastJumpsPath() so that:
- it respects a maximum distance and does not search beyond that (remember the first queue entry of a level to recognize the next level)
- it receives a set of origin systems instead of one system (just initialize the BFS queue with the whole set)
- it receives a set of targets instead of just one (just test for membership instead of identity compare the target)
- it returns the set of targets that respect the maximum distance (just move objects from targets set to result set when found)

Using this function instead of the two loops, we result in a complexity of #starlanes_in_range_of_subcondition_matches (for search) * log(#matches) (for membership test); for short ranges the membership test should dominate, for longer ranges we'd still have at most #starlanes_in_universe * log(#matches), which should be only a few thousand node visits in the test game.

Clearly, some adjustments are necessary to make this work for fleets. One approach I can think of is return a set of triples (origin, target, distance) and check the fleets in WithinStarlaneJumps::Eval() using these triples. Another approach might be allowing positions on starlanes as origins and targets as well, but I think that would result in more or less the same algorithm.

Personally, I would rather let planets emit (and withdraw) supply objects within range than checking distances, but the above suggestion should already bring huge speedups even for medium distances.

P.S. callgrind file (gzipped 3.1MB) is too large for upload, so I had to create a link to my server. Sorry!

#2 Post by **Geoff the Medio** » Wed Sep 21, 2011 10:26 pm

cami wrote:The real evildoer is the WithinStarlaneJumps condition implementation.

WithinDistance would probably also be a problem if it was used more / at all.

Beat me if I'm wrong, but this seems to be the implementation:
1. traverse the whole universe to find all objects matching the subcondition (i.e. what we expect within x jumps)

Yes.

2. store all matching (and non-matching) objects in a temporary set
3. for each match found, do a BFS to find the shortest path to it

If you mean store all the matches of the subcondition, and store all the candidates for the WithinStarlaneJumps condition, yes. The stored set of matches to the subcondition is then used to match all the candidates for WithinStarlaneJumps. (This applies unless the subcondition depends on the candidate for WithinStarlaneJumps that's being matched, in which case the subcondition needs to be re-evaluated for each candidate). Also, the BFS searching for each candidate to find a close enough subcondition-match is only done until a close enough sub-c-match is found, after which the candidate is accepted and the next is tested.

4. compare the length of that path to the limit and return "YES" if below

Right.

5. otherwise, return "NO"

Not quite; after finding all subcondition matches, the WithinStarlaneJumps condition tests the candidate objects against each subcondition match until it finds one that close enough, and doesn't return false / reject that candidate unless it's checked its distance from all subcondition matches and found none are close enough.

Since most subconditions will be very simple, I strongly suggest the following algorithm instead:

That's a dangerous assumption, if it means re-evaluating subconditions for each candidate object...

- run BFS once with a subcondition callback (functor) and starlane distance limit *
- until it either finds an object matching the subcondition,
- or until it has explored all systems within starlane distance limit

The distance limit to the BFS search is definitely a good idea, independent of other optimizations.

Having a callback / matching each subcondition object as the BFS progresses could work, as long as the results of previous subcondition tests are cached, and the subcondition only re-evaluated if that particular subcondition candidate hasn't yet been tested (or if the subcondition depends on the what the current WithinStarlaneJumps candidate object is). Otherwise, you could end up re-testing subcondition objects many times, as each WithinStarlaneJumps candidate test is search for a nearby match.

I'd suggest trying the distance limit in the search before the latter change. It might get rid of most noticable delay in of itself.

Personally, I would rather let planets emit (and withdraw) supply objects

Could you post about that separately with some explanation...? It's not clear what you mean, but it sounds like a separate issue from just optimizing the general-use WithinStarlaneJumps condition.

* P.P.S. if you don't want that, at least impose the hop limit and let BFS search for any of all candidates simultaneously

Indeed, I agree about having a hop limit, though I'm not sure what you mean by searching for all candidates simultaneously...

cami · #3 Post by **cami** » Thu Sep 22, 2011 9:14 am

Edit 2011-09-22 13:31: updated OP to my current knowledge of the code and refined my suggestion, in particular what I mean by simultaneous search.

OP now also contains a link to the thread about supply object emission.

cami · #4 Post by **cami** » Thu Sep 22, 2011 5:41 pm

I took a little time to implement the depth limit for BFS. As expected, it doesn't speed up too much (maybe a factor of 10, while we can reach a factor of 1000 by implementing all changes).

#5 Post by **Geoff the Medio** » Fri Sep 23, 2011 9:28 am

Do you release those (and any future) changes you posted under the GPL 2.0?

I haven't looked it over in detail yet, but assuming it gets committed, do you have a name you'd like used for the credits?

cami · #6 Post by **cami** » Fri Sep 23, 2011 11:10 am

Odd, there is no way to enter the real name into your forum profile.

Well, I'm Carsten Milkau. Nice to meet ya.

I'm working on a full implementation of the suggestion atm and it's almost done. Cross fingers that it'll work.

#7 Post by **Geoff the Medio** » Fri Sep 23, 2011 11:33 am

Geoff the Medio wrote:Do you release those (and any future) changes you posted under the GPL 2.0?

This part is rather important, but wasn't addressed in your response...

cami · #8 Post by **cami** » Fri Sep 23, 2011 11:49 am

I release all my patches to FreeOrion, now and in future, under the very same set of licenses FreeOrion is released under. Whenever this set changes I release the patches under that changed set.

I have a little problem integrating the new code for fleet locations. Let's assume we calculate the jump distance of a fleet to some arbitrary fixed object.

Currently, fleet jump distance is the maximum of the two distances to the starlane endpoints the fleet is flying along.
However, I only calculate the shortest path to any of the endpoints, i.e. I only have the minimum of those two distances.

Personally, I do not see why the maximum is taken, as that would mean if both starlane endpoints are equally far, the fleet is no further than the endpoints. But this isn't true, as it has some distance to both endpoints.

I would like to pick either the minimum of the two distances to the starlane endpoints (effectively ignoring the distance to the endpoints), or the minimum plus one (always respecting the distance to the endpoints).
The former will equal the current implementation when both endpoints are equally far away.
The latter will equal the current implementation when both endpoints are not equally far away (as there's a starlane, the difference must be one).

Edit: If the fleet is holding in a system, will fleet->SystemID() return a valid system id? If not, I could check if fleet->PreviousSystemID() and fleet->NextSystemID() are the same and treat it like a normal system object in that case.
If it absolutely has to be the maximum, I'd have to run the improved algorithm twice, which is ok but not really cool

#9 Post by **Geoff the Medio** » Fri Sep 23, 2011 12:41 pm

cami wrote:I would like to pick either the minimum of the two distances to the starlane endpoints (effectively ignoring the distance to the endpoints), or the minimum plus one (always respecting the distance to the endpoints).

The minimum plus one seems reasonable. In practice it probably doesn't matter much, as most tests of how many jumps it is to a system are probably intended to match things in systems, not fleets or ships moving between systems.

If the fleet is holding in a system, will fleet->SystemID() return a valid system id?

Yes; SystemID should only return INVALID_OBJECT_ID when an object is not in a system.

#10 Post by **tzlaine** » Fri Sep 23, 2011 1:16 pm

Nice work, Carsten! Geoff, I did take an in-depth look at the patch, and it looks solid to me. In the process, I was looking at the Boost.Graph docs and noticed that an A* search was added some time ago. We may want to use this instead of Dijkstra in ShortestPathImpl(). Carsten, are you game to investigate the optimization of that after you're done with WithinStarlaneJumps()? Also, further runs of callgrind after your current optimizations are checked in would be most welcome.

cami · #11 Post by **cami** » Fri Sep 23, 2011 6:49 pm

Thanks =D

Well, in fact I used up my holidays for implementing the full BFS improvement as in OP, and it even has a bug or two still :/

It shouldn't be too hard to replace the dijkstra call by A*, though don't expect too much: as the starlane graph is that sparse, the shortest path is very far from being straight, and as such a straight-line distance provides a poor distance estimation heuristic, and I can't think of any better simple heuristic. Dijkstra is essentially A* with the trivial heuristic that always returns 0.

P.S.
Funny incident: the last weeks, in fact I have wondered about a way to speed-up path search with a data structure storing auxiliary information about the graph for a different game (Shores of Hazeron). E.g., the Floyd Warshall algorithm returns a data structure that allows finding the shortest path in time proportional to the actual length of the path (which is optimal), but the data structure has size #nodes * #nodes. I hope to achieve a similar speedup with a data structure that is much smaller, e.g. log(#edges) * #edges.

#12 Post by **Geoff the Medio** » Fri Sep 23, 2011 7:05 pm

cami wrote:...as the starlane graph is that sparse, the shortest path is very far from being straight, and as such a straight-line distance provides a poor distance estimation heuristic...

On galaxies with "High" starlane frequency, the lane graph is locally quite dense. However you're right that a direct distance heuristic probably would have some bad cases, such as the marked path in the attached, where going further away (direct distance) is required to get around the corner, and there's a lot of stars in the centre of the galaxy that would be closer than the actual shortest path.

That said, this probably isn't much worse than breadth-first search, and could be better for the final stages when the direct-distance is a useful heuristic.

cami · #13 Post by **cami** » Fri Sep 23, 2011 10:26 pm

Yep, that's exactly what I meant.

I completed the patch now, but I can't see as much speedup as I expected. I need to run valgrind again to see why it is still slow, but since my time is up, that probably won't be within the next two weeks :/

The patch also contains a fix in CMakeLists.txt that sets LC_MESSAGES to "C" before fetching the revision number =)

Edit: Started a callgrind run overnight. Although there's no time to expand the patch at least I want to know what I overlooked.

cami · #14 Post by **cami** » Sat Sep 24, 2011 8:43 am

I changed the patch a little more

- building the distance map using Universe::SystemsWithinJumps() is now a separate function
- Condition::WithinStarlaneJumps::Match() now also uses distance pre-calculation

The new callgrind (gzipped 3.0M)is also done. It reveals two things:
- the new implementation is about 7 times faster
- the change from std::vector to boost::unordered map was costly, and now is the major bottleneck
- however, the set creation and destruction in StoreTargetsAndCausesOfEffectsGroups, especially in EffectTargetSetToConditionObjectSet, is even more expensive.
- Since we only iterate over these sets, using std::vector is probably a good idea here,
- and a generic Adapter that just passes through all operations can completely replace EffectTargetSetToConditionObjectSet() and its counterpart

#15 Post by **Geoff the Medio** » Sat Sep 24, 2011 5:29 pm

cami wrote:- the change from std::vector to boost::unordered map was costly, and now is the major bottleneck

It looks like you're only using unordered_map for the internals tracking of predecessors and colours in the graph. Since the vertices are indexed by integers without gaps from 0 to the number of verticies - 1, and are looked up by index number, why isn't vector the best container in this case...? If building the vector is slow, you could reserve an appropriate amount of space before filling it.

- however, the set creation and destruction in StoreTargetsAndCausesOfEffectsGroups, especially in EffectTargetSetToConditionObjectSet, is even more expensive.
- Since we only iterate over these sets, using std::vector is probably a good idea here

I agree. In the case of effect target sets and condition object sets, there's no need for sorting by pointer value, as is presently done in the SVN version.

- and a generic Adapter that just passes through all operations can completely replace EffectTargetSetToConditionObjectSet() and its counterpart

That sounds rather complicated, for not much benefit. If the std::set were replaced with std::vector for Condition::ObjectSet and Effect::TargetSet, that would seem likely to speed up the conversion quite a bit without the need for adapter code...

FreeOrion

[PATCH] WithinStarlaneJumps implementation

[PATCH] WithinStarlaneJumps implementation

Re: WithinStarlaneJumps implementation

Re: WithinStarlaneJumps implementation

[PATCH] Re: WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation

Re: [PATCH] WithinStarlaneJumps implementation