Yes, in revision 4280 my first patch, that only limited search depth, was comitted, which brought a speedup factor of ~7. So you are only experiencing the speedup from my second patch (~3) and the very last changes (~1.5), amounting for a total speedup of ~4.5, which matches your numbers excellently.tzlaine wrote: Carsten, as you can see above, I have been unable to reproduce all your results. The optimized version runs in ~5s for me as for you, but I'm getting ~20s for the vanilla FO you first started investigating -- not minutes. I'm really curious what might be happening here, which is why I posted such detailed info on my test system above. Any idea why we're seeing such radically different things out of FO rev. 4282?
The inital post was at revision 4279, if you compare with that version you'll very likely see running times of about 2:00-2:30 minutes.
Edit: r4282 is correct, mixed this up. I'm unsure why the effect is less dramatic in optimized builds, but I guess inlining speeds up the tree/graph traversals dramatically.
Test conditions:
CPU: x86_64 Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz GenuineIntel
L1-Cache: Write Through, 256 kB, 4-way Set-associative
L2-Cache: Write Through, 1024 kB, 8-way Set-associative
L3-Cache: Write Back, 8192 kB, 16-way Set-associative
RAM: 2x DDR3, 64 bits width, 2048 MB (total 4GB), 1333 MHz
OS: GNU/Linux 2.6.38-gentoo-r6 #3 SMP
libc: glibc 2.12.2 shared
boost: 1.46.1 shared
compiler: gcc (Gentoo 4.4.5 p1.2, pie-0.4.5) 4.4.5
CFLAGS: -g -O2 -march=core2
all callgrind runs additionally with cflags:
-fno-inline -fno-inline-functions -fno-inline-small-functions -fno-indirect-inlining -fno-early-inlining -fno-inline-functions-called-once -finline-limit=0
Test operation:
1. Starting the client without parameters
2. loading the game file using menu
3. clicking on "turn".
4. terminate client when turn button became enabled again
Callgrind operation:
1. Starting valgrind --tool=callgrind freeoriond
2. client operation as above
3. server terminates automatically after exit of the last human client
Speedup estimations were deduced from callgrind total cycle comparison and rounded appropriately to the low measurement precision.
Times provided in the posts are simply gathered by "looking on the watch" and not very precise.
PS. On my system sizeof(int) = 4, sizeof(long) = 8. Note that boost::adjacency_list<> uses long e.g. for the number of vertices.
Looking forward to this, as I'm still not happy with the running time. There's still some container copying that cannot be removed, if you plan to replace some containers by std::vector we're probably going to see another slight speedup. However, such a change would have been too daring for me so I didn't include it ^^° I tried to write my code in a way though that it can benefit from it.tzlaine wrote:I still think one of my changes could replace some of Carsten's patch for an even faster result. More on that later.
Thank you very much =) And while we're at it: thanks to everyone for all the efforts in making this game what it is today, and what it will become. Very enjoyable! Well, I'm actually playing the game, I like to explore larger galaxies and just wanted to get the game to run more comfortably for me ^^° I didnt get as far as I hoped for, but I'm still quite happy with the result. Again, looking forward to your improvements, though I'm pessimistic about the less-than-a-second goal by now.tzlaine wrote:Again though, nice work! I have some comments on the patch, those will come after I've tried integrating some of my ideas into your existing work (those efforts may or may not moot my comments).
There's only one way I see to make another really substantial speedup, which is eleminating the big for-each-effect condition evaluation loop. This is hard work however, as the code wasn't designed that way. Maybe you can use a caching approach like a did for GetGraphIndex(), but it needs a to be a pretty smart one. Maybe a functional (as in functional programming) scripting engine can help you here, they can usually detect and eleminate re-evaluation of the same term internally.
P.S.2: going to run the tests as tzlaine did for comparison.