[PATCH] Improving TemporaryPtr

Programmers discuss here anything related to FreeOrion programming. Primarily for the developers to discuss.

Moderator: Committer

Message
Author
User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: [PATCH] Improving TemporaryPtr

#121 Post by Geoff the Medio »

cami wrote:There is absolutely no problem with making this number configurable.
Done: http://sourceforge.net/p/freeorion/code/6984/

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: [PATCH] Improving TemporaryPtr

#122 Post by Geoff the Medio »

Seemingly due to these changes, consistent crashes have been reported. Switching to just 1 thread reportedly didn't help...

viewtopic.php?p=68853#p68853

User avatar
Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: [PATCH] Improving TemporaryPtr

#123 Post by Dilvish »

cami wrote:Note however: setting it to 1 is not fully equivalent to the single-threaded mode; there will be only 1 worker thread, but that will still run independently of the control thread that created the Runqueue.
In order to help focus the examination here, how about making it so that if the number of worker threads is set to zero that it will just run the jobs in the main control thread rather than in a worker thread; it seems that would eliminate data races at least.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: [PATCH] Improving TemporaryPtr

#124 Post by adrian_broher »

As mentioned above I'm also affected by the 'connection to server has been lost' bug mentioned in this thread for a week now.

I don't want to wait until cami responds so this can be fixed or debug this myself because I don't have enough insight to this code.

Reverting commit 6974.
Resident code gremlin
Attached patches are released under GPL 2.0 or later.
Git author: Marcel Metz

User avatar
cami
Space Dragon
Posts: 411
Joined: Tue Sep 20, 2011 10:32 pm
Location: Halle (Saale)

Re: [PATCH] Improving TemporaryPtr

#125 Post by cami »

Adrian, it would be nice if you assist in analyzing those crashes because I cannot reproduce them, and neither can Geoff or Dilvish. I dont know whether anyone else has investigated the source code; but I was unable to find something wrong by source analysis alone.
Yesterday, we were still on the brink. Fortunately, today we have come one step further.

User avatar
Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

Re: [PATCH] Improving TemporaryPtr

#126 Post by Geoff the Medio »

cami wrote:Adrian, it would be nice if you assist in analyzing those crashes because I cannot reproduce them...
Is it possible to make smaller patches that implement parts of the -h patch which could be tested independently, or that can be applied sequentially to see if part of the changes can go in without causing crashes, and thus also to partly isolate the cause?

User avatar
adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: [PATCH] Improving TemporaryPtr

#127 Post by adrian_broher »

cami wrote:Adrian, it would be nice if you assist in analyzing those crashes because I cannot reproduce them, and neither can Geoff or Dilvish.
Sure. I already tried to investigate what could be the cause of this, but it seems like the patch introduces some stack corruption so the coredumps are pretty much worthless. I support Geoffs suggestion to break down the patch in more digestable parts, preferable in a way that the changes are minimal and compilable so it is easier to find the bad part of the patch.
Resident code gremlin
Attached patches are released under GPL 2.0 or later.
Git author: Marcel Metz

User avatar
cami
Space Dragon
Posts: 411
Joined: Tue Sep 20, 2011 10:32 pm
Location: Halle (Saale)

Re: [PATCH] Improving TemporaryPtr

#128 Post by cami »

The paradox is that the patch is intended to fix data races, so I cannot detect the problem by splitting it up. The nasty part is that my approach (and I cannot think of any better) of fixing the data races tends to make them rather fatal when they do still occur.

The reason your coredumps are worthless is that from the data race to the crash, a significant timespan can pass, as described in an earlier post. I'm trying to secure the reference counting of boost::shared_ptr, and a data race causes an error in the refcount value (usually off by one). The crash occurs after the refcount dropping to zero early, because being off by 1 (or more), resulting in an early free and use-after-free.

It seems that for many devs, (I know for sure it holds for me), the boost version used already secures boost::shared_ptr refcounting itself, so even if the additional locking fails no data races occur. That's what makes this bug so evasive.Detecting the problem requires detecting the data race, not catching the crash. Helgrind has proven very effective in that regard, I used it to eleminate all data races occuring despite boost-internal locking in my own boost version in an earlier patch.
Yesterday, we were still on the brink. Fortunately, today we have come one step further.

User avatar
Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: [PATCH] Improving TemporaryPtr

#129 Post by Dilvish »

cami wrote:The paradox is that the patch is intended to fix data races, so I cannot detect the problem by splitting it up.
Regardless of the problem the patch is intended to address, we might be able to detect what portion is causing this immediate disconnect from server (which appears caused by the patch for some people) if you split it up. It looks to me like it *could* be split up. The bulk of it seems related to the reworking of UniverseObjectVisitor related code, which would probably need to stay lumped together, but it also looks like there are some independent parts such as reworking some portions of ObjectMap, and perhaps also the changes to EnableTemporaryFromThis, which could be tested by themselves in a preliminary patch.
cami wrote:It seems that for many devs, (I know for sure it holds for me), the boost version used already secures boost::shared_ptr refcounting itself, so even if the additional locking fails no data races occur.
If were knew this were an issue hindering testing of the current problem then it seems we ought to be able to set up a testing environment using an older version of boost. Adrian, could you remind us what version of boost you are using?
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
cami
Space Dragon
Posts: 411
Joined: Tue Sep 20, 2011 10:32 pm
Location: Halle (Saale)

Re: [PATCH] Improving TemporaryPtr

#130 Post by cami »

The visitor code could be separated out, but should be applied first, then. Its purpose is mainly to strengthen type aliasing in order to evade wrong compiler optimizations.

Enabletemporary is intimitely linked to temporaryptr, consider them the same class conceptually, just lexically and physically separate.

The objectmap is in a similar situation as the visitor stuff, but ot quite the same. All mixed use of shared_ptr with temporaryptr is rigorously replaced by temporaryptr. This could in principle be done before adding more locks to temporaryptr, but might cause crashes because the current tempraryptr serialization is not thread safe (however as Geoff pointed out, might not need to be) and simply because temporaryptr becomes used in more critical situations (where the refcount is close to zero).
Adding the locks before changing objectmap is pointless, that would produce an even more fragile situation than the above, because the mutex for the new locks is inside the managed object, so data races in the object map almost certainly cause crashes.

I might add for funniness that this patch despite its size already IS a split off a different work with just the things to fight the mac crashes. xD
Yesterday, we were still on the brink. Fortunately, today we have come one step further.

User avatar
adrian_broher
Programmer
Posts: 1156
Joined: Fri Mar 01, 2013 9:52 am
Location: Germany

Re: [PATCH] Improving TemporaryPtr

#131 Post by adrian_broher »

Dilvish wrote:If were knew this were an issue hindering testing of the current problem then it seems we ought to be able to set up a testing environment using an older version of boost. Adrian, could you remind us what version of boost you are using?
Fedora 20 provides boost 1.54 on an x86_64 architecture. The user reported an issue on Arch and this provides boost 1.55. There is also a windows user report, but I don't know how reliable this is because Geoff should have had this problem too with the bundled boost 1.51(?) within the SDK.
Resident code gremlin
Attached patches are released under GPL 2.0 or later.
Git author: Marcel Metz

User avatar
cami
Space Dragon
Posts: 411
Joined: Tue Sep 20, 2011 10:32 pm
Location: Halle (Saale)

Re: [PATCH] Improving TemporaryPtr

#132 Post by cami »

Boost on Linux x86_64 v1.52, 1.54, 1.55 all use locking for shared_ptr. As I'm not managing the refcounts myself how could they possibly become wrong due to the patch?!
Yesterday, we were still on the brink. Fortunately, today we have come one step further.

User avatar
Dilvish
AI Lead and Programmer Emeritus
Posts: 4768
Joined: Sat Sep 22, 2012 6:25 pm

Re: [PATCH] Improving TemporaryPtr

#133 Post by Dilvish »

cami wrote:Boost on Linux x86_64 v1.52, 1.54, 1.55 all use locking for shared_ptr. As I'm not managing the refcounts myself how could they possibly become wrong due to the patch?!
I don't think we really know these crashes are necessarily being caused by bad refcounts, but it's my understanding from adrian that it's quite clearly something in the patch causing crashes. So hopefully breaking the patch into smaller pieces will help us isolate the cause of the crashes.
If I provided any code, scripts or other content here, it's released under GPL 2.0 and CC-BY-SA 3.0

User avatar
cami
Space Dragon
Posts: 411
Joined: Tue Sep 20, 2011 10:32 pm
Location: Halle (Saale)

Re: [PATCH] Improving TemporaryPtr

#134 Post by cami »

You are of course right, although it really looks very much like a use after free.
Youve probably already noticed Im short on time currently, but Ill set it on my to do list, visitors as a start.

As it seems the problem can be reproduced by some linux users on very similar setup as mine I found new hope that I will be able to reproduce it myself. Please add info about compiler version and flags used.
Yesterday, we were still on the brink. Fortunately, today we have come one step further.

Post Reply