Uninitialized AttributeFactory instance #181
Comments
Good catch. We've seen this as well with NumericField, but only ever in Release builds. |
I've made this change to Lucene now, and recompiled:
|
Added call to abort() in _DEBUG mode as well:
|
Could the reason be that during the AttributeSource constructor it creates a DEFAULT_ATTRIBUTE_FACTORY using a local static variable without any mutex synchronization? If multiple threads went to initialize the factory at the same time you might end up with some weird data. |
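To make the suspected race concrete, here is a minimal sketch of a lazily created shared default instance guarded by a mutex. This is not the Lucene++ code: `Factory`, `defaultFactory` and `createFromManyThreads` are hypothetical names, and a plain `std::mutex` stands in for whatever lock the library would actually use. The point is only that the "is it created yet?" check and the assignment happen under one lock, so no thread can see a half-constructed instance.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a factory type; not the Lucene++ class.
struct Factory {
    int createInstance() const { return 42; }
};

std::mutex g_factoryMutex;            // protects the two variables below
std::shared_ptr<Factory> g_factory;   // lazily created shared instance
int g_constructions = 0;              // how many times the instance was built

// Returns the shared default factory, creating it on first use.
// The check and the assignment are done while holding the mutex.
std::shared_ptr<Factory> defaultFactory() {
    std::lock_guard<std::mutex> lock(g_factoryMutex);
    if (!g_factory) {
        g_factory = std::make_shared<Factory>();
        ++g_constructions;
    }
    return g_factory;
}

// Calls defaultFactory() from several threads at once and returns
// how many times the factory was constructed.
int createFromManyThreads(int threadCount) {
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([] {
            std::shared_ptr<Factory> factory = defaultFactory();
            assert(factory && factory->createInstance() == 42);
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    std::lock_guard<std::mutex> lock(g_factoryMutex);
    return g_constructions;
}
```

Without the lock, two threads could both see an empty `g_factory` and both assign to it, which is the kind of unsynchronized first-use initialization the comment above suspects.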
Yes, you seem to be correct about that. Good catch! |
I've added "run lucene test programs under tsan" to my TODO list. Maybe there are more bugs like this. tsan should have spotted that bug immediately. |
Got tsan hit on this problem, but nothing more as far as I can see:
|
Patch to provoke the tsan hit:
|
I've patched up the initial tsan hit with a mutex now. But after that I get a billion other tsan hits instead. But I don't understand them. Here's the output: Thread T3:
Thread T2:
The reason I don't understand this hit is that the code in question looks like this:
Isn't boost::call_once a blocking operation? I.e. shouldn't all subsequent threads wait until the first thread is finished calling ZZ_CMAP_INIT? |
I agree, it should be thread safe, boost docs:
|
I didn't see any .cpp boost files in the backtrace though, but I guess the reason is that my boost is not compiled with tsan. I'm compiling up boost with tsan now. |
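The blocking behaviour in question can be checked with a small sketch. It uses `std::call_once` as a stand-in for `boost::call_once`, on the assumption that both give the same guarantee: callers that arrive while the first call is still running wait until it has finished. `slowInit`, `g_tableReady` and `countThreadsThatSawUnreadyTable` are hypothetical names, and the sleep only simulates a slow initializer such as a table setup.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag g_tableOnce;
std::atomic<bool> g_tableReady{false};  // set at the very end of initialization
std::atomic<int> g_sawUnready{0};       // threads that got past call_once too early

// Simulated slow one-time initializer (hypothetical).
void slowInit() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    g_tableReady.store(true);
}

// Every thread goes through call_once and then checks whether the
// initialization is complete. Returns the number of threads that
// observed an unfinished initialization after call_once returned.
int countThreadsThatSawUnreadyTable(int threadCount) {
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i) {
        threads.emplace_back([] {
            std::call_once(g_tableOnce, slowInit);
            if (!g_tableReady.load()) {
                ++g_sawUnready;
            }
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return g_sawUnready.load();
}
```

If the waiting guarantee holds, the function returns 0 no matter how many threads race into `call_once`, which matches the reading that the reported hit was a false positive from an uninstrumented library rather than a real race.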
Yes, compiling libboost with tsan fixed it. Afterwards there was only one tsan hit. In this function:
After temporarily patching that function up with a mutex, I don't get any other tsan hits. |
After tweaking the indexer demo program a little bit, I managed to provoke another tsan hit: Thread T2:
Thread T6:
Additional information:
This one I don't understand: the code is not immediately clear, and it doesn't always happen either. But adding a mutex at least seems to stop the tsan hits. |
(I added a mutex to the top of StandardFilter::incrementToken.) |
Sorry, I should have looked at the code 1 minute more. The race condition happens in these two functions, which are not thread safe:
|
This looks like another race condition:
|
Nice work - thanks for looking into this. |
I've gone through all static variables now, and fixed them for multithreaded access: |
Fixed in #183 |
Fix race conditions when initializing static variables. (Fix for issue #181)
Can you please release 3.0.9 with this fix? |
I've released 3.0.9 now, Dmitry. |
Thank you! :) |
On one run (out of 6 or so), I did hit:
```
[ RUN ] IndexWriterTest.testAddIndexesWithRollback
lucene++-tester: /usr/include/boost/smart_ptr/shared_ptr.hpp:550: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = Lucene::DocumentsWriter; typename boost::detail::sp_member_access<T>::type = Lucene::DocumentsWriter*]: Assertion `px != 0' failed.
/var/tmp/portage/dev-cpp/lucene++-3.0.9/temp/environment: line 1082: 28 Aborted (core dumped) "$@"
```
Not sure if it's another instance of something like luceneplusplus/LucenePlusPlus#181 or what, but not debugged further as the testsuite takes a while and had already spent too much time on lucene++ today. The test restriction had been there since the package was added.
Signed-off-by: Sam James <[email protected]>
Hi, thanks for the work on lucene++!
It's working great, except for a problem I have with something that looks like an uninitialized AttributeFactory variable in an AttributeSource instance.
Unfortunately it's not reproducible. Maybe it happens one of 100 times, I'm not sure.
It seems to crash at the same place every time.
It happens when I'm calling 'Lucene::newLucene<Lucene::NumericField>()'. And I call this function from many threads at the same time.
Here's the final place it's crashing:

Here's the place right before the final crash:

(factory->pn.ptr==NULL.
factory->px looks like a real pointer, but the memory it's pointing to looks uninitialized (all bytes filled up with 0xdd).)
And here's a screenshot for the backtrace (I'm not very familiar with visual studio so I don't know how to get a proper text block)

As a workaround I'm going to try adding a check so that lucene won't call factory->createInstance() if factory is uninitialized (plus an assertion of course). But is there anything else I can do to track down what goes wrong?
Thanks for your work.