Segfault when writing data out of order to pd.HDFStore via append #10180
Comments
So you need to show how you are reading in the data; your data may have some bad rows in it and thus be read in as the wrong dtype. When storing in HDF5 you need to be especially cognizant of dtypes: it will be non-performant if you have the wrong dtypes (and you may not even be able to store it).
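The dtype point can be illustrated with a minimal sketch. It assumes pandas is installed, and the column name `price`, the CSV text and the helper `coerce_numeric` are made up for illustration. A single malformed value is enough for `pd.read_csv` to infer a non-numeric column. `pd.to_numeric(..., errors="coerce")` then turns unparseable values into NaN, so the column is float64 before it is appended:

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype


def coerce_numeric(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Hypothetical helper: force the given columns to a float64 dtype.

    Values that cannot be parsed (e.g. from bad rows) become NaN, so each
    column ends up numeric instead of object/string, which HDF5 can store
    efficiently.
    """
    out = df.copy()
    for col in columns:
        # astype makes sure we get a plain NumPy float64 column
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("float64")
    return out


# A single malformed row ("bad") makes read_csv infer a non-numeric column.
csv_text = "price\n1.5\n2.0\nbad\n"
raw = pd.read_csv(io.StringIO(csv_text))
print(raw["price"].dtype, is_numeric_dtype(raw["price"]))  # non-numeric dtype, False

clean = coerce_numeric(raw, ["price"])
print(clean["price"].dtype)  # float64; the bad row is now NaN
```

Coercing silently discards the bad values, so it is worth counting the NaNs it introduces rather than hiding a data problem behind a successful append.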
@BastianL this also does not segfault for me (on Mac OS X).
I'm pretty sure the order in which the appends occur matters in this case. If you just write the pastebin as
@jreback I'm going to try on Mac OS X when I get a chance.
@BastianL I opened the file in write mode only at the beginning. The appends happened in order. In any event, please post your complete code.
I seem to have the same problem on OS X. A dialogue pops up warning that python3 quit unexpectedly.
Here is the example code:
You'll need to unzip these pickled files:
@BastianL ok, so it turns out this only core dumps on PyTables 3.2 (I have libhdf5 1.8.14). Not really sure why. So a workaround is to use PyTables 3.1.1, which seems to work fine. You can also report this to the PyTables issue tracker. This must be another edge case. Very odd.
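The workaround amounts to avoiding one specific PyTables release. Below is a minimal sketch of a guard, based only on what this thread reports: 3.1.1 worked, 3.2 crashed, and the fix appears to have shipped in 3.2.1. Other releases were not tested here. `version_tuple` and `pytables_version_affected` are hypothetical helpers:

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'major.minor.micro' of a version string.

    Missing components default to 0, so "3.2" -> (3, 2, 0).
    """
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def pytables_version_affected(version: str) -> bool:
    """Hypothetical guard based only on this thread's reports:
    3.1.1 worked, 3.2(.0) crashed, and 3.2.1 contained the fix.
    """
    return version_tuple(version) == (3, 2, 0)


for v in ["3.1.1", "3.2", "3.2.0", "3.2.1"]:
    print(v, pytables_version_affected(v))
```

In practice you would pass `tables.__version__` and, if it returns True, upgrade or pin PyTables rather than trying to work around the crash on the pandas side.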
ok, can confirm that this works on
This should be fixed in PyTables/PyTables@5e2a63b. I will make a bug fix release soon-ish.
ok, looks like 3.2.1 was just released.
thanks @jreback, I forgot to mention it here.
@jreback I realized I never circled back to this, but PyTables/PyTables@5e2a63b fixes this. Thanks for helping get to the bottom of the issue.
gr8!
I am trying to append chunks of data to an (initially empty) HDF5 frame with pd.HDFStore. The chunks come in out of order, and certain orders sometimes produce segfaults. This script seems to consistently segfault after loading file 31 (update: see the comments for a better example). Judging by the timestamps the script outputs, it appears to happen when HDF5 tries to fill a gap in the data. I can produce more files that trigger segfaults if necessary.
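Since the crash depended on the order of the appends, one possible mitigation is to sort chunks by their first timestamp before appending. Each append then extends the end of the table rather than filling a gap. This is an assumption, not a fix confirmed in this thread (the actual fix was in PyTables). The sketch assumes pandas with DatetimeIndex chunks. `append_in_time_order` and `make_chunk` are hypothetical helpers and the chunks are made-up data:

```python
import pandas as pd


def append_in_time_order(chunks, append):
    """Hypothetical workaround: append chunks sorted by their first timestamp.

    `chunks` is an iterable of DataFrames with a DatetimeIndex and `append`
    is any callable taking one chunk, e.g. lambda c: store.append("df", c)
    for an open pd.HDFStore. Assumes chunks do not overlap in time.
    """
    for chunk in sorted(chunks, key=lambda c: c.index.min()):
        append(chunk)


def make_chunk(start, periods=3):
    """Build a small made-up chunk with a 5-minute DatetimeIndex."""
    idx = pd.date_range(start, periods=periods, freq="5min")
    return pd.DataFrame({"value": range(periods)}, index=idx)


# Chunks arrive out of order; a plain list stands in for the HDFStore here.
chunks = [
    make_chunk("2011-01-05 22:15"),
    make_chunk("2011-01-04 17:45"),
    make_chunk("2011-01-06 09:00"),
]
written = []
append_in_time_order(chunks, written.append)
print([c.index[0] for c in written])
```

The trade-off is that all chunks must be available (or buffered) before writing, which may not suit a streaming setup.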
I've managed to narrow it down to the following line in pytables.py:
A script to reproduce:
Zipped pickled files for testing: http://s000.tinyupload.com/?file_id=60238823358379433453
Here is a pastebin, in CSV format, of the data where the segfault occurs in the example:
http://pastebin.com/FRsygCUG
Note: you may need the actual files to reproduce this, but as you can see from the pastebin, the data isn't malformed.
It is trying to fill the following gap in the original data:
2011-01-04 17:55:00 to 2011-01-05 22:15:00
with an append, which results in a segfault.
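One way to notice an append that fills a gap, rather than extending the data, is to track the time ranges already written. A new chunk that lands strictly between them is such an append. A minimal pure-Python sketch under that assumption; `fills_gap` is a hypothetical helper (not a pandas or PyTables API), and the `written` ranges are made up, with boundaries matching the gap reported in this issue:

```python
from datetime import datetime
from typing import List, Tuple

# A (start, end) time range that has already been appended to the store.
Interval = Tuple[datetime, datetime]


def fills_gap(written: List[Interval], start: datetime, end: datetime) -> bool:
    """Return True if [start, end] lies after some written data and before
    some other written data, i.e. the append would fill a gap instead of
    extending the table at either end. Hypothetical helper."""
    has_data_before = any(w_end < start for _, w_end in written)
    has_data_after = any(w_start > end for w_start, _ in written)
    return has_data_before and has_data_after


written = [
    (datetime(2011, 1, 4, 9, 0), datetime(2011, 1, 4, 17, 55)),
    (datetime(2011, 1, 5, 22, 15), datetime(2011, 1, 6, 8, 0)),
]
print(fills_gap(written, datetime(2011, 1, 4, 18, 0), datetime(2011, 1, 5, 22, 10)))  # True
print(fills_gap(written, datetime(2011, 1, 6, 8, 5), datetime(2011, 1, 6, 12, 0)))  # False
```

With DataFrame chunks, the ranges would come from each chunk's `index.min()` and `index.max()`. Flagging such appends at least makes it possible to buffer and reorder them.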
Script output: