BUG: DataFrame.to_hdf segfault #10672


Closed
TomAugspurger opened this issue Jul 25, 2015 · 10 comments
Labels
Bug, Build (library building on various platforms), IO HDF5 (read_hdf, HDFStore)
Milestone

Comments

@TomAugspurger
Contributor

import pandas as pd

# import dask.dataframe as dd
N = 5000
df = pd.DataFrame({'A': [1, 2] * N, 'B': [3, 4] * N, 'C': ['a', 'b'] * N})

# Create the table (the file does not exist yet).
df.to_hdf('foo.h5', '/data', format='table', mode='a')

# Append the same frame three more times.
for _ in range(3):
    df.to_hdf('foo.h5', '/data', format='table', mode='a', append=True)

This actually works when N is smaller, so perhaps it's something to do with a PyTables chunksize.

@TomAugspurger TomAugspurger added IO HDF5 read_hdf, HDFStore Bug labels Jul 25, 2015
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 25, 2015
@jreback
Contributor

jreback commented Jul 25, 2015

does the file exist in the first place? e.g. try mode='w' for the first call

what version of PyTables?

@TomAugspurger
Contributor Author

tables 3.2.0

The file is new originally, and then I append to it in the for loop.

@jreback
Contributor

jreback commented Jul 25, 2015

hmm

couple of bugs in 3.2

ok, we can put in a test for this and see where it's happening

@TomAugspurger
Contributor Author

N=2996 is the magic number, which gives a df.shape of (5992, 3). It works for N=2995 but segfaults for 2996.
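
A threshold like this can be found by bisecting over N. The following is only a sketch, assuming the failure is monotonic in N (every size below the threshold works, every size at or above it crashes). Each attempt runs in a child process so that a segfault does not kill the search itself. The names `REPRO`, `crashes`, and `first_failing` are hypothetical helpers, not part of pandas or PyTables.

```python
import subprocess
import sys

# Child-process script mirroring the reproduction above; {n} is filled in per run.
# A fresh temporary directory is used so every attempt starts with a new file.
REPRO = """
import os, tempfile
import pandas as pd
N = {n}
path = os.path.join(tempfile.mkdtemp(), 'foo.h5')
df = pd.DataFrame({{'A': [1, 2] * N, 'B': [3, 4] * N, 'C': ['a', 'b'] * N}})
df.to_hdf(path, key='/data', format='table', mode='a')
for _ in range(3):
    df.to_hdf(path, key='/data', format='table', mode='a', append=True)
"""


def crashes(n):
    """Return True if the reproduction exits abnormally for this N.

    Note: any non-zero exit counts, including an ImportError in the child,
    so check that the script runs cleanly for a small N first.
    """
    proc = subprocess.run(
        [sys.executable, "-c", REPRO.format(n=n)],
        capture_output=True,
    )
    return proc.returncode != 0


def first_failing(lo, hi, fails):
    """Smallest n in (lo, hi] for which fails(n) is True.

    Assumes fails(lo) is False, fails(hi) is True, and fails is monotonic.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Calling `first_failing(1, 5000, crashes)` needs only about a dozen child runs; with an affected PyTables install it would be expected to land on the same boundary reported above, though that depends on the environment.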

@jreback
Contributor

jreback commented Jul 25, 2015

looks like a chunk size issue on the PyTables side

@TomAugspurger
Contributor Author

Just saw #10180 (comment)

Not sure if it's the same; I'm getting on a plane so I can't check now. I'll try the fix posted there.

@TomAugspurger
Contributor Author

Confirmed to be fixed in pytables 3.2.1

@jreback
Contributor

jreback commented Aug 4, 2015

gr8!

we might want to change the install docs to recommend >= 3.0.0 (but not 3.2), or maybe just >= 3.2.1
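
A version recommendation like this could also be expressed as a small check. This is a minimal sketch, assuming that 3.2.0 is the only affected release (the thread confirms only that 3.2.1 fixes the segfault) and that 3.0.0 is the minimum. The function name `pytables_version_ok` is hypothetical and is not something pandas provides.

```python
import re


def parse_version(version):
    """Turn a string like '3.2.1' into a 3-tuple of ints, padding with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pytables_version_ok(version):
    """True if the version is >= 3.0.0 and is not the 3.2.0 release.

    Assumption: only 3.2.0 is excluded; the rule is taken from the
    discussion in this thread, not from pandas itself.
    """
    parsed = parse_version(version)
    return parsed >= (3, 0, 0) and parsed != (3, 2, 0)
```

At runtime the installed version string is available as `tables.__version__`, so the check could be applied as `pytables_version_ok(tables.__version__)`.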

@jreback jreback reopened this Aug 4, 2015
@jreback jreback added the Build Library building on various platforms label Aug 4, 2015
@jreback jreback modified the milestones: 0.17.0, Next Major Release Aug 4, 2015
@jreback
Contributor

jreback commented Aug 4, 2015

Reopening to discuss whether to amend install.rst to add a warning (to use >= 3.0.0 & < 3.2, or > 3.2).
I normally wouldn't recommend this kind of version restriction, but this fixed a very hard-to-find bug.

thoughts?

@jreback
Contributor

jreback commented Aug 26, 2015

closed via 3e83459
