Inconsistent behaviour of HDF5 store.append for MultiIndex vs simple Index #11364

Closed
DSLituiev opened this issue Oct 18, 2015 · 4 comments
Labels
Bug Duplicate Report Duplicate issue or pull request IO HDF5 read_hdf, HDFStore

Comments

@DSLituiev

The following example works with a MultiIndex but fails with a simple Index, which is inconsistent and unintuitive:

# Create example:
import sys

import numpy as np
import pandas as pd

df = pd.DataFrame({"V1": ["a", "b", "c", "d", "e", "aaaah!!!"],
                   "W": ["c", "d", "c", "d", "c", "c"],
                   "data": np.arange(6)})
df.set_index(["V1", "W"], inplace=True)
df.to_csv("testtable.tab", sep="\t")
# MultiIndex -> works:
sep = "\t"
indexcols =[0,1]
chunksize=5

xbed = "testtable.tab"
%rm 'tempstore.h5'
# create a store
with pd.HDFStore('tempstore.h5') as store:
    for nn, chunk in enumerate(pd.read_table(xbed, chunksize=chunksize, sep = sep, index_col= indexcols)):
        group = "x"
        print(chunk.index.names)
        store.append(group, chunk, format = "table", 
                     min_itemsize=dict(zip(chunk.index.names, [32]*len(chunk.index.names))))
        print("chunk #" , nn, file = sys.stderr)
# simple Index -> fails:
sep = "\t"
indexcols =[0]    # <==== this is the only difference
chunksize=5

xbed = "testtable.tab"
%rm 'tempstore.h5'
# create a store
with pd.HDFStore('tempstore.h5') as store:
    for nn, chunk in enumerate(pd.read_table(xbed, chunksize=chunksize, sep = sep, index_col= indexcols)):
        group = "x"
        print(chunk.index.names)
        store.append(group, chunk, format = "table", 
                     min_itemsize=dict(zip(chunk.index.names, [32]*len(chunk.index.names))))
        print("chunk #" , nn, file = sys.stderr)

-> ... ValueError: min_itemsize has the key [V1] which is not an axis or data_column

@jreback
Contributor

jreback commented Oct 18, 2015

Yes, this is a duplicate of #10381.

min_itemsize = {'index' : 10} should be synonymous with min_itemsize = {index.name : 10}

A pull request to fix this would be welcome (and should be straightforward).
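
Until the two spellings are treated as synonyms, the same effect can be approximated on the caller's side. The sketch below is not pandas code: `normalize_min_itemsize` is a hypothetical helper that renames a simple index's name to the `'index'` key mentioned above and leaves MultiIndex level names untouched. It assumes only that the index object exposes pandas-style `nlevels` and `name` attributes.

```python
def normalize_min_itemsize(min_itemsize, index):
    """Hypothetical caller-side helper (not part of pandas).

    For a simple (single-level) index, move the entry keyed by the index's
    name to the key "index". MultiIndex level names are left unchanged,
    since they are accepted as keys in the working example.
    """
    normalized = dict(min_itemsize)  # never mutate the caller's dict

    if getattr(index, "nlevels", 1) > 1:
        return normalized

    name = getattr(index, "name", None)
    if name is not None and name in normalized:
        size = normalized.pop(name)
        # An explicit "index" entry, if already present, takes precedence.
        normalized.setdefault("index", size)
    return normalized
```

In the failing loop this would be used as `store.append(group, chunk, format="table", min_itemsize=normalize_min_itemsize(sizes, chunk.index))`, where `sizes` is the dict built from `chunk.index.names`.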

@jreback jreback closed this as completed Oct 18, 2015
@jreback jreback added Bug Duplicate Report Duplicate issue or pull request IO HDF5 read_hdf, HDFStore labels Oct 18, 2015
@DSLituiev
Author

Can you give a hint as to where Table.non_index_axes and Table.index_axes are set? I see no explicit assignment.

@jreback
Contributor

jreback commented Oct 19, 2015

@DSLituiev
Author

Thank you!
Where are the test files located? Shall I add a test for this issue?


2 participants