inconsistent behaviour of hdf5 store.append for MultiIndex vs simple Index #11364

DSLituiev · 2015-10-18T21:20:58Z

The following example works with a MultiIndex but fails with a simple Index, which is inconsistent and unintuitive:

# Create example:
import sys
import numpy as np
import pandas as pd

df = pd.DataFrame({"V1": ["a", "b", "c", "d", "e", "aaaah!!!"],
                   "W": ["c", "d", "c", "d", "c", "c"],
                   "data": np.arange(6)})
df.set_index(["V1", "W"], inplace=True)
df.to_csv("testtable.tab", sep="\t")

# MultiIndex -> works:
sep = "\t"
indexcols =[0,1]
chunksize=5

xbed = "testtable.tab"
%rm 'tempstore.h5'
# create a store
with pd.HDFStore('tempstore.h5') as store:
    for nn, chunk in enumerate(pd.read_table(xbed, chunksize=chunksize, sep = sep, index_col= indexcols)):
        group = "x"
        print(chunk.index.names)
        store.append(group, chunk, format = "table", 
                     min_itemsize=dict(zip(chunk.index.names, [32]*len(chunk.index.names))))
        print("chunk #" , nn, file = sys.stderr)

# simple Index -> fails:
sep = "\t"
indexcols =[0]    # <==== this is the only difference
chunksize=5

xbed = "testtable.tab"
%rm 'tempstore.h5'
# create a store
with pd.HDFStore('tempstore.h5') as store:
    for nn, chunk in enumerate(pd.read_table(xbed, chunksize=chunksize, sep = sep, index_col= indexcols)):
        group = "x"
        print(chunk.index.names)
        store.append(group, chunk, format = "table", 
                     min_itemsize=dict(zip(chunk.index.names, [32]*len(chunk.index.names))))
        print("chunk #" , nn, file = sys.stderr)

-> ... ValueError: min_itemsize has the key [V1] which is not an axis or data_column


jreback · 2015-10-18T23:20:02Z

yeah, this is a dupe of: #10381

min_itemsize = {'index' : 10} should be synonymous with min_itemsize = {index.name : 10}

a pull-request to fix would be welcome (and should be straightforward)
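Until such a fix lands, one possible workaround is to choose the min_itemsize key based on the kind of index. The sketch below is illustrative only: index_min_itemsize is a hypothetical helper, not part of pandas. It assumes, per the comment above, that a single-level index is addressed by the literal key 'index', and that every MultiIndex level has a name (as in the working example in the report).

```python
def index_min_itemsize(index, size):
    """Build a min_itemsize dict for HDFStore.append from a DataFrame's index.

    Hypothetical helper (not a pandas API). `index` is expected to behave like
    a pandas Index, i.e. to expose `nlevels` and `names`.
    """
    if index.nlevels > 1:
        # MultiIndex: the level names are accepted as keys,
        # as in the working example from the report.
        return {name: size for name in index.names}
    # Simple Index: use the literal key 'index' instead of the index name.
    return {"index": size}


# Intended use inside the chunk loop from the report (not executed here):
# store.append(group, chunk, format="table",
#              min_itemsize=index_min_itemsize(chunk.index, 32))
```

With this helper, the same loop body could serve both the indexcols = [0, 1] and the indexcols = [0] cases.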

DSLituiev · 2015-10-19T02:58:01Z

Can you give a hint as to where Table.non_index_axes and Table.index_axes are set? I see no explicit assignment.

jreback · 2015-10-19T03:13:06Z

https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py#L3347

DSLituiev · 2015-10-19T05:50:27Z

Thank you!
Where are the test files located? Shall I add a test for this issue?

jreback closed this as completed Oct 18, 2015

jreback added the Bug, Duplicate Report, and IO HDF5 labels Oct 18, 2015

DSLituiev added a commit to DSLituiev/pandas that referenced this issue Oct 19, 2015

fixes pandas-dev#11364, pandas-dev#10381

6a7bf9f

DSLituiev added a commit to DSLituiev/pandas that referenced this issue Oct 19, 2015

test for issue pandas-dev#11364

0d6760e

jreback mentioned this issue Oct 22, 2015

min_itemsize not working on MultiIndex columns for Series, with format="table" #11412

Closed

toobaz mentioned this issue Nov 24, 2016

BUG: Ensure min_itemsize is always a list (#11412) #14728

Closed

4 tasks
