BUG df with 'index' as one name of a MultiIndex fails to save as HDFStore table #6208

Open
glyg opened this issue Jan 31, 2014 · 2 comments
Labels: Bug; Error Reporting (incorrect or improved errors from pandas); IO HDF5 (read_hdf, HDFStore)

Comments

@glyg
Contributor

glyg commented Jan 31, 2014

This one is strange...
Here is a minimal example:

import numpy as np
import pandas as pd

# MultiIndex whose first level is named 'index', which triggers the failure
index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                             [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                      names=['index', 'bar_name'])

df_mi = pd.DataFrame(np.random.randn(10, 3), index=index,
                     columns=['A', 'B', 'C'])

# Saving in 'table' format raises the ValueError shown below
with pd.HDFStore('minimal_io.h5', mode="w") as store:
    store.put('df_mi', df_mi, format='table')

And the error backtrace:

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-bb85f423e84c> in <module>()
     11 
     12 with pd.get_store('minimal_io.h5') as store:
---> 13     store.put('df_mi', df_mi, format='table')

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in put(self, key, value, format, append, **kwargs)
    819             format = get_option("io.hdf.default_format") or 'fixed'
    820         kwargs = self._validate_format(format, kwargs)
--> 821         self._write_to_group(key, value, append=append, **kwargs)
    822 
    823     def remove(self, key, where=None, start=None, stop=None):

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1271 
   1272         # write the object
-> 1273         s.write(obj=value, append=append, complib=complib, **kwargs)
   1274 
   1275         if s.is_table and index:

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in write(self, obj, data_columns, **kwargs)
   3963         print(data_columns)
   3964         return super(AppendableMultiFrameTable, self).write(
-> 3965             obj=obj, data_columns=data_columns, **kwargs)
   3966 
   3967     def read(self, **kwargs):

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3603 
   3604         # add the rows
-> 3605         self.write_data(chunksize, dropna=dropna)
   3606 
   3607     def write_data(self, chunksize, dropna=True):

/home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py in write_data(self, chunksize, dropna)
   3661         for i, v in enumerate(values):
   3662             new_shape = (nrows,) + self.dtype[names[nindexes + i]].shape
-> 3663             bvalues.append(values[i].ravel().reshape(new_shape))
   3664 
   3665         # write the chunks

ValueError: total size of new array must be unchanged

> /home/guillaume/python3/lib/python3.3/site-packages/pandas-0.13.0_496_ga2d5e53-py3.3-linux-x86_64.egg/pandas/io/pytables.py(3663)write_data()
   3662             new_shape = (nrows,) + self.dtype[names[nindexes + i]].shape
-> 3663             bvalues.append(values[i].ravel().reshape(new_shape))
   3664 

pandas version: '0.13.0-496-ga2d5e53'

Note that the bug does not occur if the format is set to 'fixed', i.e. this works:

with pd.HDFStore('minimal_io.h5') as store:
    store.put('df_mi', df_mi, format='fixed')
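
Since the failure is tied to a MultiIndex level named 'index', one possible workaround is to rename that level before writing in 'table' format. The sketch below is only an illustration: `rename_reserved_level` and the replacement name `index_level` are hypothetical and not part of pandas, and the snippet has not been checked against the pandas version reported above.

```python
import pandas as pd


def rename_reserved_level(df, reserved="index", replacement="index_level"):
    """Return a copy of df with the index level named `reserved` renamed.

    Hypothetical helper for illustration; not a pandas function.
    Assumes `replacement` is not already used as a level name.
    """
    new_names = [replacement if name == reserved else name
                 for name in df.index.names]
    out = df.copy()
    # set_names returns a new index, so the original frame is left untouched
    out.index = out.index.set_names(new_names)
    return out


# Small frame with the same level names as the example above
idx = pd.MultiIndex.from_product([["foo", "bar"], ["one", "two"]],
                                 names=["index", "bar_name"])
df = pd.DataFrame({"A": range(4)}, index=idx)

safe = rename_reserved_level(df)
print(list(safe.index.names))
```

The renamed frame could then be passed to `store.put(..., format='table')`, and the level renamed back after reading if the original name matters.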
@jreback
Contributor

jreback commented Jan 31, 2014

Using 'index' as a level name in a MultiIndex is not allowed when storing, because 'index' is reserved. The code should check for this in validate_multiindex and simply raise a ValueError; supporting the name properly is too complicated to fix.
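
A minimal sketch of the kind of check described here, written as a standalone function so it can run without pandas. The name `check_reserved_level_names` and the `RESERVED_LEVEL_NAMES` set are assumptions made for illustration; this is not the actual pandas implementation, and only 'index' is treated as reserved because it is the only name mentioned in this thread.

```python
# Assumption: 'index' is the only reserved name discussed in this issue.
RESERVED_LEVEL_NAMES = frozenset({"index"})


def check_reserved_level_names(level_names):
    """Raise ValueError if any MultiIndex level name is reserved.

    Hypothetical helper; `level_names` would typically be `df.index.names`.
    """
    clashes = [name for name in level_names if name in RESERVED_LEVEL_NAMES]
    if clashes:
        raise ValueError(
            f"level name(s) {clashes} are reserved and cannot be used "
            "when storing a MultiIndex frame in table format"
        )


# Passes silently: no reserved names
check_reserved_level_names(["foo_name", "bar_name"])

# Raises a clear error instead of failing later with a reshape error
try:
    check_reserved_level_names(["index", "bar_name"])
except ValueError as err:
    print(err)
```

Failing early like this would replace the opaque "total size of new array must be unchanged" message with one that names the offending level.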

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 18, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@datapythonista datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018
@mroeschke mroeschke added Error Reporting Incorrect or improved errors from pandas and removed Bug labels Sep 29, 2019
@mroeschke mroeschke added the Bug label May 16, 2020
@mroeschke mroeschke removed this from the Someday milestone Oct 13, 2022
@phofl
Member

phofl commented Apr 18, 2023

Edit: correction, wrong example
