"min_itemsize" doesn't work for MultiIndex columns in table format #12154

Open
toobaz opened this issue Jan 27, 2016 · 8 comments

@toobaz
Member

toobaz commented Jan 27, 2016

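import pandas as pd
# Frame of strings whose columns are a two-level MultiIndex
df = pd.DataFrame([[1,2,3],[4,5,6]],
                  columns=pd.MultiIndex.from_tuples([(1,'a'), (1,'b'), (2,'c')])).astype(str)
store = pd.HDFStore('/tmp/store.hdf')
# min_itemsize keyed by the top-level column label 1
store.append('test', df, min_itemsize={1 : 20})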

yields

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-cea3042a011b> in <module>()
----> 1 store.append('test', df, min_itemsize={1 : 20})

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
    915         kwargs = self._validate_format(format, kwargs)
    916         self._write_to_group(key, value, append=append, dropna=dropna,
--> 917                              **kwargs)
    918 
    919     def append_to_multiple(self, d, value, selector, data_columns=None,

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1260 
   1261         # write the object
-> 1262         s.write(obj=value, append=append, complib=complib, **kwargs)
   1263 
   1264         if s.is_table and index:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3783         self.create_axes(axes=axes, obj=obj, validate=append,
   3784                          min_itemsize=min_itemsize,
-> 3785                          **kwargs)
   3786 
   3787         for a in self.axes:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3466 
   3467         # validate our min_itemsize
-> 3468         self.validate_min_itemsize(min_itemsize)
   3469 
   3470         # validate our metadata

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.py in validate_min_itemsize(self, min_itemsize)
   3105                 raise ValueError(
   3106                     "min_itemsize has the key [%s] which is not an axis or "
-> 3107                     "data_column" % k)
   3108 
   3109     @property

ValueError: min_itemsize has the key [1] which is not an axis or data_column

... which is actually true (if "data_column" is interpreted as "queryable column" rather than just "column of data"), but should not be a blocker (at least, judging from the documentation).
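
To illustrate that reading of the error, here is a minimal standalone model of the check, assuming it accepts only keys that name an axis or a queryable data column. The function `check_min_itemsize` and the `queryable` set are hypothetical stand-ins, not the code in pandas/io/pytables.py; only the error message is taken from the traceback above.

```python
def check_min_itemsize(min_itemsize, queryable):
    """Hypothetical model: reject keys that are not queryable names."""
    for key in min_itemsize:
        if key not in queryable:
            # Message text copied from the traceback; wrapping the key in a
            # one-element tuple keeps the formatting safe for any key type.
            raise ValueError(
                "min_itemsize has the key [%s] which is not an axis or "
                "data_column" % (key,)
            )


# Assumption for this sketch: no data_columns were requested, so only the
# index is queryable and the column label 1 is not.
queryable = {"index"}

try:
    check_min_itemsize({1: 20}, queryable)
except ValueError as err:
    message = str(err)
    print(message)
```

Under this model the key `1` is rejected because it is a label inside the column MultiIndex, not a queryable name, which matches the interpretation given above.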

@toobaz
Member Author

toobaz commented Jan 27, 2016

By the way: I also tried min_itemsize={(1, 'a') : 20}, obtaining

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-9a1f06838d41> in <module>()
----> 1 store.append('test', df, min_itemsize={(1, 'a') : 20})

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.pyc in append(self, key, value, format, append, columns, dropna, **kwargs)
    915         kwargs = self._validate_format(format, kwargs)
    916         self._write_to_group(key, value, append=append, dropna=dropna,
--> 917                              **kwargs)
    918 
    919     def append_to_multiple(self, d, value, selector, data_columns=None,

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.pyc in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1260 
   1261         # write the object
-> 1262         s.write(obj=value, append=append, complib=complib, **kwargs)
   1263 
   1264         if s.is_table and index:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3783         self.create_axes(axes=axes, obj=obj, validate=append,
   3784                          min_itemsize=min_itemsize,
-> 3785                          **kwargs)
   3786 
   3787         for a in self.axes:

/home/pietro/nobackup/repo/pandas/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3456                 self.values_axes.append(col)
   3457             except (NotImplementedError, ValueError, TypeError) as e:
-> 3458                 raise e
   3459             except Exception as detail:
   3460                 raise Exception(

TypeError: not all arguments converted during string formatting
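
The thread does not identify where this TypeError originates. One plausible cause, stated here as an assumption, is a `%`-style format string applied directly to a tuple key: Python treats the tuple as the full argument list, so a two-element key does not fit a single `%s`. A short sketch in plain Python, independent of pandas (the `template` string is illustrative only):

```python
# Illustrative template with a single placeholder (not taken from pandas).
template = "bad key [%s]"

key = (1, "a")

try:
    # The tuple is unpacked as two format arguments for one placeholder.
    template % key
except TypeError as err:
    unwrapped_error = str(err)

# Wrapping the key in a one-element tuple formats the key as a single value.
wrapped = template % (key,)

print(unwrapped_error)
print(wrapped)
```

If this is indeed the cause, the message hides the real problem from the user, since it reports a formatting failure rather than an unsupported key.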

@jreback jreback added Bug, Duplicate Report, IO HDF5 labels Jan 27, 2016
@jreback
Contributor

jreback commented Jan 27, 2016

dupe of #10381

@jreback jreback closed this as completed Jan 27, 2016
@toobaz
Member Author

toobaz commented Jan 27, 2016

I might be missing something... but here I'm referring to min_itemsize settings for columns in a df where df.columns is a MultiIndex, not to min_itemsize settings for the labels of the (MultiIndex) index of the df. Still, if I understand correctly, the required fix is quite different.

@jreback
Contributor

jreback commented Jan 27, 2016

It is virtually the same fix.

@jreback
Contributor

jreback commented Jan 27, 2016

ok, we'll make it a separate issue in any event.

@jreback jreback reopened this Jan 27, 2016
@jreback jreback added this to the Next Major Release milestone Jan 27, 2016
@toobaz
Member Author

toobaz commented Jan 27, 2016

Trying to understand if I can help: what should be the recommended API? min_itemsize={1 : 20} or min_itemsize={(1, 'a') : 20, (1,'b') : 20}? Or should they both work?

@jreback
Contributor

jreback commented Jan 27, 2016

No, I think you would have to be explicit (so the 2nd form).
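
If only the explicit form ends up being supported, a caller could still build it from the shorthand. The helper below is a hypothetical sketch (`expand_min_itemsize` is not a pandas function, and neither form works at the time of this thread): it expands top-level keys into full column tuples, given the column labels as an iterable of tuples such as those obtained by iterating `df.columns`.

```python
def expand_min_itemsize(min_itemsize, columns):
    """Expand top-level keys such as {1: 20} into full column tuples.

    `columns` is any iterable of tuples. Keys that already are full
    column tuples are passed through unchanged.
    """
    columns = list(columns)
    expanded = {}
    for key, size in min_itemsize.items():
        if key in columns:
            # Already an explicit column tuple.
            expanded[key] = size
            continue
        # Otherwise treat the key as a label of the first column level.
        matches = [col for col in columns if col[0] == key]
        if not matches:
            raise KeyError(key)
        for col in matches:
            expanded[col] = size
    return expanded


# Same column labels as in the example at the top of the issue.
columns = [(1, "a"), (1, "b"), (2, "c")]
explicit = expand_min_itemsize({1: 20}, columns)
print(explicit)
```

Keeping the expansion outside the store call means the storage API would only ever see one unambiguous form.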

@toobaz toobaz mentioned this issue Feb 7, 2016
@toobaz
Member Author

toobaz commented Feb 7, 2016

The docs say "Starting in 0.11.0, passing a min_itemsize dict will cause all passed columns to be created as data_columns automatically.", but an error message in pandas/io/pytables.py says "ValueError: cannot use a multi-index on axis [1] with data_columns True".

Should I interpret the ValueError as a temporary NotImplementedError? It seems to me that if I'm not allowed to have data_columns=True with MultiIndex columns, then I won't be allowed to pass min_itemsize either, and this bug is a WONTFIX.

(Right?)

@mroeschke mroeschke removed the Duplicate Report label Jan 26, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022