PyTables: any limitation on the number of columns? #7653

Closed · bluefir opened this issue Jul 2, 2014 · 9 comments
Labels
IO HDF5 (read_hdf, HDFStore) · Performance (memory or execution speed)

Comments

bluefir commented Jul 2, 2014

I got the following error trying to do

store.append(output_name, data_frame)

for a DataFrame with 2 rows and 3202 columns:

  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 914, in append
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 1273, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3589, in write
    self.set_attrs()
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 2887, in set_attrs
    self.attrs.non_index_axes = self.non_index_axes
  File "C:\Python27\lib\site-packages\tables\attributeset.py", line 455, in __setattr__
    self._g__setattr(name, value)
  File "C:\Python27\lib\site-packages\tables\attributeset.py", line 397, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "hdf5extension.pyx", line 704, in tables.hdf5extension.AttributeSet._g_setattr (tables\hdf5extension.c:6832)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "..\..\hdf5-1.8.11\src\H5A.c", line 254, in H5Acreate2
    unable to create attribute
  File "..\..\hdf5-1.8.11\src\H5A.c", line 503, in H5A_create
    unable to create attribute in object header
  File "..\..\hdf5-1.8.11\src\H5Oattribute.c", line 347, in H5O_attr_create
    unable to create new attribute in header
  File "..\..\hdf5-1.8.11\src\H5Omessage.c", line 224, in H5O_msg_append_real
    unable to create new message
  File "..\..\hdf5-1.8.11\src\H5Omessage.c", line 1945, in H5O_msg_alloc
    unable to allocate space for message
  File "..\..\hdf5-1.8.11\src\H5Oalloc.c", line 1142, in H5O_alloc
    object header message is too large

End of HDF5 error back trace

Can't set attribute 'non_index_axes' in node:
 /alpha_regression_results/liquid/alpha_adv_deciles/factors (Group) u''.

Process finished with exit code 1

Frames with 1402 columns saved just fine. Are there any limitations on the number of columns?

jreback (Contributor) commented Jul 2, 2014

Technically no, but the problem is #6245: you are exceeding the amount of metadata allowed per node (I think it's 64KB). #6245 would remove this limit.

Side issue: having lots of columns is very inefficient for retrieval, as any query gets ALL of the columns, so storing transposed (in this case) is MUCH better. Or split into multiple same-indexed tables (see append_to_multiple / select_as_multiple); a rough sketch of both options follows.
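
A rough sketch of the two alternatives suggested above, using hypothetical file and table names; the column split keeps each sub-table near 1,000 columns, well under the width that triggered the error:

import numpy as np
import pandas as pd

# toy stand-in for the wide results frame (all names here are hypothetical)
df = pd.DataFrame(np.random.randn(2, 3202),
                  index=['batch_row_0', 'batch_row_1'],
                  columns=['factor_%d' % i for i in range(3202)])

with pd.HDFStore('results_sketch.h5') as store:
    # Option 1: store transposed, so the 3202 former columns become rows
    # and the per-node metadata stays small
    store.append('factors_transposed', df.T)

    # Option 2: split the columns across several same-indexed tables,
    # then query them back together as one frame
    store.append_to_multiple(
        {'factors_a': df.columns[:1000].tolist(),
         'factors_b': df.columns[1000:2000].tolist(),
         'factors_c': None},               # None = all remaining columns
        df, selector='factors_a')
    roundtrip = store.select_as_multiple(
        ['factors_a', 'factors_b', 'factors_c'], selector='factors_a')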

bluefir (Author) commented Jul 2, 2014

Well, unfortunately that's a big refactoring job for me. I am building results in batches (the 2 rows were just for debugging) so that I don't run out of memory, and then use store.append() to add them to the previously saved results, so transposing will not work. append_to_multiple / select_as_multiple looks like a pain, but thanks for the tip anyway! Oh well...
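
For context, a minimal sketch of the batching pattern described here (the names and column count are made up); with a transposed layout each new batch would have to arrive as new columns, which an HDFStore table cannot append, hence the objection:

import numpy as np
import pandas as pd

columns = ['factor_%d' % i for i in range(1000)]   # assumed wide schema

with pd.HDFStore('batched_results_sketch.h5') as store:
    for batch in range(5):
        chunk = pd.DataFrame(np.random.randn(2, len(columns)),
                             columns=columns)
        # row-wise appends keep memory bounded; the full column list is
        # stored once in the node's 'non_index_axes' attribute, which is
        # what overflows the object header once there are a few thousand
        # columns
        store.append('factors', chunk)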

jreback (Contributor) commented Jul 2, 2014

You can try shortening the column names (painful, but maybe less so).
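
A sketch of that idea, assuming hypothetical names; the 'non_index_axes' attribute that blew up above holds the full column list, so compact names shrink it (whether it then fits under the ~64KB header limit depends on how many columns there are):

import numpy as np
import pandas as pd

# wide frame standing in for the real results (all names hypothetical)
data_frame = pd.DataFrame(np.random.randn(2, 3202),
                          columns=['some_long_factor_name_%d' % i
                                   for i in range(3202)])

# map the long names to compact ones; keep the mapping to restore them later
short = {name: 'c%d' % i for i, name in enumerate(data_frame.columns)}
restore = {v: k for k, v in short.items()}

with pd.HDFStore('short_names_sketch.h5') as store:
    store.append('factors', data_frame.rename(columns=short))
    result = store['factors'].rename(columns=restore)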

bluefir (Author) commented Jul 3, 2014

I am trying to use stack() and running into the following problem:

df.columns.nlevels
3

df.stack(level=[1, 2])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-62-c133879e3675> in <module>()
----> 1 df.stack(level=[1, 2])

C:\Python27\lib\site-packages\pandas\core\frame.pyc in stack(self, level, dropna)
   3153             result = self
   3154             for lev in level:
-> 3155                 result = stack(result, lev, dropna=dropna)
   3156             return result
   3157         else:

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in stack(frame, level, dropna)
    501         level += frame.columns.nlevels
    502 
--> 503     level = frame.columns._get_level_number(level)
    504 
    505     if isinstance(frame.columns, MultiIndex):

C:\Python27\lib\site-packages\pandas\core\index.pyc in _get_level_number(self, level)
   2203             elif level >= self.nlevels:
   2204                 raise IndexError('Too many levels: Index has only %d levels, '
-> 2205                                  'not %d' % (self.nlevels, level + 1))
   2206         return level
   2207 

IndexError: Too many levels: Index has only 2 levels, not 3

jreback (Contributor) commented Jul 3, 2014

Please show a copy-pasteable example.

bluefir (Author) commented Jul 3, 2014

import itertools, numpy as np
from pandas import DataFrame, MultiIndex
df = DataFrame(np.random.randn(1000, 27), columns=MultiIndex.from_tuples(list(itertools.product(xrange(3), repeat=3))))
df.stack(level=[1, 2])

bluefir (Author) commented Jul 3, 2014

By the way, unstack(level=[1, 2]) works:

test = df
for _ in xrange(2):
    test = test.stack(1)
test.unstack(level=[1, 2])

jreback (Contributor) commented Jul 3, 2014

If you do df.stack(level=1).stack(level=1), this works. I'll bet it's getting confused about the level numbers when stacking multiple levels at once: after the first stack the columns have one fewer level, so the original level numbers no longer line up (hence "Index has only 2 levels, not 3").

Marking this as a bug.
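
A sketch of the working call next to the failing one, restating the likely cause visible in the traceback above (the loop re-uses the original level numbers after the first stack has already removed a column level):

import itertools
import numpy as np
from pandas import DataFrame, MultiIndex

# the reproducer from above
df = DataFrame(np.random.randn(1000, 27),
               columns=MultiIndex.from_tuples(
                   list(itertools.product(range(3), repeat=3))))

# df.stack(level=[1, 2]) raises IndexError: after the first stack the
# columns have only 2 levels left, so the original level number 2 is out
# of range; stacking one level at a time sidesteps that
stacked = df.stack(level=1).stack(level=1)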

jreback (Contributor) commented Jul 3, 2014

going to close this issue in favor of #6245 (and #7660). thanks for the reports
