PyTables: any limitation on the number of columns? #7653

Closed · bluefir opened this issue Jul 2, 2014 · 9 comments
Labels
IO HDF5 (read_hdf, HDFStore) · Performance (memory or execution speed)

Comments

bluefir commented Jul 2, 2014

I got the following error trying to do

store.append(output_name, data_frame)

for a DataFrame with 2 rows and 3202 columns:

  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 914, in append
    **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 1273, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3589, in write
    self.set_attrs()
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 2887, in set_attrs
    self.attrs.non_index_axes = self.non_index_axes
  File "C:\Python27\lib\site-packages\tables\attributeset.py", line 455, in __setattr__
    self._g__setattr(name, value)
  File "C:\Python27\lib\site-packages\tables\attributeset.py", line 397, in _g__setattr
    self._g_setattr(self._v_node, name, stvalue)
  File "hdf5extension.pyx", line 704, in tables.hdf5extension.AttributeSet._g_setattr (tables\hdf5extension.c:6832)
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "..\..\hdf5-1.8.11\src\H5A.c", line 254, in H5Acreate2
    unable to create attribute
  File "..\..\hdf5-1.8.11\src\H5A.c", line 503, in H5A_create
    unable to create attribute in object header
  File "..\..\hdf5-1.8.11\src\H5Oattribute.c", line 347, in H5O_attr_create
    unable to create new attribute in header
  File "..\..\hdf5-1.8.11\src\H5Omessage.c", line 224, in H5O_msg_append_real
    unable to create new message
  File "..\..\hdf5-1.8.11\src\H5Omessage.c", line 1945, in H5O_msg_alloc
    unable to allocate space for message
  File "..\..\hdf5-1.8.11\src\H5Oalloc.c", line 1142, in H5O_alloc
    object header message is too large

End of HDF5 error back trace

Can't set attribute 'non_index_axes' in node:
 /alpha_regression_results/liquid/alpha_adv_deciles/factors (Group) u''.

Process finished with exit code 1

Frames with 1402 columns saved just fine. Are there any limitations on the number of columns?

jreback (Contributor) commented Jul 2, 2014

Technically no, but the problem is #6245: you are exceeding the amount of metadata allowed per node (I think it's 64KB). #6245 would remove this limit.

Side issue: having lots of columns is very inefficient for retrieval, as any query gets ALL of the columns, so storing transposed (in this case) is MUCH better. Or split into multiple same-indexed tables (see append_to_multiple / select_as_multiple); a rough sketch of both options follows.
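
A rough sketch of the two alternatives suggested above, using hypothetical file and table names; the column split keeps each sub-table near 1,000 columns, well under the width that triggered the error:

import numpy as np
import pandas as pd

# toy stand-in for the wide results frame (all names here are hypothetical)
df = pd.DataFrame(np.random.randn(2, 3202),
                  index=['batch_row_0', 'batch_row_1'],
                  columns=['factor_%d' % i for i in range(3202)])

with pd.HDFStore('results_sketch.h5') as store:
    # Option 1: store transposed, so the 3202 former columns become rows
    # and the per-node metadata stays small
    store.append('factors_transposed', df.T)

    # Option 2: split the columns across several same-indexed tables,
    # then query them back together as one frame
    store.append_to_multiple(
        {'factors_a': df.columns[:1000].tolist(),
         'factors_b': df.columns[1000:2000].tolist(),
         'factors_c': None},               # None = all remaining columns
        df, selector='factors_a')
    roundtrip = store.select_as_multiple(
        ['factors_a', 'factors_b', 'factors_c'], selector='factors_a')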

bluefir (Author) commented Jul 2, 2014

Well, unfortunately that's a big refactoring job for me. I am building results in batches (the 2 rows were just for debugging) so that I don't run out of memory, and then use store.append() to add them to the previously saved results, so transposing will not work. append_to_multiple / select_as_multiple looks like a pain, but thanks for the tip anyway! Oh well...
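
For context, a minimal sketch of the batching pattern described here (the names and column count are made up); with a transposed layout each new batch would have to arrive as new columns, which an HDFStore table cannot append, hence the objection:

import numpy as np
import pandas as pd

columns = ['factor_%d' % i for i in range(1000)]   # assumed wide schema

with pd.HDFStore('batched_results_sketch.h5') as store:
    for batch in range(5):
        chunk = pd.DataFrame(np.random.randn(2, len(columns)),
                             columns=columns)
        # row-wise appends keep memory bounded; the full column list is
        # stored once in the node's 'non_index_axes' attribute, which is
        # what overflows the object header once there are a few thousand
        # columns
        store.append('factors', chunk)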

jreback (Contributor) commented Jul 2, 2014

You can try shortening the column names (painful, but maybe less so).
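
A sketch of that idea, assuming hypothetical names; the 'non_index_axes' attribute that blew up above holds the full column list, so compact names shrink it (whether it then fits under the ~64KB header limit depends on how many columns there are):

import numpy as np
import pandas as pd

# wide frame standing in for the real results (all names hypothetical)
data_frame = pd.DataFrame(np.random.randn(2, 3202),
                          columns=['some_long_factor_name_%d' % i
                                   for i in range(3202)])

# map the long names to compact ones; keep the mapping to restore them later
short = {name: 'c%d' % i for i, name in enumerate(data_frame.columns)}
restore = {v: k for k, v in short.items()}

with pd.HDFStore('short_names_sketch.h5') as store:
    store.append('factors', data_frame.rename(columns=short))
    result = store['factors'].rename(columns=restore)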

bluefir (Author) commented Jul 3, 2014

I am trying to use stack() and running into the following problem:

df.columns.nlevels
3

df.stack(level=[1, 2])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-62-c133879e3675> in <module>()
----> 1 df.stack(level=[1, 2])

C:\Python27\lib\site-packages\pandas\core\frame.pyc in stack(self, level, dropna)
   3153             result = self
   3154             for lev in level:
-> 3155                 result = stack(result, lev, dropna=dropna)
   3156             return result
   3157         else:

C:\Python27\lib\site-packages\pandas\core\reshape.pyc in stack(frame, level, dropna)
    501         level += frame.columns.nlevels
    502 
--> 503     level = frame.columns._get_level_number(level)
    504 
    505     if isinstance(frame.columns, MultiIndex):

C:\Python27\lib\site-packages\pandas\core\index.pyc in _get_level_number(self, level)
   2203             elif level >= self.nlevels:
   2204                 raise IndexError('Too many levels: Index has only %d levels, '
-> 2205                                  'not %d' % (self.nlevels, level + 1))
   2206         return level
   2207 

IndexError: Too many levels: Index has only 2 levels, not 3

jreback (Contributor) commented Jul 3, 2014

Please show a copy-pasteable example.

bluefir (Author) commented Jul 3, 2014

import itertools, numpy as np
from pandas import DataFrame, MultiIndex
df = DataFrame(np.random.randn(1000, 27), columns=MultiIndex.from_tuples(list(itertools.product(xrange(3), repeat=3))))
df.stack(level=[1, 2])

bluefir (Author) commented Jul 3, 2014

By the way, unstack(level=[1, 2]) works:

test = df
for _ in xrange(2):
    test = test.stack(1)
test.unstack(level=[1, 2])

jreback (Contributor) commented Jul 3, 2014

If you do df.stack(level=1).stack(level=1), this works. I'll bet it's getting confused about the level numbers when stacking multiple levels at once: after the first stack the columns have one fewer level, so the original level numbers no longer line up (hence "Index has only 2 levels, not 3").

Marking this as a bug.
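
A sketch of the working call next to the failing one, restating the likely cause visible in the traceback above (the loop re-uses the original level numbers after the first stack has already removed a column level):

import itertools
import numpy as np
from pandas import DataFrame, MultiIndex

# the reproducer from above
df = DataFrame(np.random.randn(1000, 27),
               columns=MultiIndex.from_tuples(
                   list(itertools.product(range(3), repeat=3))))

# df.stack(level=[1, 2]) raises IndexError: after the first stack the
# columns have only 2 levels left, so the original level number 2 is out
# of range; stacking one level at a time sidesteps that
stacked = df.stack(level=1).stack(level=1)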

jreback (Contributor) commented Jul 3, 2014

going to close this issue in favor of #6245 (and #7660). thanks for the reports
