Cannot understand ValueError for pandas.HDFStore.select(), HDF5 from Pytables #14568

Closed
evanbiederstedt opened this issue Nov 2, 2016 · 4 comments
Labels: Duplicate Report (duplicate issue or pull request)

Comments

@evanbiederstedt

This is a standard Python error, but I don't understand it in this context:

I have an HDF5 object that I created with PyTables. Inspecting the store at the Python prompt shows the following:

>>> import pandas as pd
>>> store = pd.HDFStore("file.h5")
>>> store

<class 'pandas.io.pytables.HDFStore'>
File path: /path/pathname/file.h5
given_key           frame_table [0.0.0] (typ->generic,nrows->877129127,ncols->6,indexers->[index],dc->[COL1, COL2, COL3, COL4, COL5, COL6])

So, there are 6 columns and about 877 million rows. I indexed on COL1 and COL6.

Now, I'm getting errors when I try to select values.

df = store.select(key="given_key", columns="COL1", start=0, stop=1)

This outputs the error:

ValueError: Shape of passed values is (1, 1), indices imply (1, 877129127)

So I try this:

df = store.select(key="given_key", start=0, stop=1)

I get the same ValueError:

ValueError: Shape of passed values is (1, 1), indices imply (1, 877129127)

What does this mean? Is there an error in terms of how I created this HDF5 object?

Let's say I want a value which I know exists in COL1

df = store.select("given_key", where="COL1 == 'abc'")

Here's the error I get:

ValueError: Shape of passed values is (1, 1231), indices imply (1, 877129127)

It looks like the query matched 1231 rows (which makes sense), but I cannot actually retrieve them.

How can I fix this?

@TomAugspurger
Contributor

> I have an HDF5 object I create with Pytables

Can you give more information about how the table was created, preferably a small script that actually creates the table? I don't know whether pandas can read arbitrary HDF5 tables.
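If the file was written with plain PyTables rather than through pandas, one possible workaround is to read the rows with PyTables itself and build the DataFrame from the resulting record array. The sketch below is only an illustration under assumptions: the structured array stands in for what PyTables' `Table.read(start, stop)` returns, and the field names, dtypes, and values are made up rather than taken from the reporter's file.

```python
import numpy as np
import pandas as pd

# Stand-in for the NumPy structured array returned by PyTables'
# Table.read(start, stop); fields and values are hypothetical.
rows = np.array(
    [(b"abc", 1, 0.5), (b"def", 2, 1.5)],
    dtype=[("COL1", "S16"), ("COL2", "i8"), ("COL3", "f8")],
)

# Build the frame directly from the records, bypassing HDFStore.select().
df = pd.DataFrame.from_records(rows)

# Fixed-width string fields arrive as bytes; decode them if text is needed.
df["COL1"] = df["COL1"].str.decode("utf-8")

print(df)
```

Reading a bounded slice this way keeps memory use proportional to the number of rows requested, which matters for a table with hundreds of millions of rows.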

@evanbiederstedt
Author

@TomAugspurger The script is fairly extensive, but it's fundamentally based on the Particle example here: http://www.pytables.org/usersguide/tutorials.html

Is it possible to e-mail the script or share a secret gist?

@TomAugspurger
Contributor

A small script that reproduces the problem would be preferable. I'm guessing there's nothing special about your data that manifests the problem. For example, just running https://raw.githubusercontent.com/PyTables/PyTables/develop/examples/tutorial1-1.py reproduces it partly.

In [1]: %run tutorial1-1.py

# <snip>

File 'tutorial1.h5' created

In [2]: store = pd.HDFStore("tutorial1.h5")

In [3]: store.select(key="/detector/readout", start=0, stop=1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-5a5fae2cba89> in <module>()
----> 1 store.select(key="/detector/readout", start=0, stop=1)

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/io/pytables.py in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
    722                            chunksize=chunksize, auto_close=auto_close)
    723 
--> 724         return it.get_result()
    725 
    726     def select_as_coordinates(

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/io/pytables.py in get_result(self, coordinates)
   1421 
   1422         # directly return the result
-> 1423         results = self.func(self.start, self.stop, where)
   1424         self.close()
   1425         return results

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/io/pytables.py in func(_start, _stop, _where)
    715             return s.read(start=_start, stop=_stop,
    716                           where=_where,
--> 717                           columns=columns, **kwargs)
    718 
    719         # create the iterator

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/io/pytables.py in read(self, where, columns, **kwargs)
   4118 
   4119             block = make_block(values, placement=np.arange(len(cols_)))
-> 4120             mgr = BlockManager([block], [cols_, index_])
   4121             frames.append(DataFrame(mgr))
   4122 

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2760 
   2761         if do_integrity_check:
-> 2762             self._verify_integrity()
   2763 
   2764         self._consolidate_check()

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/core/internals.py in _verify_integrity(self)
   2970         for block in self.blocks:
   2971             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 2972                 construction_error(tot_items, block.shape[1:], self.axes)
   2973         if len(self.items) != tot_items:
   2974             raise AssertionError('Number of manager items must equal union of '

/Users/tom.augspurger/Envs/py3/lib/python3.5/site-packages/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4230         raise ValueError("Empty data passed with indices specified.")
   4231     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4232         passed, implied))
   4233 
   4234 

ValueError: Shape of passed values is (1, 1), indices imply (1, 10)
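To make the message easier to read: the last frames of the traceback compare the shape of the values that were read against the lengths of the axes passed alongside them. Here the values hold 1 row while the index still has 10 entries, which suggests the row selection was applied to the values but not to the index. The following is a simplified pure-Python sketch of that kind of check, not pandas' actual implementation; `check_block_shape` is a hypothetical name.

```python
def check_block_shape(block_shape, axes):
    """Raise ValueError if a 2-D block's shape disagrees with the axis lengths.

    Simplified illustration only; pandas' real check lives in its internals.
    """
    passed = tuple(block_shape)
    implied = tuple(len(axis) for axis in axes)
    if passed != implied:
        raise ValueError(
            "Shape of passed values is {0}, indices imply {1}".format(
                passed, implied))


# One column, one row of values, but an index that still covers all 10 rows.
columns = ["COL1"]
full_index = range(10)

try:
    check_block_shape((1, 1), [columns, full_index])
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# With the index sliced to the same single row, the check passes silently.
check_block_shape((1, 1), [columns, range(1)])
```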

@TomAugspurger
Contributor

At this point, this looks like a dupe of #11188. Let me know if you think otherwise. There's a nice reproducible example there, so I'm closing in favor of that.

And if you're interested, please do submit a PR with a fix!

TomAugspurger added the Duplicate Report label Nov 2, 2016