DOC/BUG: broken example in read_parquet with selecting columns #18628

Closed
jorisvandenbossche opened this issue Dec 4, 2017 · 3 comments · Fixed by #26194
Labels: Compat (pandas objects compatibility with NumPy or Python functions), Docs, IO Parquet (parquet, feather)

@jorisvandenbossche
Member

In the dev docs the example that subsets the columns to read with read_parquet is broken for the pyarrow engine: http://pandas-docs.github.io/pandas-docs-travis/io.html#io-parquet

In [514]: result = pd.read_parquet('example_pa.parquet', engine='pyarrow', columns=['a', 'b'])
...
IndexError: Table column index 6 is out of range

In [515]: result = pd.read_parquet('example_fp.parquet', engine='fastparquet', columns=['a', 'b'])

In [516]: result.dtypes
Out[516]: 
a    object
b     int64
dtype: object

This is due to a bug in pyarrow in how it deals with the pandas metadata when not all columns are present (I am reporting it over there). In the meantime, we should also fix our docs so they do not show this broken example.

jorisvandenbossche added the Compat, Docs, and IO Parquet labels on Dec 4, 2017
jorisvandenbossche added this to the 0.21.1 milestone on Dec 4, 2017
@jorisvandenbossche
Member Author

jorisvandenbossche commented Dec 4, 2017

Issue is here: https://issues.apache.org/jira/browse/ARROW-1883 and PR here: apache/arrow#1386

@jorisvandenbossche
Member Author

We should probably also add a test for this case.
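Such a test could look roughly like the following. It is a sketch only, not the test that ended up in pandas: the function names and the three-column frame are made up for illustration, and an engine that is not installed is silently skipped.

```python
import importlib.util
import os
import tempfile

import pandas as pd


def check_column_subset(engine):
    """Round-trip a small frame and read back only columns 'a' and 'b'.

    Returns None if ``engine`` is not installed, else the frame that was read.
    """
    if importlib.util.find_spec(engine) is None:
        return None  # engine not available, nothing to check
    df = pd.DataFrame({"a": list("abc"), "b": [1, 2, 3], "c": [4.0, 5.0, 6.0]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        df.to_parquet(path, engine=engine)
        result = pd.read_parquet(path, engine=engine, columns=["a", "b"])
    # Only the requested columns should come back, with their values intact.
    assert list(result.columns) == ["a", "b"]
    assert result["a"].tolist() == ["a", "b", "c"]
    assert result["b"].tolist() == [1, 2, 3]
    return result


def test_read_parquet_column_subset():
    # Exercise both engines mentioned in this issue.
    for engine in ("pyarrow", "fastparquet"):
        check_column_subset(engine)
```

With an affected pyarrow version the `read_parquet` call itself would be expected to raise, so the test fails before reaching the assertions.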

@jorisvandenbossche
Member Author

The docs are updated to not include this, so I am removing the 0.21.1 milestone. Let's keep this open to make sure we revert the PR once pyarrow 0.8.0 is released.

jorisvandenbossche modified the milestones: 0.21.1, 0.22.0 on Dec 7, 2017
jreback modified the milestones: 0.23.0, Next Major Release on Apr 14, 2018
jreback modified the milestones: Contributions Welcome, 0.25.0 on Apr 23, 2019