Minor doc issue in to_parquet(engine='auto') #18156

Closed
max-sixty opened this issue Nov 7, 2017 · 3 comments · Fixed by #18717
Labels: Error Reporting (incorrect or improved errors from pandas); IO Parquet (parquet, feather)

@max-sixty
Contributor

Code Sample, a copy-pastable example if possible

In [7]: pd.DataFrame.to_parquet?
Signature: pd.DataFrame.to_parquet(self, fname, engine='auto', compression='snappy', **kwargs)
Docstring:
Write a DataFrame to the binary parquet format.

.. versionadded:: 0.21.0

Parameters
----------
fname : str
    string file path
engine : {'auto', 'pyarrow', 'fastparquet'}, default 'auto' <--- here

But passing engine='auto' raises:

In [9]: pd.DataFrame([1,2,3]).to_parquet('f', engine='auto')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-057ba592ccd7> in <module>()
----> 1 pd.DataFrame([1,2,3]).to_parquet('f', engine='auto')

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
   1647         from pandas.io.parquet import to_parquet
   1648         to_parquet(self, fname, engine,
-> 1649                    compression=compression, **kwargs)
   1650
   1651     @Substitution(header='Write out the column names. If a list of strings '

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
    140     """
    141
--> 142     impl = get_engine(engine)
    143
    144     if not isinstance(df, DataFrame):

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/parquet.py in get_engine(engine)
     27
     28     if engine not in ['pyarrow', 'fastparquet']:
---> 29         raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")
     30
     31     if engine == 'pyarrow':

ValueError: engine must be one of 'pyarrow', 'fastparquet'

Problem description

Either allow engine='auto', or remove it from the docs and from the default value.

Expected Output

Output of pd.show_versions()

In [10]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 17.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.0
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.5.0
Cython: 0.27.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: 0.10.0rc1-2-gf83361c
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: 1.3.0
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.14
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: 0.2.0+8.g9eb9d77
pandas_datareader: None

@jreback
Contributor

jreback commented Nov 7, 2017

You don't have either engine dependency installed (pyarrow or fastparquet). I suppose the message could be better, but it is correct.
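
A minimal sketch of what a clearer message for engine='auto' could look like. This is not the pandas implementation: `resolve_engine` and its `candidates` parameter are hypothetical names used only for illustration. It tries each candidate engine in order and, if none can be imported, says so directly instead of reporting an invalid engine name.

```python
import importlib


def resolve_engine(engine="auto", candidates=("pyarrow", "fastparquet")):
    """Hypothetical helper: return the name of a usable parquet engine."""
    if engine == "auto":
        # Try each candidate in order and keep the first one that imports.
        for name in candidates:
            try:
                importlib.import_module(name)
            except ImportError:
                continue
            return name
        # Nothing importable: report missing dependencies, not a bad argument.
        raise ImportError(
            "Unable to find a usable parquet engine; tried "
            + ", ".join(repr(c) for c in candidates)
            + ". Install one of them to use engine='auto'."
        )
    valid = ("auto",) + tuple(candidates)
    if engine not in valid:
        raise ValueError(
            "engine must be one of " + ", ".join(repr(v) for v in valid)
        )
    # An explicit engine is returned as-is; importing it is left to the caller.
    return engine
```

With neither pyarrow nor fastparquet installed, `resolve_engine("auto")` in this sketch raises an ImportError that names both packages, which matches the situation in the original report more closely than "engine must be one of 'pyarrow', 'fastparquet'".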

@jreback
Contributor

jreback commented Nov 7, 2017

If you specify the engine explicitly:

In [2]: pd.DataFrame([1,2,3]).to_parquet('f', engine='fastparquet')
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
~/pandas/pandas/io/parquet.py in __init__(self)
     90         try:
---> 91             import fastparquet
     92         except ImportError:

ModuleNotFoundError: No module named 'fastparquet'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
<ipython-input-2-447fb0c0c58a> in <module>()
----> 1 pd.DataFrame([1,2,3]).to_parquet('f', engine='fastparquet')

~/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
   1643         from pandas.io.parquet import to_parquet
   1644         to_parquet(self, fname, engine,
-> 1645                    compression=compression, **kwargs)
   1646 
   1647     @Substitution(header='Write out the column names. If a list of strings '

~/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
    140     """
    141 
--> 142     impl = get_engine(engine)
    143 
    144     if not isinstance(df, DataFrame):

~/pandas/pandas/io/parquet.py in get_engine(engine)
     32         return PyArrowImpl()
     33     elif engine == 'fastparquet':
---> 34         return FastParquetImpl()
     35 
     36 

~/pandas/pandas/io/parquet.py in __init__(self)
     91             import fastparquet
     92         except ImportError:
---> 93             raise ImportError("fastparquet is required for parquet support\n\n"
     94                               "you can install via conda\n"
     95                               "conda install fastparquet -c conda-forge\n"

ImportError: fastparquet is required for parquet support

you can install via conda
conda install fastparquet -c conda-forge

or via pip
pip install -U fastparquet

Since I have pyarrow installed, engine='auto' resolves for me and the call fails for a different reason (your DataFrame cannot be serialized anyhow):

In [1]: pd.DataFrame([1,2,3]).to_parquet('f', engine='auto')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-14db7c4661b9> in <module>()
----> 1 pd.DataFrame([1,2,3]).to_parquet('f', engine='auto')

~/pandas/pandas/core/frame.py in to_parquet(self, fname, engine, compression, **kwargs)
   1643         from pandas.io.parquet import to_parquet
   1644         to_parquet(self, fname, engine,
-> 1645                    compression=compression, **kwargs)
   1646 
   1647     @Substitution(header='Write out the column names. If a list of strings '

~/pandas/pandas/io/parquet.py in to_parquet(df, path, engine, compression, **kwargs)
    174     # must have value column names (strings only)
    175     if df.columns.inferred_type not in valid_types:
--> 176         raise ValueError("parquet must have string column names")
    177 
    178     return impl.write(df, path, compression=compression)

ValueError: parquet must have string column names
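
The "string column names" error above comes from the default integer column label that `pd.DataFrame([1,2,3])` gets. A small sketch of one way to avoid it, assuming it is acceptable to convert the labels to strings; `with_string_columns` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def with_string_columns(df):
    """Hypothetical helper: return a copy whose column labels are all str."""
    out = df.copy()
    out.columns = out.columns.astype(str)
    return out


df = pd.DataFrame([1, 2, 3])  # the single column is labelled with the integer 0
fixed = with_string_columns(df)

# inferred_type is the attribute the traceback above checks against valid_types.
print(df.columns.inferred_type, "->", fixed.columns.inferred_type)

# Writing still requires pyarrow or fastparquet to be installed:
# fixed.to_parquet("f", engine="auto")
```

Passing column names up front, for example `pd.DataFrame({"a": [1, 2, 3]})`, avoids the conversion entirely.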

@jreback added the IO Parquet, Difficulty Novice, and Error Reporting labels Nov 7, 2017
@jreback added this to the 0.21.1 milestone Nov 7, 2017
@max-sixty
Contributor Author

Ah, yes, I ran the example in an environment without them installed. Thanks for pulling the correct error messages.

@jreback modified the milestones: 0.21.1, Next Major Release Dec 2, 2017
@jorisvandenbossche modified the milestones: Next Major Release, 0.21.1 Dec 10, 2017