BUG: Bad error message on read_parquet() when wrong version of pyarrow is installed #33313

jfcorbett · 2020-04-06T09:53:19Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

With pyarrow<0.13 installed,

import pandas as pd

pd.read_parquet('some_file.parquet')

Problem description

The error message says that pyarrow needs to be installed, even though it already is. The real problem is an incompatible version of pyarrow, so the message makes the error difficult to diagnose and fix.

  File "[...]\lib\site-packages\pandas\io\parquet.py", line 33, in get_engine
    "Unable to find a usable engine; "
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support

Expected Output

The message should state the real cause of the error, namely an incompatible version of pyarrow. Like:

  File "[...]\lib\site-packages\pandas\io\parquet.py", line 40, in get_engine
    return PyArrowImpl()
  File "[...]\lib\site-packages\pandas\io\parquet.py", line 75, in __init__
    "pyarrow", extra="pyarrow is required for parquet support."
  File "[...]\lib\site-packages\pandas\compat\_optional.py", line 109, in import_optional_dependency
    raise ImportError(msg)
ImportError: Pandas requires version '0.13.0' or newer of 'pyarrow' (version '0.11.1' currently installed).

This helpful error message does in fact get generated at a lower level in pandas.compat._optional.import_optional_dependency(), but then gets swallowed when its client pandas.io.parquet.get_engine(engine="auto") raises a new ImportError from scratch.
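To illustrate the swallowing described above, here is a minimal, self-contained sketch of one possible approach: collect the message from each failed engine attempt and include it in the final ImportError. It does not touch pandas itself. The names `load_pyarrow_stub`, `load_fastparquet_stub`, and `get_engine_sketch` are hypothetical stand-ins, the fastparquet message is invented for the example, and this is not necessarily how pandas should or does resolve the issue.

```python
# Hypothetical stand-ins for the per-engine import attempts. In pandas the
# version check happens in import_optional_dependency(); here it is simulated.
def load_pyarrow_stub():
    raise ImportError(
        "Pandas requires version '0.13.0' or newer of 'pyarrow' "
        "(version '0.11.1' currently installed)."
    )


def load_fastparquet_stub():
    # Illustrative message only; not copied from pandas.
    raise ImportError("fastparquet is not installed.")


def get_engine_sketch(loaders):
    """Try each (name, loader) pair in order and return the first engine that loads.

    If every attempt fails, raise one ImportError that keeps the
    lower-level messages instead of discarding them.
    """
    error_msgs = []
    for name, loader in loaders:
        try:
            return loader()
        except ImportError as err:
            # Remember why this engine was rejected.
            error_msgs.append(f"{name}: {err}")

    tried = ", ".join(repr(name) for name, _ in loaders)
    details = "\n".join(f" - {msg}" for msg in error_msgs)
    raise ImportError(
        f"Unable to find a usable engine; tried using: {tried}.\n"
        f"Trying to import the above resulted in these errors:\n{details}"
    )


LOADERS = [("pyarrow", load_pyarrow_stub), ("fastparquet", load_fastparquet_stub)]

try:
    get_engine_sketch(LOADERS)
except ImportError as err:
    # The version-mismatch message is now visible to the user.
    print(err)
```

With this shape, the top-level message still lists the engines that were tried, and the version mismatch reported for pyarrow stays visible rather than being replaced by a generic "is required" hint.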

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None

pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.1.3.post20200330
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.11.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

jorisvandenbossche · 2020-04-06T15:24:48Z

@jfcorbett thanks for the report! That error message could indeed be improved.

Contributions welcome!

quangngd · 2020-04-07T05:58:49Z

take

jfcorbett · 2020-04-07T09:51:46Z

@jorisvandenbossche Done! ^^^

@quangngd Sorry if I jumped the gun! I'm a first-time contributor.

Let me try that "take" command... curious

jfcorbett · 2020-04-07T09:54:05Z

take

jfcorbett added the Bug and Needs Triage labels Apr 6, 2020

jorisvandenbossche added the IO Parquet label and removed the Needs Triage label Apr 6, 2020

jorisvandenbossche added this to the Contributions Welcome milestone Apr 6, 2020

github-actions bot assigned quangngd Apr 7, 2020

jfcorbett mentioned this issue Apr 7, 2020: Fix read parquet import error message #33361 (Merged)

jfcorbett changed the title ~~BUG: Bad error message on read_parquet() with wrong version of pyarrow is installed~~ BUG: Bad error message on read_parquet() when wrong version of pyarrow is installed Apr 7, 2020

github-actions bot assigned jfcorbett Apr 7, 2020

quangngd removed their assignment Apr 7, 2020

jreback modified the milestones: Contributions Welcome, 1.1 Apr 7, 2020

jreback added the Error Reporting label and removed the Bug label Apr 7, 2020

jreback closed this as completed in #33361 Apr 8, 2020
