Clarify documentation on numexpr in eval / query #12008

kyleabeauchamp · 2016-01-09T19:01:04Z

The query function calls eval under the hood, so the call fails with the following error when numexpr is not present. This means that the default engine (numexpr) raises an error on systems that do not have this optional dependency installed.

ImportError: 'numexpr' not found. Cannot use engine='numexpr'

I see four solutions, which I list in increasing order of difficulty:

1. Make the numexpr dependency much clearer in the docstrings. Explicitly say that an ImportError will occur if you use the default arguments without having numexpr installed.
2. Make the python engine the default.
3. Let the numexpr engine fall back on the python engine in the case of an import error.
4. Make numexpr a required dependency.

Edit: It appears that option (3) is preferred here.

jreback · 2016-01-09T20:03:56Z

can you show pd.show_versions() and what you are executing.

This should fall back to use the python engine if numexpr is not installed.

kyleabeauchamp · 2016-01-09T20:06:09Z

Below is the output after I installed numexpr. This is all based on the latest miniconda2 version.


In [1]: import pandas as pd

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-71-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

kyleabeauchamp · 2016-01-09T20:10:37Z

I just did a conda remove numexpr and reproduced the error with the following:

import pandas as pd

X = [dict(A=1, B=2), dict(A=5, B=6), dict(A=10, B=20)]
X = pd.DataFrame(X)


Y = X.query("A>1")

[...]
ImportError: 'numexpr' not found. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed
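
For installs that raise the error above, one caller-side workaround is to retry with the pure-Python engine. The sketch below is not pandas code: `query_with_fallback` is a hypothetical helper name, and it assumes only that `query` accepts an `engine` keyword and raises ImportError when numexpr is missing, as the message above indicates.

```python
def query_with_fallback(frame, expr):
    """Hypothetical helper: run frame.query(expr), retrying with engine='python'.

    Assumes frame.query raises ImportError when the default engine
    (numexpr) is unavailable, as in the error shown above.
    """
    try:
        # First attempt uses whatever default engine the library picks.
        return frame.query(expr)
    except ImportError:
        # numexpr is missing; the python engine needs no extra dependency.
        return frame.query(expr, engine="python")
```

With the reproduction above, `Y = query_with_fallback(X, "A>1")` would take the place of `Y = X.query("A>1")`.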


In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None

jreback · 2016-01-09T20:16:03Z

ok, prob need some fallback logic then.

you can have a look in pandas/computation/eval.py. We can make the default of .eval(..., engine=None). then depending on if numexpr is installed can select an appropriate engine.

I suspect this is skipped in testing now: since numexpr is NOT installed, the tests just skip over this. So it might need some tweaking to test it on Travis.

lmk.
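
The engine=None idea described in this comment could look roughly like the sketch below. This is an illustration only, not the code in pandas/computation/eval.py: `resolve_engine` is a hypothetical name, and the availability check uses `importlib.util.find_spec`, which requires Python 3.4 or later (the environment reported in this thread is Python 2.7).

```python
import importlib.util


def resolve_engine(engine=None):
    """Hypothetical helper: pick an eval engine usable in this environment.

    engine=None means "choose for me": numexpr if it is importable,
    otherwise the python engine.
    """
    numexpr_installed = importlib.util.find_spec("numexpr") is not None

    if engine is None:
        return "numexpr" if numexpr_installed else "python"

    # An explicit request for numexpr still fails loudly when it is missing.
    if engine == "numexpr" and not numexpr_installed:
        raise ImportError("'numexpr' not found. Cannot use engine='numexpr'")

    return engine
```

Keeping the ImportError for an explicit engine='numexpr' request means only the default changes behavior; callers who ask for numexpr by name still find out that it is absent.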

gliptak · 2016-04-13T01:07:22Z

I reviewed pandas/computation/tests/test_eval.py and pandas/tests/frame/test_query_eval.py

test_eval.py references:

parsers = 'python', 'pytables', 'pandas'
engines = 'numexpr', 'python'

while test_query_eval.py references:

PARSERS = 'python', 'pandas'
ENGINES = 'python', 'numexpr'

Would pytables also be applicable to test_query_eval.py?
Is there a desired hierarchy (as in use pytables, pandas and python last)?
Should all tests work for all combinations of these (with some really specific exceptions)?

(I didn't find good references in the documentation on these above)
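
To make "all combinations" concrete, the sketch below enumerates every parser/engine pair from the test_query_eval.py tuples quoted above. It does not answer which pairs are meant to be supported; `all_combinations` is a hypothetical helper, not part of the pandas test suite.

```python
import itertools

# Values copied from the test_query_eval.py tuples quoted above.
PARSERS = ("python", "pandas")
ENGINES = ("python", "numexpr")


def all_combinations(parsers=PARSERS, engines=ENGINES):
    """Hypothetical helper: return every (parser, engine) pair as a list."""
    return list(itertools.product(parsers, engines))


for parser, engine in all_combinations():
    print(parser, engine)
```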

jreback · 2016-04-13T01:15:51Z

oh I didn't realize we had another issue about this; was fixed here just recently. no need to clarify the docs as we do the auto fallback.

jreback · 2016-04-13T01:16:36Z

#12867 is still applicable though

jreback added the Compat (pandas objects compatibility with Numpy or Python functions) label Jan 9, 2016

jreback added the Bug and Difficulty Novice labels Jan 9, 2016

jreback added this to the Next Major Release milestone Jan 9, 2016

robintw mentioned this issue Mar 13, 2016

Chooses eval engine automatically depending on whether numexpr is installed #12605

Closed

robintw added a commit to robintw/pandas that referenced this issue Mar 13, 2016

Automatically chooses eval engine based on whether numexpr is installed

ec6a25b

Fixes pandas-dev#12008

jreback modified the milestones: 0.18.1, Next Major Release Mar 13, 2016

jreback closed this as completed Apr 13, 2016
