Clarify documentation on numexpr in eval / query #12008

Closed
kyleabeauchamp opened this issue Jan 9, 2016 · 7 comments
Labels
Bug Compat pandas objects compatability with Numpy or Python functions
Milestone

Comments

@kyleabeauchamp

The query function calls eval under the hood, so the call fails with the following error when numexpr is not installed. In other words, the default engine (numexpr) raises an error on any system that lacks this optional dependency.

ImportError: 'numexpr' not found. Cannot use engine='numexpr' 

I see four solutions, which I list in increasing order of difficulty:

  1. Make the numexpr dependency much clearer in the docstrings. Explicitly say that an ImportError will occur if you use the default arguments without having numexpr installed.
  2. Make the python engine the default.
  3. Let the numexpr engine fall back to the python engine when the import fails.
  4. Make numexpr a required dependency

Edit: It appears that option (3) is preferred here.
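Until one of these options is in place, passing the engine explicitly avoids the error. A minimal sketch, relying on query forwarding its keyword arguments to eval; the frame contents are illustrative only:

```python
import pandas as pd

# Small example frame (illustrative data only).
df = pd.DataFrame([dict(A=1, B=2), dict(A=5, B=6), dict(A=10, B=20)])

# Request the pure-Python engine explicitly so numexpr is never needed.
result = df.query("A > 1", engine="python")
print(result)
```

The python engine is generally slower on large frames, so this is a workaround rather than a replacement for installing numexpr.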

@jreback
Contributor

jreback commented Jan 9, 2016

Can you show pd.show_versions() and what you are executing?

This should fall back to use the python engine if numexpr is not installed.

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Jan 9, 2016
@kyleabeauchamp
Author

Below is the output after I installed numexpr. This is all based on the latest miniconda2 version.


In [1]: import pandas as pd

In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-71-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

@kyleabeauchamp
Author

I just did a conda remove numexpr and reproduced the error with the following:

import pandas as pd

X = [dict(A=1, B=2), dict(A=5, B=6), dict(A=10, B=20)]
X = pd.DataFrame(X)


Y = X.query("A>1")

[...]
ImportError: 'numexpr' not found. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed


In [2]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None



@jreback
Contributor

jreback commented Jan 9, 2016

ok, prob need some fallback logic then.

You can have a look in pandas/computation/eval.py. We can make the default .eval(..., engine=None), then select an appropriate engine depending on whether numexpr is installed.
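A rough sketch of what that selection could look like. The name resolve_engine is hypothetical and this is not the code in pandas/computation/eval.py; it also assumes Python 3 for importlib.util:

```python
import importlib.util


def resolve_engine(engine=None):
    """Return the engine name to use (hypothetical helper, for illustration)."""
    if engine is not None:
        # An explicit choice is respected as-is.
        return engine
    # engine=None: prefer numexpr when it is importable, else fall back.
    has_numexpr = importlib.util.find_spec("numexpr") is not None
    return "numexpr" if has_numexpr else "python"


print(resolve_engine())          # 'numexpr' or 'python', depending on the environment
print(resolve_engine("python"))  # an explicit engine is passed through unchanged
```

With a helper like this, an explicit engine='numexpr' on a system without numexpr would still reach the existing ImportError, while the default would quietly use whatever is available.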

I suspect this is skipped in testing now, because when numexpr is NOT installed the tests just skip over this, so it might need some tweaking to test it on Travis.

lmk.

@jreback jreback added this to the Next Major Release milestone Jan 9, 2016
@jreback jreback modified the milestones: 0.18.1, Next Major Release Mar 13, 2016
@gliptak
Contributor

gliptak commented Apr 13, 2016

I reviewed pandas/computation/tests/test_eval.py and pandas/tests/frame/test_query_eval.py.

test_eval.py references:

parsers = 'python', 'pytables', 'pandas'
engines = 'numexpr', 'python'

while test_query_eval.py references:

PARSERS = 'python', 'pandas'
ENGINES = 'python', 'numexpr'
  • Would pytables also be applicable to test_query_eval.py?
  • Is there a desired hierarchy (as in use pytables, pandas and python last)?
  • Should all tests work for all combinations of these (with some really specific exceptions)?

(I didn't find good references in the documentation for any of the above.)
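To make the "all combinations" question concrete, here is a minimal sketch that enumerates every parser/engine pair from the test_query_eval.py tuples quoted above. How the real test suites iterate over them is not shown in this thread, so the loop is only illustrative:

```python
import itertools

PARSERS = 'python', 'pandas'
ENGINES = 'python', 'numexpr'

# Every (parser, engine) pair a fully parametrized test would need to cover.
combinations = list(itertools.product(PARSERS, ENGINES))
for parser, engine in combinations:
    print(parser, engine)
```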

@jreback
Contributor

jreback commented Apr 13, 2016

oh I didn't realize we had another issue about this; was fixed here just recently. no need to clarify the docs as we do the auto fallback.

@jreback jreback closed this as completed Apr 13, 2016
@jreback
Contributor

jreback commented Apr 13, 2016

#12867 is still applicable though
