ERR: better error reporting for missing numexpr #5969

Closed
twiecki opened this issue Jan 16, 2014 · 43 comments
Labels
API Design Error Reporting Incorrect or improved errors from pandas

Contributor

twiecki commented Jan 16, 2014

From the example at http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html:

from numpy.random import randn
from pandas import DataFrame
df = DataFrame(randn(10, 2), columns=list('ab'))
df.query('a > b')

gives me:

---------------------------------------------------------------------------
NameResolutionError                       Traceback (most recent call last)
<ipython-input-19-47040e53b0e7> in <module>()
      2 from pandas import DataFrame
      3 df = DataFrame(randn(10, 2), columns=list('ab'))
----> 4 df.query('a > b')

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in query(self, expr, **kwargs)
   1778                              "query expression")
   1779 
-> 1780         res = self.eval(expr, **kwargs)
   1781 
   1782         try:

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in eval(self, expr, **kwargs)
   1829         kwargs['local_dict'] = _ensure_scope(resolvers=resolvers, **kwargs)
   1830         kwargs['target'] = self
-> 1831         return _eval(expr, **kwargs)
   1832 
   1833     def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):

/usr/local/lib/python2.7/dist-packages/pandas/computation/eval.pyc in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target)
    206     eng = _engines[engine]
    207     eng_inst = eng(parsed_expr)
--> 208     ret = eng_inst.evaluate()
    209 
    210     # assign if needed

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in evaluate(self)
     48 
     49         # make sure no names in resolvers and locals/globals clash
---> 50         self.pre_evaluate()
     51         res = self._evaluate()
     52         return _reconstruct_object(self.result_type, res, self.aligned_axes,

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in pre_evaluate(self)
     31 
     32     def pre_evaluate(self):
---> 33         self.expr.check_name_clashes()
     34 
     35     def evaluate(self):

/usr/local/lib/python2.7/dist-packages/pandas/computation/expr.pyc in check_name_clashes(self)
    797         lcl_keys = frozenset(env.locals.keys()) & names
    798         gbl_keys = frozenset(env.globals.keys()) & names
--> 799         _check_disjoint_resolver_names(res_keys, lcl_keys, gbl_keys)
    800 
    801     def add_resolvers_to_locals(self):

/usr/local/lib/python2.7/dist-packages/pandas/computation/expr.pyc in _check_disjoint_resolver_names(resolver_keys, local_keys, global_keys)
     39     if res_locals:
     40         msg = "resolvers and locals overlap on names {0}".format(res_locals)
---> 41         raise NameResolutionError(msg)
     42 
     43     res_globals = list(com.intersection(resolver_keys, global_keys))

NameResolutionError: resolvers and locals overlap on names ['a']

This is with 0.13.

Contributor

jreback commented Jan 16, 2014

In [2]: df = DataFrame(randn(10, 2), columns=list('ab'))

In [3]: df.query('a > b')
Out[3]: 
          a         b
0 -1.300378 -2.176283
1  0.281861 -1.441128
5 -0.425845 -0.597904
6  2.110389  0.941799
7 -0.693335 -1.215670

[5 rows x 2 columns]

You must have a or b defined somewhere in scope (that is the meaning of the error)

Contributor Author

twiecki commented Jan 16, 2014

What do you mean by "in scope"? My understanding is that it will assume 'a' and 'b' are columns of the df. Do I have to have 'a' and 'b' defined as Python variables somewhere?

Contributor Author

twiecki commented Jan 16, 2014

This is quite weird: I purged pandas, reinstalled it, and now I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-3d30a42056f6> in <module>()
      1 df = pd.DataFrame(numpy.random.randn(10, 2), columns=list('ab'))
----> 2 df.query('a > b')

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in query(self, expr, **kwargs)
   1778                              "query expression")
   1779 
-> 1780         res = self.eval(expr, **kwargs)
   1781 
   1782         try:

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in eval(self, expr, **kwargs)
   1829         kwargs['local_dict'] = _ensure_scope(resolvers=resolvers, **kwargs)
   1830         kwargs['target'] = self
-> 1831         return _eval(expr, **kwargs)
   1832 
   1833     def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):

/usr/local/lib/python2.7/dist-packages/pandas/computation/eval.pyc in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target)
    206     eng = _engines[engine]
    207     eng_inst = eng(parsed_expr)
--> 208     ret = eng_inst.evaluate()
    209 
    210     # assign if needed

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in evaluate(self)
     49         # make sure no names in resolvers and locals/globals clash
     50         self.pre_evaluate()
---> 51         res = self._evaluate()
     52         return _reconstruct_object(self.result_type, res, self.aligned_axes,
     53                                    self.expr.terms.return_type)

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in _evaluate(self)
     95 
     96         try:
---> 97             return ne.evaluate(s, local_dict=self.expr.env.locals,
     98                                global_dict=self.expr.env.globals,
     99                                truediv=self.expr.truediv)

AttributeError: 'module' object has no attribute 'evaluate'

/usr/local/lib/python2.7/dist-packages/pandas/computation/ops.py:62: DeprecationWarning: object.__new__() takes no parameters
  return supr_new(klass, name, env, side=side, encoding=encoding)
/usr/local/lib/python2.7/dist-packages/pandas/computation/ops.py:62: DeprecationWarning: object.__new__() takes no parameters
  return supr_new(klass, name, env, side=side, encoding=encoding)

Contributor

jreback commented Jan 16, 2014

what I mean is if a or b is defined as a local variable before this call it will complain

In [5]: a = 5

In [6]: df.query('a > b')
NameResolutionError: resolvers and locals overlap on names ['a']

you can pass local_dict=dict() to avoid it looking at the scope at all

Contributor Author

twiecki commented Jan 16, 2014

Oh, I see. yes, that makes sense.

Although now I am getting the second error (AttributeError: 'module' object has no attribute 'evaluate') in a clean interpreter.

Member

cpcloud commented Jan 16, 2014

I wonder if this error message could be a bit more transparent. When I wrote this it made perfect sense, but now that 0.13 is out and query/eval will get some more face time I'm thinking that exposing the internal details of the scope mechanism is a bit rude. @jreback Thoughts?

Member

cpcloud commented Jan 16, 2014

Maybe something along the lines of,

Local variables and column names <list of clashes> overlap
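
For illustration only, here is a minimal sketch of how such a clash check could produce the friendlier message. It is not the pandas implementation: the function name check_name_clashes and its three arguments are hypothetical, and the sketch assumes that only names actually used in the expression are checked, as the traceback above suggests.

```python
def check_name_clashes(expr_names, column_names, local_names):
    """Raise if a name used in the expression is both a column and a local.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Only names that actually appear in the expression can be ambiguous.
    clashes = sorted(set(expr_names) & set(column_names) & set(local_names))
    if clashes:
        raise NameError(
            "Local variables and column names {0} overlap".format(clashes)
        )


# 'a' is used in the expression, is a column, and is also a local variable.
try:
    check_name_clashes(["a", "b"], ["a", "b"], ["a", "x"])
except NameError as exc:
    print(exc)
```

The message names only the clashing identifiers and says nothing about resolvers, which is the point of the proposed wording.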

Contributor

jreback commented Jan 16, 2014

maybe also have easy way to turn this off, e.g. local_dict=False or use_locals=False?

Contributor

jreback commented Jan 16, 2014

maybe should be off by default?

Member

cpcloud commented Jan 16, 2014

Resolvers is more general (since you don't have to use a DataFrame as the extra context for eval/query) but I think it's too much detail.

@twiecki As a side note, if you want to use the local variable a in your expression just throw an @ in front of it

a = 5
df.query('@a > b')

Member

cpcloud commented Jan 16, 2014

Hm like allow_locals? If False ignore locals always? Similar to R I believe

Contributor Author

twiecki commented Jan 16, 2014

Any thoughts on my current problem #5969 (comment)?

Contributor

jreback commented Jan 16, 2014

@twiecki make clean first, then build again (just to be sure)

Contributor Author

twiecki commented Jan 16, 2014

I installed with pip install so should be clean no matter what, no?

Member

cpcloud commented Jan 16, 2014

@twiecki Can you show the output of

import pandas
pandas.util.print_versions()

I think that's where it is... @jreback pls correct me if I'm wrong

Contributor Author

twiecki commented Jan 16, 2014

Not there

In [1]: import pandas

In [2]: pandas.util.print_versions()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-276c792280d1> in <module>()
----> 1 pandas.util.print_versions()

AttributeError: 'module' object has no attribute 'print_versions'

Contributor

jreback commented Jan 16, 2014

in your pandas dir just do

python ci/print_versions.py

ghost commented Jan 16, 2014

In [2]: from pandas.util.print_versions import show_versions
In [3]: show_versions()

and from now on #5976

Member

cpcloud commented Jan 16, 2014

@y-p awesome thanks

Member

cpcloud commented Jan 16, 2014

@twiecki Fixed? Or can you get those versions up?

Contributor Author

twiecki commented Jan 17, 2014

@cpcloud not fixed, here's the output:

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux
Release: 3.2.0-29-generic
Processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C

pandas: 0.13.0
Cython: 0.19.1
Numpy: 1.8.0
Scipy: 0.14.0.dev-a3e9c7f
statsmodels: 0.6.0.dev-fe6e688
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 2.2
pytz: 2013.9
bottleneck: Not installed
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: 2.3.2
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

Contributor

jreback commented Jan 17, 2014

@twiecki you must be loading something else, try in a vanilla python session

Python 2.7.3 (default, Jun 21 2012, 07:54:31) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
          a         b
0  0.587009 -0.598719
1  0.130313 -0.340606
2  1.358471  1.194978
3  0.418877 -0.018008
4  0.089849 -0.855166
5  0.540400  0.303763

[6 rows x 2 columns]
>>> 

Contributor

jreback commented Jan 17, 2014

ahh...I see that you don't have numexpr installed! this is required for this!

@cpcloud I guess that's an edge case / better error message needed
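
As a rough sketch of what a clearer error could look like (an assumption for illustration, not the fix that was eventually merged into pandas), an optional dependency can be checked up front and reported by name. The helper require_optional and its parameters are hypothetical; it uses only the standard library's importlib.

```python
import importlib
import importlib.util


def require_optional(name, needed_attr=None, purpose="this operation"):
    """Import an optional dependency or fail with a readable ImportError.

    Hypothetical helper for illustration; not part of pandas.
    """
    # find_spec returns None when a top-level module cannot be found.
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            "{0!r} is not installed, but it is required for {1}".format(
                name, purpose
            )
        )
    module = importlib.import_module(name)
    # Something importable under this name may still lack the function
    # we need, e.g. an outdated version or an unrelated module.
    if needed_attr is not None and not hasattr(module, needed_attr):
        raise ImportError(
            "{0!r} was imported but has no attribute {1!r}; "
            "it may be outdated or shadowed".format(name, needed_attr)
        )
    return module


try:
    ne = require_optional("numexpr", needed_attr="evaluate", purpose="query/eval")
except ImportError as exc:
    print(exc)
```

Checking for the attribute as well as the module would turn a confusing AttributeError deep inside the engine into an ImportError at the call site.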

Member

cpcloud commented Jan 17, 2014

But then why was it giving errors about ne's attribute?

Member

cpcloud commented Jan 17, 2014

@jreback In any case, yes a better error message

Contributor

jreback commented Jan 17, 2014

@cpcloud could it be falling back to the PythonEngine? (and taking an odd path)

Contributor Author

twiecki commented Jan 17, 2014

Yes, installing numexpr fixed this issue. Thanks!

Contributor

jreback commented Jan 17, 2014

@cpcloud ok, changed the name of this example... hmm, the 2.6 build probably skipped this test (but it should check for the error, I guess)

@ghost ghost assigned cpcloud Jan 17, 2014
Member

cpcloud commented Jan 17, 2014

@jreback Should I open another issue for the locals option?

Contributor

jreback commented Jan 17, 2014

sure....api? locals=True ?

ghost commented Jan 17, 2014

Require explicit references for locals via @ and minimize knobs.
'a > b' where a is a local variable and b is a df column is too much abstraction
with very little gained. It creates opportunity for ambiguity where there needn't be any.
Adding options wouldn't make it better IMO.

My 2c.

Contributor

jreback commented Jan 17, 2014

@cpcloud yeh... I am thinking that it's too magical now (we have the reverse of what @y-p is suggesting: using '@' for conflicts).

Let's change it so that '@' represents an explicit external variable; otherwise assume internal name references.
Then there is no more issue of name resolution at all (because unless '@' is used there is no ambiguity, and '@' is explicit).

so this will work just fine

b = 1
df.query('a > b')

this will use the local b = 1

b = 1
df.query('a>@b')

And this should work as well

b = 1
df.query('a>b>@b')

This will break prior behavior, but this IS experimental!

If the @ variable raises a NameError, then it's an error, of course.
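
To make the proposed rule concrete, here is a toy resolver sketched under stated assumptions: it is not how pandas resolves names, and resolve_name, columns, and local_vars are hypothetical names chosen for illustration. A name prefixed with '@' always refers to a Python variable; a bare name always refers to a column.

```python
def resolve_name(token, columns, local_vars):
    """Resolve one name from a query expression under the proposed rule.

    Hypothetical helper for illustration; not part of pandas.
    """
    if token.startswith("@"):
        # Explicit external reference: look only at Python variables.
        name = token[1:]
        if name not in local_vars:
            raise NameError("local variable {0!r} is not defined".format(name))
        return local_vars[name]
    # Bare name: look only at the frame's columns, never at locals.
    if token not in columns:
        raise KeyError("no column named {0!r}".format(token))
    return columns[token]


columns = {"a": [3, 0, 2], "b": [1, 5, 2]}
local_vars = {"b": 1}

print(resolve_name("b", columns, local_vars))   # the column b
print(resolve_name("@b", columns, local_vars))  # the local variable b
```

Because the two lookups never share a namespace, a column and a local variable with the same name cannot clash, so no overlap check is needed.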

Member

cpcloud commented Jan 17, 2014

@jreback @y-p Agreed. That was actually a bit weird to write in the first place, and I think I already know how to fix this...PR coming this weekend.

ghost commented Jan 23, 2014

We got way OT here. #5855 should take care of the numexpr warning.

@ghost ghost closed this as completed Jan 23, 2014
Member

cpcloud commented Jan 23, 2014

Indeed, I was just about to comment ...

Member

cpcloud commented Jan 23, 2014

@twiecki When you encountered this error, did you happen to have the numexpr source in the same folder where you were running the interpreter? Just trying to track down why you'd be able to get past the second line of pd.eval. Thx

Contributor Author

twiecki commented Jan 23, 2014

@cpcloud there was no numexpr on my system when I encountered the error. Installing it fixed the problem.

Member

cpcloud commented Jan 23, 2014

Right. But it shouldn't have gotten past the second line of pd.eval if that's the case. Did you have a variable in your environment called ne that was a module? Trying to think if there's anything else that would've caused it.

Contributor Author

twiecki commented Jan 23, 2014

It's possible that I had a very outdated numexpr, but I'm highly uncertain.

Contributor

jreback commented Jan 23, 2014

@twiecki maybe older than 2.0?

Contributor Author

twiecki commented Jan 23, 2014

If it was an older version it could have been very old.

Contributor

jreback commented Jan 23, 2014

@cpcloud so guess this is really #5855 then

Contributor

jreback commented Jan 26, 2014

closed by #6109
