ERR: better error reporting for missing numexpr #5969

twiecki · 2014-01-16T16:08:44Z

From the example at http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html:

from numpy.random import randn
from pandas import DataFrame
df = DataFrame(randn(10, 2), columns=list('ab'))
df.query('a > b')

gives me:

---------------------------------------------------------------------------
NameResolutionError                       Traceback (most recent call last)
<ipython-input-19-47040e53b0e7> in <module>()
      2 from pandas import DataFrame
      3 df = DataFrame(randn(10, 2), columns=list('ab'))
----> 4 df.query('a > b')

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in query(self, expr, **kwargs)
   1778                              "query expression")
   1779 
-> 1780         res = self.eval(expr, **kwargs)
   1781 
   1782         try:

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in eval(self, expr, **kwargs)
   1829         kwargs['local_dict'] = _ensure_scope(resolvers=resolvers, **kwargs)
   1830         kwargs['target'] = self
-> 1831         return _eval(expr, **kwargs)
   1832 
   1833     def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):

/usr/local/lib/python2.7/dist-packages/pandas/computation/eval.pyc in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target)
    206     eng = _engines[engine]
    207     eng_inst = eng(parsed_expr)
--> 208     ret = eng_inst.evaluate()
    209 
    210     # assign if needed

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in evaluate(self)
     48 
     49         # make sure no names in resolvers and locals/globals clash
---> 50         self.pre_evaluate()
     51         res = self._evaluate()
     52         return _reconstruct_object(self.result_type, res, self.aligned_axes,

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in pre_evaluate(self)
     31 
     32     def pre_evaluate(self):
---> 33         self.expr.check_name_clashes()
     34 
     35     def evaluate(self):

/usr/local/lib/python2.7/dist-packages/pandas/computation/expr.pyc in check_name_clashes(self)
    797         lcl_keys = frozenset(env.locals.keys()) & names
    798         gbl_keys = frozenset(env.globals.keys()) & names
--> 799         _check_disjoint_resolver_names(res_keys, lcl_keys, gbl_keys)
    800 
    801     def add_resolvers_to_locals(self):

/usr/local/lib/python2.7/dist-packages/pandas/computation/expr.pyc in _check_disjoint_resolver_names(resolver_keys, local_keys, global_keys)
     39     if res_locals:
     40         msg = "resolvers and locals overlap on names {0}".format(res_locals)
---> 41         raise NameResolutionError(msg)
     42 
     43     res_globals = list(com.intersection(resolver_keys, global_keys))

NameResolutionError: resolvers and locals overlap on names ['a']

This is with 0.13.


jreback · 2014-01-16T16:10:50Z

In [2]: df = DataFrame(randn(10, 2), columns=list('ab'))

In [3]: df.query('a > b')
Out[3]: 
          a         b
0 -1.300378 -2.176283
1  0.281861 -1.441128
5 -0.425845 -0.597904
6  2.110389  0.941799
7 -0.693335 -1.215670

[5 rows x 2 columns]

You must have a or b defined somewhere in scope (that is what the error means).

twiecki · 2014-01-16T16:14:16Z

What do you mean by "in scope"? My understanding is that it will assume 'a' and 'b' are columns of the df. Do I have to have 'a' and 'b' defined as Python variables somewhere?

twiecki · 2014-01-16T16:27:12Z

This is quite weird: I purged pandas and reinstalled it, and now I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-3d30a42056f6> in <module>()
      1 df = pd.DataFrame(numpy.random.randn(10, 2), columns=list('ab'))
----> 2 df.query('a > b')

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in query(self, expr, **kwargs)
   1778                              "query expression")
   1779 
-> 1780         res = self.eval(expr, **kwargs)
   1781 
   1782         try:

/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in eval(self, expr, **kwargs)
   1829         kwargs['local_dict'] = _ensure_scope(resolvers=resolvers, **kwargs)
   1830         kwargs['target'] = self
-> 1831         return _eval(expr, **kwargs)
   1832 
   1833     def _slice(self, slobj, axis=0, raise_on_error=False, typ=None):

/usr/local/lib/python2.7/dist-packages/pandas/computation/eval.pyc in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target)
    206     eng = _engines[engine]
    207     eng_inst = eng(parsed_expr)
--> 208     ret = eng_inst.evaluate()
    209 
    210     # assign if needed

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in evaluate(self)
     49         # make sure no names in resolvers and locals/globals clash
     50         self.pre_evaluate()
---> 51         res = self._evaluate()
     52         return _reconstruct_object(self.result_type, res, self.aligned_axes,
     53                                    self.expr.terms.return_type)

/usr/local/lib/python2.7/dist-packages/pandas/computation/engines.pyc in _evaluate(self)
     95 
     96         try:
---> 97             return ne.evaluate(s, local_dict=self.expr.env.locals,
     98                                global_dict=self.expr.env.globals,
     99                                truediv=self.expr.truediv)

AttributeError: 'module' object has no attribute 'evaluate'

/usr/local/lib/python2.7/dist-packages/pandas/computation/ops.py:62: DeprecationWarning: object.__new__() takes no parameters
  return supr_new(klass, name, env, side=side, encoding=encoding)
/usr/local/lib/python2.7/dist-packages/pandas/computation/ops.py:62: DeprecationWarning: object.__new__() takes no parameters
  return supr_new(klass, name, env, side=side, encoding=encoding)

jreback · 2014-01-16T16:40:01Z

What I mean is: if a or b is defined as a local variable before this call, it will complain.

In [5]: a = 5

In [6]: df.query('a > b')
NameResolutionError: resolvers and locals overlap on names ['a']

You can pass local_dict=dict() to keep it from looking at the scope at all.

twiecki · 2014-01-16T16:41:26Z

Oh, I see. Yes, that makes sense.

Although now I am getting the second error (AttributeError: 'module' object has no attribute 'evaluate') in a clean interpreter.

cpcloud · 2014-01-16T16:42:52Z

I wonder if this error message could be a bit more transparent. When I wrote this it made perfect sense, but now that 0.13 is out and query/eval will get some more face time I'm thinking that exposing the internal details of the scope mechanism is a bit rude. @jreback Thoughts?

cpcloud · 2014-01-16T16:44:34Z

Maybe something along the lines of,

Local variables and column names <list of clashes> overlap
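A rough sketch of the clash check being discussed, written against plain strings rather than pandas internals. `check_name_clashes` and the `NameResolutionError` class here are hypothetical stand-ins loosely modeled on `_check_disjoint_resolver_names` in the first traceback, and the message uses the wording proposed just above. It also shows why passing an empty local mapping, as in the `local_dict=dict()` suggestion, avoids the error: there is nothing left to overlap with.

```python
class NameResolutionError(NameError):
    """Hypothetical stand-in for the pandas exception seen in the traceback."""


def check_name_clashes(expr_names, column_names, local_names):
    """Raise if a name used in the expression is both a column and a local.

    All three arguments are iterables of strings (an assumption of this sketch);
    a dict of locals works too, since iterating it yields its keys.
    """
    used = frozenset(expr_names)
    # Names that could resolve either to a column or to a local variable.
    clashes = sorted(used & frozenset(column_names) & frozenset(local_names))
    if clashes:
        raise NameResolutionError(
            "Local variables and column names {0} overlap".format(clashes)
        )


# With a local `a` in scope, 'a > b' is ambiguous.
try:
    check_name_clashes(["a", "b"], ["a", "b"], {"a": 5})
except NameResolutionError as err:
    print(err)  # Local variables and column names ['a'] overlap

# An empty locals mapping (like local_dict=dict()) leaves nothing to clash with.
check_name_clashes(["a", "b"], ["a", "b"], {})
```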

jreback · 2014-01-16T16:46:20Z

maybe also have an easy way to turn this off, e.g. local_dict=False or use_locals=False?

jreback · 2014-01-16T16:46:38Z

maybe should be off by default?

cpcloud · 2014-01-16T16:47:02Z

Resolvers is more general (since you don't have to use a DataFrame as the extra context for eval/query) but I think it's too much detail.

@twiecki As a side note, if you want to use the local variable a in your expression just throw an @ in front of it

a = 5
df.query('@a > b')

cpcloud · 2014-01-16T16:48:34Z

Hm, like allow_locals? If False, always ignore locals? Similar to R, I believe.

twiecki · 2014-01-16T16:48:42Z

Any thoughts on my current problem #5969 (comment)?

jreback · 2014-01-16T16:49:42Z

@twiecki make clean first, then build again (just to be sure).

twiecki · 2014-01-16T16:52:29Z

I installed with pip install so should be clean no matter what, no?

cpcloud · 2014-01-16T16:53:37Z

@twiecki Can you show the output of

import pandas
pandas.util.print_versions()

I think that's where it is... @jreback pls correct me if I'm wrong

twiecki · 2014-01-16T17:45:55Z

Not there

In [1]: import pandas

In [2]: pandas.util.print_versions()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-276c792280d1> in <module>()
----> 1 pandas.util.print_versions()

AttributeError: 'module' object has no attribute 'print_versions'

jreback · 2014-01-16T17:59:08Z

in your pandas dir just do

python ci/print_versions.py

ghost · 2014-01-16T18:36:47Z

In [2]: from pandas.util.print_versions import show_versions
In [3]: show_versions()

and from now on #5976
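For builds where `show_versions` is not importable, the optional-dependency part of its report can be approximated with the standard library, which is enough to spot a line like `numexpr: Not Installed`. `report_versions` is a hypothetical helper, not the pandas function, and it reads each module's `__version__`, which most but not all packages define.

```python
import importlib


def report_versions(names):
    """Map each module name to its __version__, 'unknown', or 'Not installed'."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "Not installed"
            continue
        # Not every package exposes __version__, so fall back to 'unknown'.
        report[name] = getattr(module, "__version__", "unknown")
    return report


for name, version in sorted(report_versions(["numexpr", "numpy"]).items()):
    print("{0}: {1}".format(name, version))
```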

cpcloud · 2014-01-16T18:51:18Z

@y-p awesome thanks

cpcloud · 2014-01-16T19:55:58Z

@twiecki Fixed? Or can you get those versions up?

twiecki · 2014-01-17T15:20:07Z

@cpcloud not fixed, here's the output:

INSTALLED VERSIONS
------------------
Python: 2.7.3.final.0
OS: Linux
Release: 3.2.0-29-generic
Processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C

pandas: 0.13.0
Cython: 0.19.1
Numpy: 1.8.0
Scipy: 0.14.0.dev-a3e9c7f
statsmodels: 0.6.0.dev-fe6e688
    patsy: 0.2.1
scikits.timeseries: Not installed
dateutil: 2.2
pytz: 2013.9
bottleneck: Not installed
PyTables: Not Installed
    numexpr: Not Installed
matplotlib: 1.3.1
openpyxl: 1.7.0
xlrd: 0.9.2
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: 2.3.2
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

jreback · 2014-01-17T15:25:34Z

@twiecki you must be loading something else; try in a vanilla Python session:

Python 2.7.3 (default, Jun 21 2012, 07:54:31) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from numpy.random import randn
>>> from pandas import DataFrame
>>> df = DataFrame(randn(10, 2), columns=list('ab'))
>>> df.query('a > b')
          a         b
0  0.587009 -0.598719
1  0.130313 -0.340606
2  1.358471  1.194978
3  0.418877 -0.018008
4  0.089849 -0.855166
5  0.540400  0.303763

[6 rows x 2 columns]
>>>

jreback · 2014-01-17T15:55:18Z

Ahh... I see that you don't have numexpr installed! It is required for this!

@cpcloud I guess that's an edge case / better error message needed
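The better error message this issue asks for could look something like the following sketch: check for numexpr up front and raise a readable `ImportError`, instead of failing later inside the engine with `AttributeError: 'module' object has no attribute 'evaluate'`. `require_numexpr` is a hypothetical helper, not pandas code; it relies only on `importlib` and on the `numexpr.evaluate` function that the traceback calls. The importer is a parameter so the check can be exercised without numexpr installed.

```python
import importlib


def require_numexpr(import_module=importlib.import_module):
    """Return the numexpr module, or raise an ImportError with a clear message."""
    try:
        ne = import_module("numexpr")
    except ImportError:
        raise ImportError(
            "the 'numexpr' engine requires the numexpr package; "
            "install it or pass engine='python'"
        )
    # A stale or unrelated module named 'numexpr' may import but lack evaluate().
    if not hasattr(ne, "evaluate"):
        raise ImportError(
            "the imported 'numexpr' module ({0!r}) has no evaluate(); "
            "it may be an outdated or shadowing install".format(
                getattr(ne, "__file__", None)
            )
        )
    return ne
```

The `hasattr` check is one guess at how the `AttributeError` above could arise (an outdated or shadowing `numexpr`); the thread never settles the actual cause.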

cpcloud · 2014-01-17T15:55:47Z

But then why was it giving errors about ne's attribute?

cpcloud · 2014-01-17T15:56:09Z

@jreback In any case, yes, a better error message is needed.

jreback · 2014-01-17T15:56:46Z

@cpcloud could it be falling back to the PythonEngine? (and taking an odd path)

twiecki · 2014-01-17T15:59:43Z

Yes, installing numexpr fixed this issue. Thanks!

jreback · 2014-01-17T16:01:27Z

@cpcloud ok...changed the name of this example....hmm...the 2.6 build prob skipped this test (but should check for the error I guess)

cpcloud · 2014-01-17T17:02:20Z

@jreback Should I open another issue for the locals option?

jreback · 2014-01-17T17:20:51Z

sure....api? locals=True ?

ghost · 2014-01-17T17:33:21Z

Require explicit references for locals via @ and minimize knobs.
'a > b', where a is a local variable and b is a df column, is too much abstraction
with very little gained. It creates an opportunity for ambiguity where there needn't be any.
Adding options wouldn't make it better, IMO.

My 2c.

jreback · 2014-01-17T17:41:21Z

@cpcloud yeah... I am thinking that it's too magical now (we have the reverse of what @y-p is suggesting: '@' is only used for conflicts).

Let's change it so that '@' marks an explicit external variable, and otherwise assume names are internal (column) references.
Then there is no name-resolution issue at all (without the '@' there is nothing to resolve, and with it the reference is explicit).

so this will work just fine

b = 1
df.query('a > b')

this will use the b=1

b = 1
df.query('a>@b')

And this should work as well

b = 1
df.query('a > b > @b')

This will break prior behavior, but this IS experimental!

If the @ variable is undefined (a NameError), then it's an error, of course.
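A toy sketch of the proposed rule, assuming a regex tokenizer instead of pandas' real expression parser, so it only handles simple comparisons like the ones above: `@name` is looked up only in the caller's locals, a bare name only among the columns, and an undefined `@` variable raises `NameError`. Treating an unknown bare name as an error too is an added assumption, and `resolve_names` is hypothetical.

```python
import re

# A bare identifier, optionally prefixed with '@'; the lookbehind skips names
# glued to a preceding word character or dot (e.g. the 'e5' in 1e5).
_NAME = re.compile(r"(?<![\w.])(@?)([A-Za-z_]\w*)")


def resolve_names(expr, columns, local_vars):
    """Resolve names in `expr`: '@x' -> local variable x, bare x -> column x."""
    resolved = {}
    for at, name in _NAME.findall(expr):
        if at:
            # Explicit external reference: only the caller's locals are searched.
            if name not in local_vars:
                raise NameError("name {0!r} is not defined".format(name))
            resolved["@" + name] = ("local", local_vars[name])
        else:
            # Bare name: only the columns are searched, so no clash is possible.
            if name not in columns:
                raise NameError("column {0!r} not found".format(name))
            resolved[name] = ("column", name)
    return resolved


b = 1
print(resolve_names("a > b", ["a", "b"], {"b": b}))
# {'a': ('column', 'a'), 'b': ('column', 'b')}
print(resolve_names("a > b > @b", ["a", "b"], {"b": b}))
# {'a': ('column', 'a'), 'b': ('column', 'b'), '@b': ('local', 1)}
```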

cpcloud · 2014-01-17T18:37:26Z

@jreback @y-p Agreed. That was actually a bit weird to write in the first place, and I think I already know how to fix this...PR coming this weekend.

ghost · 2014-01-23T16:14:26Z

We got way OT here. #5855 should take care of the numexpr warning.

cpcloud · 2014-01-23T16:14:49Z

Indeed, I was just about to comment...

cpcloud · 2014-01-23T17:55:21Z

@twiecki When you encountered this error did you happen to have the numexpr source in the same folder you were running the interpreter? Just trying to track down why you'd be able to get past the second line of pd.eval. thx

twiecki · 2014-01-23T17:59:19Z

@cpcloud there was no numexpr on my system when I encountered the error. Installing it fixed the problem.

cpcloud · 2014-01-23T18:20:42Z

Right. But it shouldn't have gotten past the second line of pd.eval if that's the case. Did you happen to have a variable in your environment called ne that was a module? Trying to think if there's anything else that would've caused it.

twiecki · 2014-01-23T20:21:30Z

It's possible that I had a very outdated numexpr, but I'm highly uncertain.

jreback · 2014-01-23T20:24:36Z

@twiecki maybe older than 2.0?

twiecki · 2014-01-23T20:35:26Z

If it was an older version it could have been very old.

jreback · 2014-01-23T20:59:36Z

@cpcloud so guess this is really #5855 then
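#5855 ("COMPAT: Ignore numexpr <= v2") points toward treating an old numexpr as if it were missing. A rough sketch of such a version gate follows; the `(2, 0)` threshold, the helper names, and the fallback to `engine='python'` are assumptions for illustration, not what pandas actually shipped.

```python
import re

# Assumed minimum; the thread mentions both "<= v2" (#5855) and "< 2.0".
MIN_NUMEXPR = (2, 0)


def parse_version(text):
    """Leading integer components of a version string: '1.4.1-dev' -> (1, 4, 1)."""
    parts = []
    for piece in str(text).split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def numexpr_new_enough(module, minimum=MIN_NUMEXPR):
    """True if `module` is not None and its __version__ is at least `minimum`."""
    if module is None:
        return False
    version = parse_version(getattr(module, "__version__", "0"))
    # Pad so that e.g. '2' compares equal to (2, 0) instead of less than it.
    version += (0,) * (len(minimum) - len(version))
    return version >= minimum


try:
    import numexpr as ne
except ImportError:
    ne = None

# One possible policy: silently fall back to the pure-Python engine.
engine = "numexpr" if numexpr_new_enough(ne) else "python"
print(engine)
```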

jreback · 2014-01-26T23:16:10Z

closed by #6109

ghost assigned cpcloud Jan 17, 2014

ghost closed this as completed Jan 23, 2014

ghost mentioned this issue Jan 23, 2014

COMPAT: Ignore numexpr <= v2 (at least for now) #5855


cpcloud reopened this Jan 23, 2014

jreback closed this as completed Jan 26, 2014

wesm unassigned cpcloud Oct 12, 2016
