
Commit 91b2b00

Merge pull request #6366 from cpcloud/eval-fix-scope
ENH: fix eval scoping issues
2 parents 092078e + 448ebec commit 91b2b00

18 files changed: +939 -564 lines changed

doc/source/enhancingperf.rst

+96 -62
Original file line number | Diff line number | Diff line change
@@ -300,7 +300,7 @@ Expression Evaluation via :func:`~pandas.eval` (Experimental)
300300

301301
.. versionadded:: 0.13
302302

303-
The top-level function :func:`~pandas.eval` implements expression evaluation of
303+
The top-level function :func:`pandas.eval` implements expression evaluation of
304304
:class:`~pandas.Series` and :class:`~pandas.DataFrame` objects.
305305

306306
.. note::
@@ -336,11 +336,11 @@ engine in addition to some extensions available only in pandas.
336336
Supported Syntax
337337
~~~~~~~~~~~~~~~~
338338

339-
These operations are supported by :func:`~pandas.eval`:
339+
These operations are supported by :func:`pandas.eval`:
340340

341341
- Arithmetic operations except for the left shift (``<<``) and right shift
342342
(``>>``) operators, e.g., ``df + 2 * pi / s ** 4 % 42 - the_golden_ratio``
343-
- Comparison operations, e.g., ``2 < df < df2``
343+
- Comparison operations, including chained comparisons, e.g., ``2 < df < df2``
344344
- Boolean operations, e.g., ``df < df2 and df3 < df4 or not df_bool``
345345
- ``list`` and ``tuple`` literals, e.g., ``[1, 2]`` or ``(1, 2)``
346346
- Attribute access, e.g., ``df.a``
@@ -373,9 +373,9 @@ This Python syntax is **not** allowed:
373373
:func:`~pandas.eval` Examples
374374
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
375375

376-
:func:`~pandas.eval` works wonders for expressions containing large arrays
376+
:func:`pandas.eval` works well with expressions containing large arrays.
377377

378-
First let's create 4 decent-sized arrays to play with:
378+
First let's create a few decent-sized arrays to play with:
379379

380380
.. ipython:: python
381381
@@ -441,8 +441,10 @@ Now let's do the same thing but with comparisons:
441441
The ``DataFrame.eval`` method (Experimental)
442442
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
443443

444-
In addition to the top level :func:`~pandas.eval` function you can also
445-
evaluate an expression in the "context" of a ``DataFrame``.
444+
.. versionadded:: 0.13
445+
446+
In addition to the top level :func:`pandas.eval` function you can also
447+
evaluate an expression in the "context" of a :class:`~pandas.DataFrame`.
446448

447449
.. ipython:: python
448450
:suppress:
@@ -462,10 +464,10 @@ evaluate an expression in the "context" of a ``DataFrame``.
462464
df = DataFrame(randn(5, 2), columns=['a', 'b'])
463465
df.eval('a + b')
464466
465-
Any expression that is a valid :func:`~pandas.eval` expression is also a valid
466-
``DataFrame.eval`` expression, with the added benefit that *you don't have to
467-
prefix the name of the* ``DataFrame`` *to the column(s) you're interested in
468-
evaluating*.
467+
Any expression that is a valid :func:`pandas.eval` expression is also a valid
468+
:meth:`DataFrame.eval` expression, with the added benefit that you don't have to
469+
prefix the name of the :class:`~pandas.DataFrame` to the column(s) you're
470+
interested in evaluating.
469471

470472
In addition, you can perform assignment of columns within an expression.
471473
This allows for *formulaic evaluation*. Only a single assignment is permitted.
@@ -480,55 +482,75 @@ it must be a valid Python identifier.
480482
df.eval('a = 1')
481483
df
482484
485+
The equivalent in standard Python would be
486+
487+
.. ipython:: python
488+
489+
df = DataFrame(dict(a=range(5), b=range(5, 10)))
490+
df['c'] = df.a + df.b
491+
df['d'] = df.a + df.b + df.c
492+
df['a'] = 1
493+
df
494+
483495
Local Variables
484496
~~~~~~~~~~~~~~~
485497

486-
You can refer to local variables the same way you would in vanilla Python
498+
In pandas version 0.14 the local variable API has changed. In pandas 0.13.x,
499+
you could refer to local variables the same way you would in standard Python.
500+
For example,
487501

488-
.. ipython:: python
502+
.. code-block:: python
489503
490504
df = DataFrame(randn(5, 2), columns=['a', 'b'])
491505
newcol = randn(len(df))
492506
df.eval('b + newcol')
493507
494-
.. note::
508+
UndefinedVariableError: name 'newcol' is not defined
495509
496-
The one exception is when you have a local (or global) with the same name as
497-
a column in the ``DataFrame``
510+
As you can see from the exception generated, this syntax is no longer allowed.
511+
You must *explicitly reference* any local variable that you want to use in an
512+
expression by placing the ``@`` character in front of the name. For example,
498513

499-
.. code-block:: python
514+
.. ipython:: python
500515
501-
df = DataFrame(randn(5, 2), columns=['a', 'b'])
502-
a = randn(len(df))
503-
df.eval('a + b')
504-
NameResolutionError: resolvers and locals overlap on names ['a']
516+
df = DataFrame(randn(5, 2), columns=list('ab'))
517+
newcol = randn(len(df))
518+
df.eval('b + @newcol')
519+
df.query('b < @newcol')
505520
521+
If you don't prefix the local variable with ``@``, pandas will raise an
522+
exception telling you the variable is undefined.
506523

507-
To deal with these conflicts, a special syntax exists for referring
508-
variables with the same name as a column
524+
When using :meth:`DataFrame.eval` and :meth:`DataFrame.query`, this allows you
525+
to have a local variable and a :class:`~pandas.DataFrame` column with the same
526+
name in an expression.
509527

510-
.. ipython:: python
511-
:suppress:
512528

513-
a = randn(len(df))
529+
.. ipython:: python
514530
515-
.. ipython:: python
531+
a = randn()
532+
df.query('@a < a')
533+
df.loc[a < df.a] # same as the previous expression
516534
517-
df.eval('@a + b')
535+
With :func:`pandas.eval` you cannot use the ``@`` prefix *at all*, because it
536+
isn't defined in that context. ``pandas`` will let you know this if you try to
537+
use ``@`` in a top-level call to :func:`pandas.eval`. For example,
518538

519-
The same is true for :meth:`~pandas.DataFrame.query`
539+
.. ipython:: python
540+
:okexcept:
520541
521-
.. ipython:: python
542+
a, b = 1, 2
543+
pd.eval('@a + b')
522544
523-
df.query('@a < b')
545+
In this case, you should simply refer to the variables like you would in
546+
standard Python.
524547

525-
.. ipython:: python
526-
:suppress:
548+
.. ipython:: python
527549
528-
del a
550+
pd.eval('a + b')
529551
530552
531-
:func:`~pandas.eval` Parsers
553+
:func:`pandas.eval` Parsers
532554
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
533555

534556
There are two different parsers and two different engines you can use as
@@ -568,7 +590,7 @@ The ``and`` and ``or`` operators here have the same precedence that they would
568590
in vanilla Python.
569591

570592

571-
:func:`~pandas.eval` Backends
593+
:func:`pandas.eval` Backends
572594
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
573595

574596
There's also the option to make :func:`~pandas.eval` operate identically to plain
@@ -577,12 +599,12 @@ ol' Python.
577599
.. note::
578600

579601
Using the ``'python'`` engine is generally *not* useful, except for testing
580-
other :func:`~pandas.eval` engines against it. You will acheive **no**
581-
performance benefits using :func:`~pandas.eval` with ``engine='python'``.
602+
other evaluation engines against it. You will achieve **no** performance
603+
benefits using :func:`~pandas.eval` with ``engine='python'`` and in fact may
604+
incur a performance hit.
582605

583-
You can see this by using :func:`~pandas.eval` with the ``'python'`` engine is
584-
actually a bit slower (not by much) than evaluating the same expression in
585-
Python:
606+
You can see this by using :func:`pandas.eval` with the ``'python'`` engine. It
607+
is a bit slower (not by much) than evaluating the same expression in Python:
586608

587609
.. ipython:: python
588610
@@ -593,15 +615,15 @@ Python:
593615
%timeit pd.eval('df1 + df2 + df3 + df4', engine='python')
594616
595617
596-
:func:`~pandas.eval` Performance
618+
:func:`pandas.eval` Performance
597619
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
598620

599621
:func:`~pandas.eval` is intended to speed up certain kinds of operations. In
600622
particular, those operations involving complex expressions with large
601-
``DataFrame``/``Series`` objects should see a significant performance benefit.
602-
Here is a plot showing the running time of :func:`~pandas.eval` as function of
603-
the size of the frame involved in the computation. The two lines are two
604-
different engines.
623+
:class:`~pandas.DataFrame`/:class:`~pandas.Series` objects should see a
624+
significant performance benefit. Here is a plot showing the running time of
625+
:func:`pandas.eval` as a function of the size of the frame involved in the
626+
computation. The two lines are two different engines.
605627

606628

607629
.. image:: _static/eval-perf.png
@@ -618,19 +640,31 @@ different engines.
618640
This plot was created using a ``DataFrame`` with 3 columns each containing
619641
floating point values generated using ``numpy.random.randn()``.
620642

621-
Technical Minutia
622-
~~~~~~~~~~~~~~~~~
623-
- Expressions that would result in an object dtype (including simple
624-
variable evaluation) have to be evaluated in Python space. The main reason
625-
for this behavior is to maintain backwards compatbility with versions of
626-
numpy < 1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)``
627-
will truncate any strings that are more than 60 characters in length. Second,
628-
we can't pass ``object`` arrays to ``numexpr`` thus string comparisons must
629-
be evaluated in Python space.
630-
- The upshot is that this *only* applies to object-dtype'd expressions. So,
631-
if you have an expression--for example--that's a string comparison
632-
``and``-ed together with another boolean expression that's from a numeric
633-
comparison, the numeric comparison will be evaluated by ``numexpr``. In fact,
634-
in general, :func:`~pandas.query`/:func:`~pandas.eval` will "pick out" the
635-
subexpressions that are ``eval``-able by ``numexpr`` and those that must be
636-
evaluated in Python space transparently to the user.
643+
Technical Minutia Regarding Expression Evaluation
644+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
645+
646+
Expressions that would result in an object dtype or involve datetime operations
647+
(because of ``NaT``) must be evaluated in Python space. The main reason for
648+
this behavior is to maintain backwards compatibility with versions of numpy <
649+
1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)`` will
650+
truncate any strings that are more than 60 characters in length. Second, we
651+
can't pass ``object`` arrays to ``numexpr`` thus string comparisons must be
652+
evaluated in Python space.
653+
654+
The upshot is that this *only* applies to object-dtype'd expressions. So, if
655+
you have an expression--for example
656+
657+
.. ipython:: python
658+
659+
df = DataFrame({'strings': np.repeat(list('cba'), 3),
660+
'nums': np.repeat(range(3), 3)})
661+
df
662+
df.query('strings == "a" and nums == 1')
663+
664+
the numeric part of the comparison (``nums == 1``) will be evaluated by
665+
``numexpr``.
666+
667+
In general, :meth:`DataFrame.query`/:func:`pandas.eval` will
668+
evaluate the subexpressions that *can* be evaluated by ``numexpr`` and those
669+
that must be evaluated in Python space transparently to the user. This is done
670+
by inferring the result type of an expression from its arguments and operators.
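
As a rough illustration of the mixed string/numeric query in the hunk above, the sketch below rebuilds the same frame and compares ``query`` with ordinary boolean indexing. It assumes the modern ``import pandas as pd`` spelling rather than the bare ``DataFrame`` name used in the docs, and the second expression (with ``"b"``) is a hypothetical variation added only to show a non-empty result. It does not show which engine evaluates each subexpression, since that is not visible from the output.

```python
import numpy as np
import pandas as pd

# Same frame as in the documentation hunk above.
df = pd.DataFrame({
    "strings": np.repeat(list("cba"), 3),
    "nums": np.repeat(range(3), 3),
})

# A string comparison and a numeric comparison combined in one expression.
from_query = df.query('strings == "a" and nums == 1')
from_mask = df[(df["strings"] == "a") & (df["nums"] == 1)]

# Hypothetical variation (not in the original) that matches some rows.
b_rows = df.query('strings == "b" and nums == 1')
```

With this particular frame no row has both ``strings == "a"`` and ``nums == 1``, so the original expression selects nothing, while the ``"b"`` variation selects the three middle rows.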

doc/source/release.rst

+19
@@ -83,9 +83,26 @@ API Changes
8383
- ``pd.infer_freq()``
8484
- ``pd.infer_freq()`` will now raise a ``TypeError`` if given an invalid ``Series/Index`` type (:issue:`6407`)
8585

86+
- Local variable usage has changed in
87+
:func:`pandas.eval`/:meth:`DataFrame.eval`/:meth:`DataFrame.query`
88+
(:issue:`5987`). For the :class:`~pandas.DataFrame` methods, two things have
89+
changed
90+
91+
- Column names are now given precedence over locals
92+
- Local variables must be referred to explicitly. This means that even if
93+
you have a local variable that is *not* a column you must still refer to
94+
it with the ``'@'`` prefix.
95+
- You can have an expression like ``df.query('@a < a')`` with no complaints
96+
from ``pandas`` about ambiguity of the name ``a``.
97+
98+
- The top-level :func:`pandas.eval` function does not allow you to use the
99+
``'@'`` prefix and provides you with an error message telling you so.
100+
- ``NameResolutionError`` was removed because it isn't necessary anymore.
101+
86102
Experimental Features
87103
~~~~~~~~~~~~~~~~~~~~~
88104

105+
89106
Improvements to existing features
90107
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
91108

@@ -144,6 +161,8 @@ Bug Fixes
144161
- Bug in DataFrame.dropna with duplicate indices (:issue:`6355`)
145162
- Regression in chained getitem indexing with embedded list-like from 0.12 (:issue:`6394`)
146163
- ``Float64Index`` with nans not comparing correctly
164+
- ``eval``/``query`` expressions with strings containing the ``@`` character
165+
will now work (:issue:`6366`).
147166

148167
pandas 0.13.1
149168
-------------

doc/source/v0.14.0.txt

+16
@@ -51,6 +51,22 @@ API changes
5151
s.year
5252
s.index.year
5353

54+
- Local variable usage has changed in
55+
:func:`pandas.eval`/:meth:`DataFrame.eval`/:meth:`DataFrame.query`
56+
(:issue:`5987`). For the :class:`~pandas.DataFrame` methods, two things have
57+
changed
58+
59+
- Column names are now given precedence over locals
60+
- Local variables must be referred to explicitly. This means that even if
61+
you have a local variable that is *not* a column you must still refer to
62+
it with the ``'@'`` prefix.
63+
- You can have an expression like ``df.query('@a < a')`` with no complaints
64+
from ``pandas`` about ambiguity of the name ``a``.
65+
66+
- The top-level :func:`pandas.eval` function does not allow you to use the
67+
``'@'`` prefix and provides you with an error message telling you so.
68+
- ``NameResolutionError`` was removed because it isn't necessary anymore.
69+
5470
MultiIndexing Using Slicers
5571
~~~~~~~~~~~~~~~~~~~~~~~~~~~
5672
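
The API change listed above can be sketched roughly as follows. This is a minimal example rather than anything taken from the commit: the frame contents and the names ``offset``, ``x`` and ``y`` are made up for illustration, and it assumes a pandas version that includes this change (0.14 or later).

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0],
                   "b": [10.0, 20.0, 30.0, 40.0]})

# A local variable that shares its name with column "a".
a = 1.5

# Inside query/eval a bare name refers to a column,
# while "@name" refers to the local variable.
via_query = df.query("@a < a")
via_mask = df.loc[a < df["a"]]  # same selection in plain Python

# Locals that are not columns still need the "@" prefix.
offset = 100.0
shifted = df.eval("b + @offset")

# Top-level pd.eval takes plain names, with no "@" prefix.
x, y = 1, 2
total = pd.eval("x + y")
```

Here ``via_query`` and ``via_mask`` select the same rows, which is the ``df.query('@a < a')`` case from the list above.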

pandas/compat/__init__.py

+2
@@ -54,6 +54,8 @@
5454
import pickle as cPickle
5555
import http.client as httplib
5656

57+
from pandas.compat.chainmap import DeepChainMap
58+
5759

5860
if PY3:
5961
def isidentifier(s):

pandas/compat/chainmap.py

+26
@@ -0,0 +1,26 @@
+try:
+    from collections import ChainMap
+except ImportError:
+    from pandas.compat.chainmap_impl import ChainMap
+
+
+class DeepChainMap(ChainMap):
+    def __setitem__(self, key, value):
+        for mapping in self.maps:
+            if key in mapping:
+                mapping[key] = value
+                return
+        self.maps[0][key] = value
+
+    def __delitem__(self, key):
+        for mapping in self.maps:
+            if key in mapping:
+                del mapping[key]
+                return
+        raise KeyError(key)
+
+    # override because the m parameter is introduced in Python 3.4
+    def new_child(self, m=None):
+        if m is None:
+            m = {}
+        return self.__class__(m, *self.maps)
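
To see how ``DeepChainMap`` differs from the standard ``ChainMap``, the sketch below copies the two overridden methods from the diff above and runs them against plain dictionaries. It assumes Python 3.3+ so that ``collections.ChainMap`` is available, and the scope names and keys are hypothetical, chosen only for the demonstration.

```python
from collections import ChainMap


class DeepChainMap(ChainMap):
    # Writes go to the first mapping that already holds the key.
    def __setitem__(self, key, value):
        for mapping in self.maps:
            if key in mapping:
                mapping[key] = value
                return
        self.maps[0][key] = value

    # Deletes remove the key from the first mapping that holds it.
    def __delitem__(self, key):
        for mapping in self.maps:
            if key in mapping:
                del mapping[key]
                return
        raise KeyError(key)


# A plain ChainMap always writes to the first mapping.
plain = ChainMap({"x": 1}, {"y": 2})
plain["y"] = 20  # shadows "y"; the second mapping is untouched

# DeepChainMap updates the mapping where the key already lives.
deep = DeepChainMap({"x": 1}, {"y": 2})
deep["y"] = 20
y_after_write = deep.maps[1]["y"]

deep["z"] = 3  # a new key still goes to the first mapping
del deep["y"]  # removed from the second mapping
```

The standard ``ChainMap`` only ever writes to and deletes from its first mapping, so the override lets an assignment reach a name that lives in an outer scope.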
