@@ -300,7 +300,7 @@ Expression Evaluation via :func:`~pandas.eval` (Experimental)
  .. versionadded:: 0.13

- The top-level function :func:`~pandas.eval` implements expression evaluation of
+ The top-level function :func:`pandas.eval` implements expression evaluation of
  :class:`~pandas.Series` and :class:`~pandas.DataFrame` objects.

  .. note::
@@ -336,11 +336,11 @@ engine in addition to some extensions available only in pandas.
  Supported Syntax
  ~~~~~~~~~~~~~~~~

- These operations are supported by :func:`~pandas.eval`:
+ These operations are supported by :func:`pandas.eval`:

  - Arithmetic operations except for the left shift (``<<``) and right shift
    (``>>``) operators, e.g., ``df + 2 * pi / s ** 4 % 42 - the_golden_ratio``
- - Comparison operations, e.g., ``2 < df < df2``
+ - Comparison operations, including chained comparisons, e.g., ``2 < df < df2``
  - Boolean operations, e.g., ``df < df2 and df3 < df4 or not df_bool``
  - ``list`` and ``tuple`` literals, e.g., ``[1, 2]`` or ``(1, 2)``
  - Attribute access, e.g., ``df.a``
@@ -373,9 +373,9 @@ This Python syntax is **not** allowed:
  :func:`~pandas.eval` Examples
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- :func:`~pandas.eval` works wonders for expressions containing large arrays
+ :func:`pandas.eval` works well with expressions containing large arrays

- First let's create 4 decent-sized arrays to play with:
+ First let's create a few decent-sized arrays to play with:

  .. ipython:: python
@@ -441,8 +441,10 @@ Now let's do the same thing but with comparisons:
  The ``DataFrame.eval`` method (Experimental)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- In addition to the top level :func:`~pandas.eval` function you can also
- evaluate an expression in the "context" of a ``DataFrame``.
+ .. versionadded:: 0.13
+
+ In addition to the top level :func:`pandas.eval` function you can also
+ evaluate an expression in the "context" of a :class:`~pandas.DataFrame`.

  .. ipython:: python
     :suppress:
@@ -462,10 +464,10 @@ evaluate an expression in the "context" of a ``DataFrame``.
     df = DataFrame(randn(5, 2), columns=['a', 'b'])
     df.eval('a + b')

- Any expression that is a valid :func:`~pandas.eval` expression is also a valid
- ``DataFrame.eval`` expression, with the added benefit that *you don't have to
- prefix the name of the* ``DataFrame`` *to the column(s) you're interested in
- evaluating*.
+ Any expression that is a valid :func:`pandas.eval` expression is also a valid
+ :meth:`DataFrame.eval` expression, with the added benefit that you don't have to
+ prefix the name of the :class:`~pandas.DataFrame` to the column(s) you're
+ interested in evaluating.

  In addition, you can perform assignment of columns within an expression.
  This allows for *formulaic evaluation*. Only a single assignment is permitted.
@@ -480,55 +482,75 @@ it must be a valid Python identifier.
     df.eval('a = 1')
     df

+ The equivalent in standard Python would be
+
+ .. ipython:: python
+
+    df = DataFrame(dict(a=range(5), b=range(5, 10)))
+    df['c'] = df.a + df.b
+    df['d'] = df.a + df.b + df.c
+    df['a'] = 1
+    df
+
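As a rough check of this equivalence, the sketch below applies the three assignments both through ``DataFrame.eval`` and as ordinary column assignments, then compares the results. It assumes a recent pandas release, where ``DataFrame.eval`` returns a new frame for an assignment unless ``inplace=True`` is passed; the names ``via_eval`` and ``via_python`` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# Assumption: recent pandas, where eval() returns a new frame for an
# assignment expression unless inplace=True is passed.
via_eval = df.eval("c = a + b")
via_eval = via_eval.eval("d = a + b + c")
via_eval = via_eval.eval("a = 1")

# The same three steps written as ordinary column assignments.
via_python = df.copy()
via_python["c"] = via_python.a + via_python.b
via_python["d"] = via_python.a + via_python.b + via_python.c
via_python["a"] = 1

# Compare the two frames column by column.
print((via_eval == via_python).all().all())
```

Each ``eval`` call carries exactly one assignment, matching the single-assignment restriction described above.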
  Local Variables
  ~~~~~~~~~~~~~~~

- You can refer to local variables the same way you would in vanilla Python
+ In pandas version 0.14 the local variable API has changed. In pandas 0.13.x,
+ you could refer to local variables the same way you would in standard Python.
+ For example,

- .. ipython:: python
+ .. code-block:: python

     df = DataFrame(randn(5, 2), columns=['a', 'b'])
     newcol = randn(len(df))
     df.eval('b + newcol')

- .. note::
+    UndefinedVariableError: name 'newcol' is not defined

-    The one exception is when you have a local (or global) with the same name as
-    a column in the ``DataFrame``
+ As you can see from the exception generated, this syntax is no longer allowed.
+ You must *explicitly reference* any local variable that you want to use in an
+ expression by placing the ``@`` character in front of the name. For example,

-    .. code-block:: python
+ .. ipython:: python

-       df = DataFrame(randn(5, 2), columns=['a', 'b'])
-       a = randn(len(df))
-       df.eval('a + b')
-       NameResolutionError: resolvers and locals overlap on names ['a']
+    df = DataFrame(randn(5, 2), columns=list('ab'))
+    newcol = randn(len(df))
+    df.eval('b + @newcol')
+    df.query('b < @newcol')

+ If you don't prefix the local variable with ``@``, pandas will raise an
+ exception telling you the variable is undefined.

- To deal with these conflicts, a special syntax exists for referring
- variables with the same name as a column
+ When using :meth:`DataFrame.eval` and :meth:`DataFrame.query`, this allows you
+ to have a local variable and a :class:`~pandas.DataFrame` column with the same
+ name in an expression.

- .. ipython:: python
-    :suppress:

-    a = randn(len(df))
+ .. ipython:: python

- .. ipython:: python
+    a = randn()
+    df.query('@a < a')
+    df.loc[a < df.a]  # same as the previous expression

-    df.eval('@a + b')
+ With :func:`pandas.eval` you cannot use the ``@`` prefix *at all*, because it
+ isn't defined in that context. ``pandas`` will let you know this if you try to
+ use ``@`` in a top-level call to :func:`pandas.eval`. For example,

- The same is true for :meth:`~pandas.DataFrame.query`
+ .. ipython:: python
+    :okexcept:

- .. ipython:: python
+    a, b = 1, 2
+    pd.eval('@a + b')

-    df.query('@a < b')
+ In this case, you should simply refer to the variables like you would in
+ standard Python.

- .. ipython:: python
-    :suppress:
+ .. ipython:: python

-    del a
+    pd.eval('a + b')


- :func:`~pandas.eval` Parsers
+ :func:`pandas.eval` Parsers
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  There are two different parsers and two different engines you can use as
@@ -568,7 +590,7 @@ The ``and`` and ``or`` operators here have the same precedence that they would
  in vanilla Python.


- :func:`~pandas.eval` Backends
+ :func:`pandas.eval` Backends
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  There's also the option to make :func:`~pandas.eval` operate identically to plain
@@ -577,12 +599,12 @@ ol' Python.
  .. note::

     Using the ``'python'`` engine is generally *not* useful, except for testing
-    other :func:`~pandas.eval` engines against it. You will achieve **no**
-    performance benefits using :func:`~pandas.eval` with ``engine='python'``.
+    other evaluation engines against it. You will achieve **no** performance
+    benefits using :func:`~pandas.eval` with ``engine='python'`` and in fact may
+    incur a performance hit.

- You can see this by using :func:`~pandas.eval` with the ``'python'`` engine is
- actually a bit slower (not by much) than evaluating the same expression in
- Python:
+ You can see this by using :func:`pandas.eval` with the ``'python'`` engine. It
+ is a bit slower (not by much) than evaluating the same expression in Python:

  .. ipython:: python
@@ -593,15 +615,15 @@ Python:
     %timeit pd.eval('df1 + df2 + df3 + df4', engine='python')


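To confirm that the ``'python'`` engine changes only how an expression is evaluated and not what it returns, a minimal sketch follows. The frames ``df1`` and ``df2`` here are small stand-ins rather than the arrays timed above, and they are passed through ``local_dict`` explicitly so the example does not depend on the calling scope.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Small illustrative frames (hypothetical stand-ins, not the timed arrays).
df1 = pd.DataFrame(rng.standard_normal((1000, 3)), columns=list("abc"))
df2 = pd.DataFrame(rng.standard_normal((1000, 3)), columns=list("abc"))

# Plain Python evaluation of the expression.
plain = df1 + df2

# The same expression through pd.eval with the 'python' engine.
with_python_engine = pd.eval(
    "df1 + df2",
    engine="python",
    local_dict={"df1": df1, "df2": df2},
)

print(np.allclose(plain, with_python_engine))
```

Timing the two forms on larger frames, as in the ``%timeit`` calls above, is what would show any overhead from the ``'python'`` engine.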
- :func:`~pandas.eval` Performance
+ :func:`pandas.eval` Performance
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  :func:`~pandas.eval` is intended to speed up certain kinds of operations. In
  particular, those operations involving complex expressions with large
- ``DataFrame``/``Series`` objects should see a significant performance benefit.
- Here is a plot showing the running time of :func:`~pandas.eval` as a function of
- the size of the frame involved in the computation. The two lines are two
- different engines.
+ :class:`~pandas.DataFrame`/:class:`~pandas.Series` objects should see a
+ significant performance benefit. Here is a plot showing the running time of
+ :func:`pandas.eval` as a function of the size of the frame involved in the
+ computation. The two lines are two different engines.


  .. image:: _static/eval-perf.png
@@ -618,19 +640,31 @@ different engines.
  This plot was created using a ``DataFrame`` with 3 columns each containing
  floating point values generated using ``numpy.random.randn()``.

- Technical Minutia
- ~~~~~~~~~~~~~~~~~
- - Expressions that would result in an object dtype (including simple
-   variable evaluation) have to be evaluated in Python space. The main reason
-   for this behavior is to maintain backwards compatibility with versions of
-   numpy < 1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)``
-   will truncate any strings that are more than 60 characters in length. Second,
-   we can't pass ``object`` arrays to ``numexpr`` thus string comparisons must
-   be evaluated in Python space.
- - The upshot is that this *only* applies to object-dtype'd expressions. So,
-   if you have an expression--for example--that's a string comparison
-   ``and``-ed together with another boolean expression that's from a numeric
-   comparison, the numeric comparison will be evaluated by ``numexpr``. In fact,
-   in general, :func:`~pandas.query`/:func:`~pandas.eval` will "pick out" the
-   subexpressions that are ``eval``-able by ``numexpr`` and those that must be
-   evaluated in Python space transparently to the user.
+ Technical Minutia Regarding Expression Evaluation
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Expressions that would result in an object dtype or involve datetime operations
+ (because of ``NaT``) must be evaluated in Python space. The main reason for
+ this behavior is to maintain backwards compatibility with versions of numpy <
+ 1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)`` will
+ truncate any strings that are more than 60 characters in length. Second, we
+ can't pass ``object`` arrays to ``numexpr`` thus string comparisons must be
+ evaluated in Python space.
+
+ The upshot is that this *only* applies to object-dtype'd expressions. So, if
+ you have an expression--for example
+
+ .. ipython:: python
+
+    df = DataFrame({'strings': np.repeat(list('cba'), 3),
+                    'nums': np.repeat(range(3), 3)})
+    df
+    df.query('strings == "a" and nums == 1')
+
+ the numeric part of the comparison (``nums == 1``) will be evaluated by
+ ``numexpr``.
+
+ In general, :meth:`DataFrame.query`/:func:`pandas.eval` will
+ evaluate the subexpressions that *can* be evaluated by ``numexpr`` and those
+ that must be evaluated in Python space transparently to the user. This is done
+ by inferring the result type of an expression from its arguments and operators.
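A small sketch of the mixed string/numeric case: the query below should select the same rows as the equivalent boolean mask, regardless of which parts pandas hands to ``numexpr``. The choice of ``nums == 2`` (rather than ``1``) is an assumption made here so that the result is non-empty, and ``queried`` and ``masked`` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"strings": np.repeat(list("cba"), 3),
                   "nums": np.repeat(range(3), 3)})

# String comparison and numeric comparison combined in one query expression.
queried = df.query('strings == "a" and nums == 2')

# The same selection written as an explicit boolean mask.
masked = df[(df.strings == "a") & (df.nums == 2)]

print(queried.equals(masked))
```

Which subexpressions run in ``numexpr`` and which in Python space is not visible from the result; the comparison only shows that the split is transparent to the caller.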