Commit 0898f39

cpcloud authored and jreback committed
BUG: allow lex string comparisons
1 parent 733d9c9 commit 0898f39

4 files changed: +49 -9 lines changed

doc/source/enhancingperf.rst

+26-8
@@ -464,19 +464,20 @@ evaluate an expression in the "context" of a ``DataFrame``.
 
 Any expression that is a valid :func:`~pandas.eval` expression is also a valid
 ``DataFrame.eval`` expression, with the added benefit that *you don't have to
-prefix the name of the* ``DataFrame`` *to the column you're interested in
+prefix the name of the* ``DataFrame`` *to the column(s) you're interested in
 evaluating*.
 
-In addition, you can perform in-line assignment of columns within an expression.
-This can allow for *formulaic evaluation*. Only a signle assignement is permitted.
-It can be a new column name or an existing column name. It must be a string-like.
+In addition, you can perform assignment of columns within an expression.
+This allows for *formulaic evaluation*. Only a single assignment is permitted.
+The assignment target can be a new column name or an existing column name, and
+it must be a valid Python identifier.
 
 .. ipython:: python
 
-   df = DataFrame(dict(a = range(5), b = range(5,10)))
-   df.eval('c=a+b')
-   df.eval('d=a+b+c')
-   df.eval('a=1')
+   df = DataFrame(dict(a=range(5), b=range(5, 10)))
+   df.eval('c = a + b')
+   df.eval('d = a + b + c')
+   df.eval('a = 1')
    df
 
 Local Variables
@@ -616,3 +617,20 @@ different engines.
 
 This plot was created using a ``DataFrame`` with 3 columns each containing
 floating point values generated using ``numpy.random.randn()``.
+
+Technical Minutia
+~~~~~~~~~~~~~~~~~
+- Expressions that would result in an object dtype (including simple
+  variable evaluation) have to be evaluated in Python space. The main reason
+  for this behavior is to maintain backwards compatbility with versions of
+  numpy < 1.7. In those versions of ``numpy`` a call to ``ndarray.astype(str)``
+  will truncate any strings that are more than 60 characters in length. Second,
+  we can't pass ``object`` arrays to ``numexpr`` thus string comparisons must
+  be evaluated in Python space.
+- The upshot is that this *only* applies to object-dtype'd expressions. So,
+  if you have an expression--for example--that's a string comparison
+  ``and``-ed together with another boolean expression that's from a numeric
+  comparison, the numeric comparison will be evaluated by ``numexpr``. In fact,
+  in general, :func:`~pandas.query`/:func:`~pandas.eval` will "pick out" the
+  subexpressions that are ``eval``-able by ``numexpr`` and those that must be
+  evaluated in Python space transparently to the user.
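
The mixed case described in the second bullet can be illustrated with a small sketch. It assumes a reasonably recent pandas where ``DataFrame.query`` is available; the column names ``X`` and ``Y`` and the sample data are made up for illustration and are not part of this commit. The sketch only checks that the combined query agrees with an ordinary boolean mask; it does not show which engine evaluates which subexpression.

```python
import pandas as pd

# Illustrative data (an assumption, not from the commit):
# X holds strings (object-like dtype), Y holds integers.
df = pd.DataFrame({"X": list("abcdeabcde"), "Y": range(10)})

# A string comparison and-ed together with a numeric comparison.
res = df.query('X < "d" and Y > 4')

# The same filter written as a plain boolean mask, for comparison.
expected = df[(df["X"] < "d") & (df["Y"] > 4)]

print(res)
```

Both forms should select the same rows; the split between ``numexpr`` and Python evaluation is meant to be invisible to the caller.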

doc/source/release.rst

+2
@@ -168,6 +168,8 @@ Bug Fixes
 - Bug in DataFrame construction with recarray and non-ns datetime dtype (:issue:`6140`)
 - Bug in ``.loc`` setitem indexing with a datafrme on rhs, multiple item setting, and
   a datetimelike (:issue:`6152`)
+- Fixed a stack overflow bug in ``query``/``eval`` during lexicographic
+  string comparisons (:issue:`6155`).
 
 pandas 0.13.0
 -------------

pandas/computation/expr.py

+2-1
@@ -508,7 +508,8 @@ def _possibly_eval(self, binop, eval_in_python):
 
     def _possibly_evaluate_binop(self, op, op_class, lhs, rhs,
                                  eval_in_python=('in', 'not in'),
-                                 maybe_eval_in_python=('==', '!=')):
+                                 maybe_eval_in_python=('==', '!=', '<', '>',
+                                                       '<=', '>=')):
         res = op(lhs, rhs)
 
         if self.engine != 'pytables':
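
The hunk above only shows the changed default argument, not the body that consumes it. As a rough mental model, the decision it feeds could look like the sketch below. This is a simplified, hypothetical stand-in and not the pandas implementation: the function name ``must_eval_in_python`` and the idea of passing operand types directly are assumptions made for illustration. Only the two operator tuples are taken from the diff.

```python
# Operator groups copied from the new signature in the diff.
ALWAYS_IN_PYTHON = ("in", "not in")
MAYBE_IN_PYTHON = ("==", "!=", "<", ">", "<=", ">=")


def must_eval_in_python(op, lhs_type, rhs_type):
    """Hypothetical helper: decide whether a binary op stays in Python space.

    `lhs_type` and `rhs_type` stand in for the operands' result types;
    `object` marks an operand that numexpr cannot handle (e.g. strings).
    """
    if op in ALWAYS_IN_PYTHON:
        # Membership tests are always evaluated in Python.
        return True
    has_object_operand = object in (lhs_type, rhs_type)
    # Comparisons fall back to Python only when an operand is object-typed.
    return has_object_operand and op in MAYBE_IN_PYTHON


print(must_eval_in_python("<", object, float))   # string-like comparison
print(must_eval_in_python("<", float, float))    # purely numeric comparison
```

Before this commit the second tuple contained only ``==`` and ``!=``, so in this model an ordering comparison on an object-typed operand would not have been routed to Python evaluation.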

pandas/tests/test_frame.py

+19
@@ -12841,6 +12841,25 @@ def test_query_with_nested_string(self):
         for parser, engine in product(PARSERS, ENGINES):
             yield self.check_query_with_nested_strings, parser, engine
 
+    def check_query_lex_compare_strings(self, parser, engine):
+        tm.skip_if_no_ne(engine=engine)
+        import operator as opr
+
+        a = Series(tm.choice(list('abcde'), 20))
+        b = Series(np.arange(a.size))
+        df = DataFrame({'X': a, 'Y': b})
+
+        ops = {'<': opr.lt, '>': opr.gt, '<=': opr.le, '>=': opr.ge}
+
+        for op, func in ops.items():
+            res = df.query('X %s "d"' % op, engine=engine, parser=parser)
+            expected = df[func(df.X, 'd')]
+            assert_frame_equal(res, expected)
+
+    def test_query_lex_compare_strings(self):
+        for parser, engine in product(PARSERS, ENGINES):
+            yield self.check_query_lex_compare_strings, parser, engine
+
 class TestDataFrameEvalNumExprPandas(tm.TestCase):
 
     @classmethod
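
For readers who want to try the new test's check outside the pandas test suite, a standalone approximation might look like the following. It assumes a recent pandas and swaps the random ``tm.choice`` data for a fixed, repeating sequence so the result is deterministic. It also relies on the default parser and engine rather than iterating over ``PARSERS`` and ``ENGINES``, which are test-suite constants not shown in this diff.

```python
import operator

import pandas as pd

# Deterministic stand-in for the random letters used in the real test.
df = pd.DataFrame({"X": list("abcde") * 4, "Y": range(20)})

ops = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

results = {}
for sym, func in ops.items():
    # Lexicographic comparison expressed as a query string ...
    res = df.query(f'X {sym} "d"')
    # ... and as an ordinary Series comparison.
    expected = df[func(df["X"], "d")]
    results[sym] = res.equals(expected)

print(results)
```

Each entry in ``results`` records whether the query form and the mask form selected identical rows for that operator.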
