
BUG: Fix bug when using df.query with "str.contains()" #25813


Closed
wants to merge 6 commits into from



@dtpc dtpc commented Mar 21, 2019

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Fixes a bug on line 588 of pandas/core/computation/expr.py that raised AttributeError: 'dict' object has no attribute 'append' when named keyword arguments were used inside the expression string.
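A minimal sketch of the failure mode, using only the standard-library ast module. The helper collect_call_kwargs is hypothetical and is not the pandas implementation. It only illustrates that the keyword arguments of a parsed call arrive as a list of ast.keyword nodes, and that if they are gathered into a dict they must be stored by key, because a dict has no append method.

```python
import ast


def collect_call_kwargs(expr_src):
    """Return the keyword arguments of a single call expression as a dict.

    Hypothetical helper for illustration only; it is not pandas code and
    it only handles literal keyword values.
    """
    call = ast.parse(expr_src, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a call expression")
    kwargs = {}
    for kw in call.keywords:  # call.keywords is a list of ast.keyword nodes
        # kwargs is a dict, so store each value under its keyword name;
        # calling kwargs.append(...) here would raise
        # AttributeError: 'dict' object has no attribute 'append'
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return kwargs


print(collect_call_kwargs("B.str.contains('Z', na=False)"))  # {'na': False}
```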

Also addresses the specific example in #22172

Example:

```python
import pandas as pd
df = pd.DataFrame([["I", "XYZ"], ["IJ", None]], columns=['A', 'B'])
df.query("B.str.contains('Z', na=False)", engine="python")
```

```
Traceback (most recent call last):
  File "./test_query.py", line 12, in <module>
    result = df.query("B.str.contains('Z', na=False)", engine="python")
  File "./lib/python3.6/site-packages/pandas/core/frame.py", line 2847, in query
    res = self.eval(expr, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/frame.py", line 2962, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/computation/eval.py", line 291, in eval
    truediv=truediv)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 739, in __init__
    self.terms = self.parse()
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 756, in parse
    return self._visitor.visit(self.expr)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 321, in visit
    return visitor(node, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 327, in visit_Module
    return self.visit(expr, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 321, in visit
    return visitor(node, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 330, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 321, in visit
    return visitor(node, **kwargs)
  File "./lib/python3.6/site-packages/pandas/core/computation/expr.py", line 588, in visit_Call_35
    kwargs.append(ast.keyword(
AttributeError: 'dict' object has no attribute 'append'
```
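For comparison, the filter that this query string is meant to express can be written with ordinary boolean indexing, which does not go through the expression parser at all. This is a sketch of the expected result under the assumption that pandas is installed; it is not part of the patch.

```python
import pandas as pd

df = pd.DataFrame([["I", "XYZ"], ["IJ", None]], columns=["A", "B"])

# Build the boolean mask directly; na=False maps the missing value in B to False
mask = df["B"].str.contains("Z", na=False)
expected = df[mask]

print(expected)  # only the first row (A="I", B="XYZ") is kept
```

On a pandas version that includes the fix, the df.query call above should return this same frame.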


codecov bot commented Mar 21, 2019

Codecov Report

Merging #25813 into master will increase coverage by <.01%.
The diff coverage is 100%.


```diff
@@            Coverage Diff             @@
##           master   #25813      +/-   ##
==========================================
+ Coverage   91.27%   91.27%   +<.01%
==========================================
  Files         173      173
  Lines       53002    53002
==========================================
+ Hits        48375    48378       +3
+ Misses       4627     4624       -3
```

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.85% <100%> (+0.01%) ⬆️ |
| #single | 41.77% <0%> (ø) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| `pandas/core/computation/expr.py` | 89.27% <100%> (+0.74%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update fbe2523...64004bc.


codecov bot commented Mar 21, 2019

Codecov Report

Merging #25813 into master will decrease coverage by 0.62%.
The diff coverage is 100%.


```diff
@@            Coverage Diff             @@
##           master   #25813      +/-   ##
==========================================
- Coverage   91.89%   91.27%   -0.63%
==========================================
  Files         175      173       -2
  Lines       52509    53002     +493
==========================================
+ Hits        48255    48378     +123
- Misses       4254     4624     +370
```

| Flag | Coverage Δ |
|---|---|
| #multiple | 89.85% <100%> (-0.6%) ⬇️ |
| #single | 41.77% <0%> (+0.89%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| `pandas/core/computation/expr.py` | 89.27% <100%> (-7.43%) ⬇️ |
| `pandas/io/gbq.py` | 25% <0%> (-62.5%) ⬇️ |
| `pandas/compat/__init__.py` | 58.03% <0%> (-19.47%) ⬇️ |
| `pandas/io/common.py` | 72.86% <0%> (-18.97%) ⬇️ |
| `pandas/io/excel/_util.py` | 78.82% <0%> (-8.68%) ⬇️ |
| `pandas/compat/chainmap.py` | 61.9% <0%> (-4.77%) ⬇️ |
| `pandas/core/groupby/categorical.py` | 95.45% <0%> (-4.55%) ⬇️ |
| `pandas/core/dtypes/cast.py` | 88.16% <0%> (-3.2%) ⬇️ |
| `pandas/io/s3.py` | 86.36% <0%> (-3.12%) ⬇️ |

... and 115 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 5a15069...223e034.

@dtpc dtpc changed the title from Fix/eval str contains to BUG: Fix bug when using df.query with "str.contains()" Mar 21, 2019
Member

@WillAyd WillAyd left a comment


Can you add a whatsnew note for v0.25?

@WillAyd WillAyd added the Algos, DataFrame, and Reshaping labels, then removed Algos and Reshaping, Mar 21, 2019
@dtpc dtpc force-pushed the fix/eval_str_contains branch from 233a26b to 087157c March 21, 2019 03:31
@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Mar 22, 2019
@WillAyd
Member

WillAyd commented Apr 2, 2019

@dtpc can you merge master and address comments?

@jreback
Contributor

jreback commented Apr 5, 2019

can you merge master and update

```
@@ -412,6 +412,7 @@ Other

- Improved :class:`Timestamp` type checking in various datetime functions to prevent exceptions when using a subclassed `datetime` (:issue:`25851`)
- Bug in :class:`Series` and :class:`DataFrame` repr where ``np.datetime64('NaT')`` and ``np.timedelta64('NaT')`` with ``dtype=object`` would be represented as ``NaN`` (:issue:`25445`)
- Bug in :class:`BaseExprVisitor` which caused :func:`eval` expressions to fail when named keyword arguments were included within the expression string (:issue:`25813`)
```
Member


Still needs a little work to make this user-facing, i.e. it should only reference items in the documented API (which BaseExprVisitor is not).

It would be better to reference the DataFrame.query method.

```
@@ -1757,6 +1757,29 @@ def test_no_new_globals(self, engine, parser):
assert gbls == gbls2


class TestEvalNamedKWargs(object):
```
Member


I don't think we need a class here; OK to remove this and put tests at top level of module

```
df = pd.DataFrame([["I", "XYZ"], ["IJ", None]], columns=['A', 'B'])

expected = df[df["A"].str.contains("J")]
result = df.query("A.str.contains('J')", engine="python",
```
Member


Any reason to not parametrize this for all engines?

Member

@gfyoung gfyoung May 18, 2019


Yes, because engine=pandas fails for this test (though that should have been explicit).

It tries to hash a Series object.
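Why hashing a Series fails can be shown in isolation: pandas Series objects are mutable and deliberately unhashable, so any code path that hashes one (for example, using it as a dict key or set member) raises TypeError. This minimal sketch assumes only that pandas is installed; it does not reproduce the query code path itself, and because the exact error message may differ between pandas versions, it relies only on the exception type.

```python
import pandas as pd

s = pd.Series(["XYZ", None], name="B")

try:
    hash(s)  # Series defines __hash__ only to raise TypeError
except TypeError as exc:
    print(f"TypeError: {exc}")

# The same error surfaces indirectly, e.g. when a Series is used as a dict key
try:
    {s: "cached"}
except TypeError:
    print("a Series cannot be used as a dict key")
```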

```
@@ -1757,6 +1757,29 @@ def test_no_new_globals(self, engine, parser):
assert gbls == gbls2


class TestEvalNamedKWargs(object):
# xref https://github.com/pandas-dev/pandas/issues/25813
```
Member


Just add #GH 25813 to the first line of the tests added here
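A sketch of how the tests might look after applying these review comments: a module-level function instead of a class, a # GH 25813 comment on its first line, and pytest parametrization over the two expressions used in this PR. The test name is hypothetical. The engine stays fixed to "python" because, per the reply above, not every engine handles this expression; this assumes pandas and pytest are installed.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "expr, expected_mask",
    [
        ("A.str.contains('J')", [False, True]),
        # named keyword argument inside the expression string
        ("B.str.contains('Z', na=False)", [True, False]),
    ],
)
def test_query_str_contains(expr, expected_mask):
    # GH 25813
    df = pd.DataFrame([["I", "XYZ"], ["IJ", None]], columns=["A", "B"])
    # A list of booleans passed to df[...] selects rows, like a boolean mask
    expected = df[expected_mask]
    result = df.query(expr, engine="python")
    pd.testing.assert_frame_equal(result, expected)
```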

@jreback
Contributor

jreback commented May 12, 2019

@pandas-dev/pandas-core if someone wants to finish up

@jreback
Contributor

jreback commented May 19, 2019

this PR is superseded by #26426

@jreback jreback closed this May 19, 2019
Labels
DataFrame DataFrame data structure Indexing Related to indexing on series/frames, not to indexes themselves

4 participants