
DataFrame.eval errors with AttributeError: 'UnaryOp' #16363

Closed
brentp opened this issue May 16, 2017 · 23 comments · Fixed by #25928
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@brentp

brentp commented May 16, 2017

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

def test_unary():
    df = pd.DataFrame({'x': np.array([0.11, 0], dtype=np.float32)})
    res = df.eval('(x > 0.1) | (x < -0.1)')
    assert np.array_equal(res, np.array([True, False])), res

Problem description

This is related to #11235.
On Python 3.6 with pandas 0.20.1, this raises an error; the traceback ends with:

  File ".../envs/py3/lib/python3.6/site-packages/pandas/core/computation/expr.py", line 370, in _maybe_downcast_constants
    name = self.env.add_tmp(np.float32(right.value))
AttributeError: 'UnaryOp' object has no attribute 'value'

In that case, right is -(0.1), i.e. a UnaryOp rather than a constant.
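To see why right is a UnaryOp rather than a constant, here is a minimal sketch using Python's standard ast module, which pandas' eval parser builds on. It assumes Python 3.8+, where numeric literals parse as ast.Constant. The parser does not fold -0.1 into a single negative literal; it produces a unary minus applied to 0.1:

```python
import ast

# Parse the failing comparison the way Python's own parser does.
tree = ast.parse("x < -0.1", mode="eval")
rhs = tree.body.comparators[0]

print(type(rhs).__name__)          # UnaryOp
print(type(rhs.op).__name__)       # USub
print(type(rhs.operand).__name__)  # Constant
print(rhs.operand.value)           # 0.1
```

Only the inner operand carries a value; the node that reaches the downcasting code is the operator wrapped around it.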

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-49-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

Another example:

>>> import pandas as pd
>>> df = pd.DataFrame({'x':[1,2,3,4,5]})
>>> df.eval('x.shift(-1)')
@jreback jreback added Bug Difficulty Intermediate Numeric Operations Arithmetic, Comparison, and Logical operations labels May 16, 2017
@jreback jreback added this to the Next Major Release milestone May 16, 2017
@james-nichols

I am looking at this as part of the PyCon 2017 sprints.

@mkozel92

Not really a fix, but if you need a workaround, just use float64.
It worked for me.

@james-nichols

Using float64 does not work for me, and in any case it does not address the fact that the value attribute is being looked up on a UnaryOp.

I left the sprints early, but looked into this and realised I don't understand the pandas Op class behaviour well enough.

The problem is that UnaryOp returns True for isscalar, which at first glance seems a little strange. Other descendants of Op (e.g. BinaryOp) also return True for isscalar in similar circumstances. This is because of the following in the Op class:

@property
def isscalar(self):
    return all(operand.isscalar for operand in self.operands)

This seems like incorrect behaviour to me. If I make isscalar simply return False, the problem here is fixed, but I have little idea of the far-reaching consequences of such a change. I searched for all references to isscalar in the core code base, and it seems to be called only in this method and one other, so perhaps the risk is small.

Does anyone have any thoughts on this?
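The isscalar behaviour described in this comment can be reproduced with a toy model. The classes and the downcast_constant helper below are hypothetical, heavily simplified stand-ins for pandas' internal Term/Op machinery, not its real API; they only show how the quoted rule lets an operator node pass a scalar check and then fail when its value is read:

```python
# Hypothetical, heavily simplified stand-ins for pandas' internal Term/Op
# classes; the real ones carry much more state (environment, dtype, etc.).
class Term:
    """Leaf node that wraps a concrete value."""

    def __init__(self, value):
        self.value = value

    @property
    def isscalar(self):
        return isinstance(self.value, (int, float))


class Op:
    """Operator node: holds operands but has no .value attribute."""

    def __init__(self, op, operands):
        self.op = op
        self.operands = operands

    @property
    def isscalar(self):
        # The rule quoted above: "scalar" if every operand is scalar.
        return all(operand.isscalar for operand in self.operands)


class UnaryOp(Op):
    def __init__(self, op, operand):
        super().__init__(op, [operand])


def downcast_constant(node):
    # Hypothetical analogue of the check in _maybe_downcast_constants:
    # trust isscalar, then read .value.
    if node.isscalar:
        return float(node.value)
    return node


neg = UnaryOp("-", Term(0.1))
print(neg.isscalar)  # True, although neg is an operator, not a leaf

caught = None
try:
    downcast_constant(neg)
except AttributeError as exc:
    caught = exc
print(caught)  # 'UnaryOp' object has no attribute 'value'
```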

@james-nichols

I've run the test suite with isscalar set to False in the Op class, and it doesn't seem to break anything. I think that somewhere along the way, the notion of a scalar in this context got confused with the notion of a scalar in terms of numpy arrays. In my view, only objects of type Term and its descendants should return True for isscalar.

Any thoughts?
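A minimal sketch of the proposed rule, reusing the same kind of hypothetical toy classes (not pandas' actual code): Op.isscalar always returns False, so only leaf Terms are ever treated as constants. As noted later in the thread, one consequence is that the negative constant is then left alone rather than downcast.

```python
# Hypothetical toy classes illustrating the proposed rule; not pandas' code.
class Term:
    def __init__(self, value):
        self.value = value

    @property
    def isscalar(self):
        return isinstance(self.value, (int, float))


class Op:
    def __init__(self, op, operands):
        self.op = op
        self.operands = operands

    @property
    def isscalar(self):
        # Proposed change: an operator node is an expression, never a scalar.
        return False


class UnaryOp(Op):
    def __init__(self, op, operand):
        super().__init__(op, [operand])


def downcast_constant(node):
    # With the proposed rule this branch is only taken for Terms.
    if node.isscalar:
        return float(node.value)
    return node


neg = UnaryOp("-", Term(0.1))
print(downcast_constant(neg) is neg)   # True: left untouched, no AttributeError
print(downcast_constant(Term(0.1)))    # 0.1
```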

@kenahoo

kenahoo commented Jul 5, 2017

A smaller version of the original test case is:

import numpy as np
import pandas as pd

def test_unary():
    df = pd.DataFrame({'x': np.array([0], dtype=np.float32)})
    res = df.eval('x < -0.1')
    assert np.array_equal(res, np.array([False])), res

Note that this is not just a problem with np.float32; it also fails with string data (my original use case, which motivated #16833):

import pandas as pd

def test_unary():
    df = pd.DataFrame({'x': ["one", "two"]})
    df.eval('x.shift(-1)')

@james-nichols

Agreed. It is not just np.float32 that is causing the trouble.

I think my suggested fix is the correct way forward, having run the full test suite without problems and having thought about how the design should notionally work. I believe the notion of isscalar got confused with numpy's: an expression shouldn't be considered a "scalar" just because it evaluates to scalar values rather than array/list values. Here, the test should be whether the expression actually is a scalar, as opposed to an op or an expression that could be broken down further.

@ksw9

ksw9 commented Jan 20, 2018

Hi,
I am wondering whether this has been resolved. I'm running into a similar issue using pandas df.query() with negative numbers.
Thank you!

@james-nichols

@ksw9 I'll submit a fix for this. That way at least a moderator will have to respond.

@ksw9

ksw9 commented Jan 22, 2018

Great, thank you!

@ksw9

ksw9 commented Jan 25, 2018

Would it be possible to update this thread if this has been fixed? Thanks again!

@vmuriart

@james-nichols there might be a problem with your approach, though. Your change would completely skip this section of code, which downcasts the unary term to float32 and produces a Series of dtype float32. With your change, the result would have dtype float64 instead.

With the silly fix I suggested in #19697 (self.value = operand.value), the return type would be float32, which seems to be what was intended, but the results are wrong (the negative sign is ignored).

Neither approach, though, seems to solve #16833. Setting isscalar to False would just push the error further down the line. Adding self.value = operand.value pushes the code further along, and it instead errors out with TypeError: 'Series' objects are mutable, thus they cannot be hashed
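Why copying operand.value loses the sign can be shown with hypothetical minimal node classes (not pandas' own): the copy takes the operand's value, 0.1, while the value the expression actually denotes requires applying the operator. The folded_value method is an illustrative name, not a pandas API:

```python
import operator


# Hypothetical minimal nodes for illustration; not pandas' classes.
class Term:
    def __init__(self, value):
        self.value = value


class UnaryOp:
    OPS = {"-": operator.neg, "+": operator.pos}

    def __init__(self, op, operand):
        self.op = op
        self.operand = operand
        # The workaround from #19697: copy the operand's value verbatim.
        self.value = operand.value

    def folded_value(self):
        # What the value should be: the operator applied to the operand.
        return self.OPS[self.op](self.operand.value)


neg = UnaryOp("-", Term(0.1))
print(neg.value)           # 0.1: the minus sign has been dropped
print(neg.folded_value())  # -0.1
```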

@alexcwatt
Contributor

I ran into this recently and would like to help with a patch. As best I can tell, the problem is that _maybe_downcast_constants tries to downcast not only constants but also UnaryOps, which isn't possible, since UnaryOp instances don't have a value attribute like constants/scalars do.

I am new to the pandas code, and the expressions code is a bit tricky, but I think we could either catch the AttributeError in _maybe_downcast_constants or explicitly check in each case that left or right has a value attribute.
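The second option, an explicit attribute check, can be sketched with hypothetical toy nodes; maybe_downcast below is an illustrative stand-in, not the real _maybe_downcast_constants. It assumes numpy is available, mirroring the np.float32 call in the traceback. Catching AttributeError would behave the same here. As pointed out earlier in the thread, the operator node then simply isn't downcast:

```python
import numpy as np


# Hypothetical leaf and operator nodes, simplified for illustration.
class Constant:
    def __init__(self, value):
        self.value = value
        self.isscalar = True


class UnaryOp:
    def __init__(self, op, operand):
        self.op = op
        self.operand = operand
        # Mirrors the behaviour discussed above: scalar if the operand is.
        self.isscalar = operand.isscalar


def maybe_downcast(node):
    # Guarded version: only nodes that actually carry a value are cast.
    if node.isscalar and hasattr(node, "value"):
        return np.float32(node.value)
    return node  # operator nodes pass through unchanged


print(type(maybe_downcast(Constant(0.1))).__name__)  # float32
neg = UnaryOp("-", Constant(0.1))
print(maybe_downcast(neg) is neg)                     # True
```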

In short, the problem is that an operation like df.eval('x < -.1') fails when x is np.float32, because the right side of the comparison is parsed as a UnaryOp node instead of an np.float32 constant, and visit_BinOp still passes it to _maybe_downcast_constants. On the other hand, df.eval('x < @y') works when y = -.1, because pandas doesn't have to parse the negative literal. I think a small change might fix this, but I could be overlooking something bigger and would appreciate feedback.
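The @y workaround from this comment looks like this in practice. It is a sketch assuming a pandas/numpy installation; it binds the negative constant to a local name so that the expression string contains no negative literal for the parser to turn into a UnaryOp:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.array([0.11, 0], dtype=np.float32)})

# Reference a local variable with "@" instead of writing -0.1 inline.
y = -0.1
res = df.eval("x < @y")
print(res.tolist())  # [False, False]
```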

@eyadsibai

I just wanted to mention that this issue still remains in 0.24.1. I just ran into it.

@jreback
Contributor

jreback commented Feb 26, 2019

best way to fix is to submit a PR

there are 2800 other issues

alexcwatt added a commit to alexcwatt/pandas that referenced this issue Mar 30, 2019
@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Mar 30, 2019
yhaque1213 pushed a commit to yhaque1213/pandas that referenced this issue Apr 22, 2019
@sunt05

sunt05 commented Jan 7, 2021

This issue seems to persist when .between is used inside query.
The OP's second example also fails:

import pandas as pd

df = pd.DataFrame({'x':[1,2,3,4,5]})
df.eval('x.shift(-1)')

Tested with pandas 1.2.0.

version info
INSTALLED VERSIONS
------------------
commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.2.0
Version : Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0
numpy : 1.19.1
pytz : 2020.5
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.6.0.post20201009
Cython : None
pytest : 6.0.1
hypothesis : None
sphinx : 3.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 0.8.0
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : 0.16.2
xlrd : None
xlwt : None
numba : None

@SBFRF

SBFRF commented Jan 21, 2022

I just ran into this. It seems like this ticket is still open and unresolved. I don't totally understand any of the above, but any update would be awesome!

@jreback
Contributor

jreback commented Jan 21, 2022

@SBFRF this was closed a long time ago if u have an issue pls open a new ticket with a reproducible example

@kenahoo

kenahoo commented Jan 21, 2022

@jreback the 2nd example in the original submission above (same as @sunt05's example) is still broken in Pandas 1.3.5:

% python
Python 3.9.5 (default, Jan 14 2022, 17:56:29) 
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'1.3.5'
>>> df = pd.DataFrame({'x':[1,2,3,4,5]})
>>> df.eval('x.shift(-1)')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/frame.py", line 4191, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/eval.py", line 348, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 806, in __init__
    self.terms = self.parse()
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 825, in parse
    return self._visitor.visit(self.expr)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 411, in visit
    return visitor(node, **kwargs)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 417, in visit_Module
    return self.visit(expr, **kwargs)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 411, in visit
    return visitor(node, **kwargs)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 420, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 411, in visit
    return visitor(node, **kwargs)
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 692, in visit_Call
    new_args = [self.visit(arg).value for arg in node.args]
  File "/Users/kwilliams/.pyenv/versions/load-fx/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 692, in <listcomp>
    new_args = [self.visit(arg).value for arg in node.args]
AttributeError: 'UnaryOp' object has no attribute 'value'
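The traceback fails in visit_Call, which reads .value from each visited argument. Python's standard ast module shows the shape of that argument (assuming Python 3.8+): -1 in x.shift(-1) is again a unary minus node, whose value only exists after the operator is applied, as ast.literal_eval does here. This illustrates the syntax tree only and is not pandas' visitor:

```python
import ast

# x.shift(-1): Python parses the argument as unary minus applied to 1.
call = ast.parse("x.shift(-1)", mode="eval").body
arg = call.args[0]

print(call.func.attr)         # shift
print(type(arg).__name__)     # UnaryOp
print(ast.literal_eval(arg))  # -1, obtained by evaluating the node
```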

If this were my tracker I'd want to keep the context of this ticket instead of opening a new ticket, up to you though.

@jreback
Contributor

jreback commented Jan 21, 2022

@kenahoo there are 3000 issues and very, very few people looking at them, all of whom are volunteers

@kenahoo

kenahoo commented Jan 21, 2022

@jreback I understand. Just trying to keep the queue informative, and prevent creating issue 3001 if possible, when there's a lot of valuable information already in this existing ticket.

Comes down to policy, I guess - if a ticket was mistakenly closed but the code was never actually fixed, do you prefer to re-open it or create a new ticket?

@jreback
Contributor

jreback commented Jan 21, 2022

this is not mistakenly closed
it has a validation test

@kenahoo

kenahoo commented Jan 21, 2022

Jeff, I'm trying to be as helpful as I can possibly be.

There were two demonstrated bugs in this ticket.

The first case was fixed with a validation test.

The second case has not been fixed and still exists in Pandas 1.3.5.

We're not really getting anywhere in this ticket, so I'll propose re-opening #16833 instead.

Thanks for your time.

@jreback
Contributor

jreback commented Jan 22, 2022

@kenahoo pls don't propose reopening, it's not going to happen
simply create a new issue with a copy-pastable example

a pull request to fix would be helpful

amoghavs pushed a commit to amoghavs/HolisticTraceAnalysis that referenced this issue Mar 13, 2024
Summary:
Without this diff, the critical_path_analysis on some traces fails with the error: `'UnaryOp' object has no attribute 'evaluate'`

This is apparently a pandas bug: pandas-dev/pandas#16363, which has been fixed a few times but keeps reappearing in different forms.

Reframing the query works around the problem in critical_path_analysis for the time being.

Differential Revision: D54828775
facebook-github-bot pushed a commit to facebookresearch/HolisticTraceAnalysis that referenced this issue Mar 13, 2024
Summary:
Pull Request resolved: #112

Without this diff, the critical_path_analysis on some traces fails with the error: `'UnaryOp' object has no attribute 'evaluate'`

This is apparently a pandas bug: pandas-dev/pandas#16363, which has been fixed a few times but keeps reappearing in different forms.

Reframing the query works around the problem in critical_path_analysis for the time being.

Reviewed By: briancoutinho

Differential Revision: D54828775

fbshipit-source-id: 0bb52d1da7b5b143949c464b453955a666d31581