Int64 can't be used in pandas.eval() #29618

Closed
antonov-andrey opened this issue Nov 14, 2019 · 5 comments · Fixed by #50764
Labels: Enhancement; expressions (pd.eval, query); ExtensionArray (Extending pandas with custom dtypes or arrays); NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

Comments

@antonov-andrey

This code works fine

import pandas as pd

d = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], dtype="Float64")
d.eval('c = b - a', inplace=True)

This code fails with the stack trace below:

d = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], dtype="Int64")
d.eval('c = b - a', inplace=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-42645933ba81> in <module>
----> 1 d.eval('c = b - a', inplace=True)
      2 d

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in eval(self, expr, inplace, **kwargs)
   3313             kwargs["target"] = self
   3314         kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
-> 3315         return _eval(expr, inplace=inplace, **kwargs)
   3316 
   3317     def select_dtypes(self, include=None, exclude=None):

~/.local/lib/python3.6/site-packages/pandas/core/computation/eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    320         )
    321 
--> 322         parsed_expr = Expr(expr, engine=engine, parser=parser, env=env, truediv=truediv)
    323 
    324         # construct the engine and evaluate the parsed expression

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in __init__(self, expr, engine, parser, env, truediv, level)
    828         self.env.scope["truediv"] = truediv
    829         self._visitor = _parsers[parser](self.env, self.engine, self.parser)
--> 830         self.terms = self.parse()
    831 
    832     @property

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in parse(self)
    845     def parse(self):
    846         """Parse an expression"""
--> 847         return self._visitor.visit(self.expr)
    848 
    849     @property

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
    439         method = "visit_" + node.__class__.__name__
    440         visitor = getattr(self, method)
--> 441         return visitor(node, **kwargs)
    442 
    443     def visit_Module(self, node, **kwargs):

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit_Module(self, node, **kwargs)
    445             raise SyntaxError("only a single expression is allowed")
    446         expr = node.body[0]
--> 447         return self.visit(expr, **kwargs)
    448 
    449     def visit_Expr(self, node, **kwargs):

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
    439         method = "visit_" + node.__class__.__name__
    440         visitor = getattr(self, method)
--> 441         return visitor(node, **kwargs)
    442 
    443     def visit_Module(self, node, **kwargs):

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit_Assign(self, node, **kwargs)
    663             )
    664 
--> 665         return self.visit(node.value, **kwargs)
    666 
    667     def visit_Attribute(self, node, **kwargs):

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit(self, node, **kwargs)
    439         method = "visit_" + node.__class__.__name__
    440         visitor = getattr(self, method)
--> 441         return visitor(node, **kwargs)
    442 
    443     def visit_Module(self, node, **kwargs):

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in visit_BinOp(self, node, **kwargs)
    563         op, op_class, left, right = self._maybe_transform_eq_ne(node)
    564         left, right = self._maybe_downcast_constants(left, right)
--> 565         return self._maybe_evaluate_binop(op, op_class, left, right)
    566 
    567     def visit_Div(self, node, **kwargs):

~/.local/lib/python3.6/site-packages/pandas/core/computation/expr.py in _maybe_evaluate_binop(self, op, op_class, lhs, rhs, eval_in_python, maybe_eval_in_python)
    531         res = op(lhs, rhs)
    532 
--> 533         if res.has_invalid_return_type:
    534             raise TypeError(
    535                 "unsupported operand type(s) for {op}:"

~/.local/lib/python3.6/site-packages/pandas/core/computation/ops.py in has_invalid_return_type(self)
    223         types = self.operand_types
    224         obj_dtype_set = frozenset([np.dtype("object")])
--> 225         return self.return_type == object and types - obj_dtype_set
    226 
    227     @property

~/.local/lib/python3.6/site-packages/pandas/core/computation/ops.py in return_type(self)
    217         if self.op in (_cmp_ops_syms + _bool_ops_syms):
    218             return np.bool_
--> 219         return _result_type_many(*(term.type for term in com.flatten(self)))
    220 
    221     @property

~/.local/lib/python3.6/site-packages/pandas/core/computation/common.py in _result_type_many(*arrays_and_dtypes)
     20     argument limit """
     21     try:
---> 22         return np.result_type(*arrays_and_dtypes)
     23     except ValueError:
     24         # we have > NPY_MAXARGS terms in our expression

<__array_function__ internals> in result_type(*args, **kwargs)

TypeError: data type not understood
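
The last frames of the traceback show np.result_type being handed the operand dtypes directly. A minimal sketch of that step in isolation, assuming a recent NumPy and pandas are installed; numpy_understands is a hypothetical helper written for this illustration, not part of either library, and the exact error message differs between NumPy versions:

```python
import numpy as np
import pandas as pd


def numpy_understands(dtype):
    """Return True if np.result_type accepts `dtype` (hypothetical helper)."""
    try:
        np.result_type(dtype)
    except TypeError:
        # NumPy cannot interpret pandas extension dtypes as NumPy dtypes.
        return False
    return True


# A plain NumPy dtype is accepted.
print(numpy_understands(np.dtype("int64")))

# The nullable Int64 extension dtype is rejected with a TypeError,
# which is the error surfacing at the bottom of the traceback.
print(numpy_understands(pd.Int64Dtype()))
```

This only reproduces the type-resolution step; it does not exercise the rest of the eval() machinery.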

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-35-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 41.6.0
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.999999999
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.0
fastparquet : 0.3.2
gcsfs : None
lxml.etree : None
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : 0.3.5
scipy : 1.3.2
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@jreback
Contributor

jreback commented Nov 14, 2019

numexpr has no support for extension types and likely never will. We should just shift this to using engine='python' for extension dtypes. A community PR would be needed here.
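
Any such fallback would first need to know which columns use extension dtypes. A rough sketch of that detection step only, assuming the public pandas.api.extensions.ExtensionDtype base class; extension_columns is a hypothetical helper for illustration and is not how pandas itself selects an engine:

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def extension_columns(frame):
    """List the columns backed by an extension dtype (hypothetical helper)."""
    return [
        name
        for name, dtype in frame.dtypes.items()
        if isinstance(dtype, ExtensionDtype)
    ]


nullable = pd.DataFrame({"a": [1, 3], "b": [2, 4]}, dtype="Int64")
plain = nullable.astype("int64")  # same values, NumPy-backed

print(extension_columns(nullable))  # both columns use the nullable Int64 dtype
print(extension_columns(plain))     # no extension dtypes here
```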

@antonov-andrey
Author

Thank you for the fast reply! But when I try it with engine='python':
d.eval('c = b - a', inplace=True, engine='python')
it fails in exactly the same way as I described.

jorisvandenbossche added the ExtensionArray (Extending pandas with custom dtypes or arrays) label Nov 15, 2019
jbrockmendel added the expressions (pd.eval, query) label Feb 25, 2020
@stefansimik

stefansimik commented May 19, 2020

Same problem here.
engine='python' does not help.

Float - works fine

import numpy as np
import pandas as pd

(
    pd.DataFrame({'A': [1, 2, np.nan], 'B': [1, np.nan, 3]})
    .astype('float')
    .eval('C = A + B', engine='python')
)

Int64 - Fails:

(
    pd.DataFrame({'A': [1, 2, np.nan], 'B': [1, np.nan, 3]})
    .astype('Int64')                 #  <--- nullable integer dtype makes the following eval(...) fail
    .eval('C = A + B', engine='python')
)
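
On affected versions, two possible workarounds are to skip eval() and use ordinary operator arithmetic, or to round-trip through float64. A minimal sketch under those assumptions; the variable names direct and via_float are illustrative, and the float round trip can lose precision for integers beyond 2**53:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan], "B": [1, np.nan, 3]}).astype("Int64")

# Option 1: plain operator arithmetic on the nullable columns, bypassing eval().
# Missing values propagate as <NA> and the result stays Int64.
direct = df.assign(C=df["A"] + df["B"])

# Option 2: cast to float64 for eval(), then cast the result back to Int64.
via_float = (
    df.astype("float64")
    .eval("C = A + B", engine="python")
    .astype("Int64")
)

print(direct)
print(via_float)
```

Option 1 avoids the expression engine entirely, so it does not depend on how eval() resolves dtypes.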

@evyasonov

Any update?

@jreback
Contributor

jreback commented Nov 10, 2020

@evyasonov you are welcome to contribute a patch
