
Extension dtypes do not work with dataframe query method #25369

Open
techvslife opened this issue Feb 19, 2019 · 8 comments
Labels: Enhancement, Error Reporting (Incorrect or improved errors from pandas), expressions (pd.eval, query), ExtensionArray (Extending pandas with custom dtypes or arrays)

Comments

@techvslife

techvslife commented Feb 19, 2019

In the latest pandas version (0.24.1), extension dtypes do not work with numexpr; for example, the “query” method on DataFrames fails when the engine parameter is left at its default (numexpr, when it is installed):

https://stackoverflow.com/questions/54759936/extension-dtypes-in-pandas-appear-to-have-a-bug-with-query

Code to reproduce:
import pandas as pd

df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})
df_test.query("col_test != 6")
Last lines of the long error message are:

File "...\site_packages\numexpr\necompiler.py", line 822, in evaluate
    zip(names, arguments)]
File "...\site_packages\numexpr\necompiler.py", line 821, in
    signature = [(name, getType(arg)) for (name, arg) in
File "...\site_packages\numexpr\necompiler.py", line 703, in getType
    raise ValueError("unknown type %s" % a.dtype.name)
ValueError: unknown type object
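Until support is added, the engine can be chosen per call. A minimal sketch of the workaround used later in this thread, assuming a pandas version with nullable integer dtypes (0.24 or later): passing engine="python" makes query evaluate the expression with pandas' own operators instead of handing the Int32 column to numexpr.

```python
import pandas as pd

# Same data as the reproduction above, stored in the nullable Int32 extension dtype
df_test = pd.DataFrame(data=[4, 5, 6], columns=["col_test"])
df_test = df_test.astype(dtype={"col_test": pd.Int32Dtype()})

# engine="python" bypasses numexpr, which rejects the extension dtype
result = df_test.query("col_test != 6", engine="python")
print(result)
```

The filtered frame keeps the Int32 dtype, so nothing downstream needs to change apart from the extra keyword.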

@jreback
Contributor

jreback commented Feb 19, 2019

You would have to explicitly add support; this is not a bug but rather an enhancement request.

@techvslife
Author

techvslife commented Feb 19, 2019 via email

@techvslife
Author

techvslife commented Feb 20, 2019

FYI, at numexpr it was mentioned that pandas may want to raise an error on unrecognized or unsupported dtypes in the query method rather than let them pass through. Alternatively, it could automatically fall back to the pandas engine so as not to cause errors at all. I assume there would be a downside to doing that check, since it adds overhead, so I'm not necessarily recommending it, only mentioning it for consideration. Thanks.
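Both options mentioned above (a clear error, or an automatic fallback) can be sketched in a small wrapper. query_with_fallback and its on_extension parameter are hypothetical names, not pandas API, and the sketch assumes that any column whose dtype is a pd.api.extensions.ExtensionDtype should avoid numexpr, using engine="python" for the fallback. The check only inspects column metadata, not the data, but it is deliberately coarse: it looks at every column, not just those named in the expression.

```python
import pandas as pd


def query_with_fallback(df: pd.DataFrame, expr: str, on_extension: str = "fallback") -> pd.DataFrame:
    """Hypothetical wrapper around DataFrame.query (not part of pandas).

    on_extension="raise": raise a clear TypeError if any column has an extension dtype.
    on_extension="fallback": evaluate with engine="python" instead of numexpr.
    """
    if on_extension not in ("raise", "fallback"):
        raise ValueError(f"on_extension must be 'raise' or 'fallback', got {on_extension!r}")

    # Metadata-only check: which columns use an extension dtype (Int32, string, category, ...)?
    ext_cols = [
        col for col, dtype in df.dtypes.items()
        if isinstance(dtype, pd.api.extensions.ExtensionDtype)
    ]
    if not ext_cols:
        # No extension dtypes: keep pandas' default engine selection
        return df.query(expr)

    if on_extension == "raise":
        raise TypeError(
            f"columns {ext_cols} have extension dtypes, which numexpr does not support; "
            "use engine='python'"
        )
    # Fallback: the python engine operates on the extension arrays directly
    return df.query(expr, engine="python")
```

Frames without extension dtypes still get the default engine, so only frames that would otherwise fail pay for the slower path.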

gfyoung added the Enhancement, Dtype Conversions (Unexpected or buggy dtype conversions), Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Feb 21, 2019
@gfyoung
Member

gfyoung commented Feb 21, 2019

> but I assume there would be a downside to doing that check in adding overhead. —so not necessarily recommending but only mentioning it for consideration

Better error messages wouldn't hurt, I think. Feel free to try implementing that.

gfyoung added the Error Reporting (Incorrect or improved errors from pandas) label Feb 21, 2019
@teto

teto commented Feb 21, 2019

Very annoying. Is there a way to set up the eval engine globally, i.e. to make 'python' the default instead of numexpr? Right now I have to build environments without numexpr to force pandas to fall back on 'python' :s
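The thread does not identify a global setting for the query/eval engine; engine is passed per call. One hedged, project-level alternative to building environments without numexpr is to bind the keyword once with functools.partial. query_py is a hypothetical helper name, not part of pandas:

```python
import functools

import pandas as pd

# Hypothetical project-wide helper: DataFrame.query with the engine fixed to "python",
# so callers never reach numexpr, whether or not it is installed.
query_py = functools.partial(pd.DataFrame.query, engine="python")

df = pd.DataFrame({"col_test": pd.array([4, 5, 6], dtype="Int32")})
filtered = query_py(df, "col_test != 6")
print(filtered)
```

Call sites then use query_py(df, expr) instead of df.query(expr). The trade-off is that every frame goes through the python engine, including frames numexpr could have handled, which is typically slower on large data.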

@teto

teto commented Feb 21, 2019

For a moment I thought I had found the holy grail with pd.set_option('compute.use_numexpr', False), but it seems this doesn't disable numexpr in query/eval, so do I still have to change my code to add engine="python"?
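The compute.use_numexpr option exists and governs numexpr acceleration of regular computations, but as reported here it does not select the engine for query/eval, so the explicit keyword is still needed. A minimal sketch combining both, scoped with pd.option_context so the previous option value is restored when the block exits:

```python
import pandas as pd

df = pd.DataFrame({"col_test": pd.array([4, 5, 6], dtype="Int32")})

# option_context restores the previous value of the option on exit
with pd.option_context("compute.use_numexpr", False):
    numexpr_enabled = pd.get_option("compute.use_numexpr")  # False inside the block
    # As reported in this thread, the option alone does not switch query/eval
    # to the python engine, so the engine is still passed explicitly.
    result = df.query("col_test != 6", engine="python")

print(numexpr_enabled, result["col_test"].tolist())
```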

teto added a commit to teto/pymptcpanalyzer that referenced this issue Mar 11, 2019
because of a bug in pandas/numexpr. Once fixed:
pandas-dev/pandas#25369
revert the change
teto added a commit to teto/pymptcpanalyzer that referenced this issue Apr 16, 2019
because of a bug in pandas/numexpr. Once fixed:
pandas-dev/pandas#25369
revert the change
jbrockmendel added the expressions (pd.eval, query) label Oct 22, 2019
@teto

teto commented Mar 4, 2020

I checked whether this was fixed in pandas 1.0, but it isn't. What still puzzles me (and I wonder whether it warrants a new issue) is that I disable numexpr via pd.set_option('compute.use_numexpr', False), and my check log.debug("use numexpr? %d" % pd.get_option('compute.use_numexpr')) confirms it is false, yet pandas insists on using the numexpr engine. Am I using the API wrong, or is it a bug?

mroeschke added the Error Reporting (Incorrect or improved errors from pandas) label and removed the Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Dtype Conversions (Unexpected or buggy dtype conversions), and Error Reporting (Incorrect or improved errors from pandas) labels Apr 26, 2020
@mroeschke
Member

xref #29618

Projects
None yet
Development

No branches or pull requests

6 participants