Skip to content

BUG: query uses numexpr even when 'compute.use_numexpr' is False #33006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

quangngd
Copy link
Contributor

@quangngd quangngd commented Mar 25, 2020

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how this address the actual issue (which is that numexpr knows nothing about extension types; they shouldn't be passed to it at all).

but the title suggests you think there is an actual but in use_numexpr? can you explain.

@quangngd
Copy link
Contributor Author

If i understand correctly, the issue #32556 is about df.query does not care about the pd.set_option option, not about the extension dtype (which is still open at #25369)
@teto used pd.set_option('compute.use_numexpr', False) as a workaround for the original dtype issue but turned out it is also a bug on its own.

@teto
Copy link

teto commented Mar 26, 2020

@quangngd that's an accurate description of the issue. Thanks for taking care of this !

@jreback
Copy link
Contributor

jreback commented Mar 29, 2020

@quangngd you need to have a test which fails on master but works here. can you construct one.

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Mar 29, 2020
@teto
Copy link

teto commented Mar 29, 2020

there is one in my issue (might not be usable out of the box though).

@quangngd
Copy link
Contributor Author

quangngd commented Mar 30, 2020

@teto I don't think it's a good idea using a bug as a test as it would be fixed in the future.

@quangngd
Copy link
Contributor Author

So to my understanding, the test must be able to differentiate the used engine during the df.query/ df.eval call (yah, just found out the bug is in df.eval too, and is too fixed by this pr). The thing is we could't use any "non-parse-able" behaviour since it could be a bug and could be fixed in the future.
The best I could come up with is the following based on numexpr' note.

import pandas as pd
df = pd.DataFrame(data=[4+1j,5,6], columns=["col_test"])
s = "abs(col_test)"
display(df.eval(s, engine='python').dtype)
display(df.eval(s, engine='numexpr').dtype)

output:

dtype('float64')
dtype('complex128')

But so far I can't think of how to make a test for df.query

@jreback
Copy link
Contributor

jreback commented Jun 14, 2020

@quangngd I am not sure of the status here. please merge master and address comments.

@quangngd
Copy link
Contributor Author

@jreback the fix is already there but I cannot think of a sensible test

@rhshadrach
Copy link
Member

@quangngd: What about this?

from pandas.core.computation.eval import _check_engine
with pd.option_context('compute.use_numexpr', False):
    result = _check_engine(None)
    assert result == 'python'

@quangngd
Copy link
Contributor Author

quangngd commented Jul 7, 2020

@rhshadrach thankyou, the test indeed fails on master and pass on my branch. Where should be the proper place to place the test?

@rhshadrach
Copy link
Member

Where should be the proper place to place the test?

tests.computation.test_eval

@jbrockmendel jbrockmendel added the expressions pd.eval, query label Sep 14, 2020

if engine is None:
engine = "numexpr" if _NUMEXPR_INSTALLED else "python"
engine = "numexpr" if _USE_NUMEXPR else "python"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think in this case we need to check expr.USE_NUMEXPR, not USE_NUMEXPR since the former may change at runtime

@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

closing, this needs a reproducible tests that fails before any patch. (it can certainly be part of the patch, but demonstrate on the issue)

@jreback jreback closed this Nov 26, 2020
@teto
Copy link

teto commented Nov 27, 2020

wait what ? it's not easy to reproduce, you would need to disable one package or monkeypatch one package I suppose. The absence of a test should not prevent fixing an issue/document it/remove the feature.

@jreback
Copy link
Contributor

jreback commented Nov 27, 2020

w/o a test how do you know it was fixed?

@rhshadrach
Copy link
Member

Is this test not sufficient?

#33006 (comment)

@jreback
Copy link
Contributor

jreback commented Nov 27, 2020

sure if it fails on master

@saehuihwang saehuihwang removed their assignment Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions expressions pd.eval, query
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataframe.query uses numexpr even when 'compute.use_numexpr' is False
6 participants