
ENH: remove restrictions to numexpr to allow where etc. #34834

Open
jonas-eschle opened this issue Jun 16, 2020 · 12 comments
Labels
Enhancement expressions pd.eval, query Needs Discussion Requires discussion from core team before further action

Comments

@jonas-eschle

jonas-eschle commented Jun 16, 2020

Is your feature request related to a problem?

The evaluation of a query is currently limited to the functions in the list _mathops, while numexpr supports more, most notably where (which would also offer a simple solution to other issues).

I do not see any reason for this restriction. In fact, simply adding where to the list works (at least for my use case). Why is this restriction in place? Why can't we enlarge the list, or pass the expression directly through to numexpr?

Describe the solution you'd like

Allow the full operator set that numexpr supports in pd.eval.

API breaking implications

None.

Alternatives

Using .where is an option if you can access the DataFrame directly (although it is suboptimal). However, if your selection of data is based on passing a selection string around instead of the DataFrame (there are several reasons for this), calling .where is not feasible.

The following doesn't work:

import pandas as pd

data = {'a': [1, 2, 3]}

df = pd.DataFrame(data)
df.eval('where(a>2, 42, 0)')

whereas it works in numexpr:

import numexpr
numexpr.evaluate('where(a>2, 42, 0)', local_dict=data)

We expect this to return [0, 0, 42].
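
For reference, the element-wise behavior being requested matches numpy.where. The following is only a minimal sketch, assuming NumPy is installed and treating numexpr as optional; the variable names a, expected, and result are illustrative and not part of the original report.

```python
import numpy as np

try:
    import numexpr  # optional; the numexpr comparison is skipped if it is missing
except ImportError:
    numexpr = None

a = np.array([1, 2, 3])

# NumPy reference: pick 42 where the condition holds, 0 elsewhere.
expected = np.where(a > 2, 42, 0)

result = None
if numexpr is not None:
    # The same expression string that df.eval currently rejects.
    result = numexpr.evaluate("where(a > 2, 42, 0)", local_dict={"a": a})

print(expected.tolist())
```

If numexpr is available, result should hold the same values as expected, which is the output the request asks df.eval to produce.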

@jonas-eschle jonas-eschle added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 16, 2020
@TomAugspurger
Contributor

Can you add a minimal example? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@TomAugspurger TomAugspurger added the Needs Info Clarification about behavior needed to assess issue label Jun 17, 2020
@jonas-eschle
Author

Well, yeah. I've updated it. But I don't think it makes much sense to provide an example for eval/query? Or what exactly do you think is unclear?

Again, in short: eval/query use numexpr to forward a string for evaluation but (in my view) unnecessarily limit the allowed expressions to a subset of what numexpr supports.

Let me know if things are confusing and thanks a lot for taking a look at it!

@TomAugspurger
Contributor

Thanks. And can you add the expected output?

Do you know if we have other expressions that would be supported by only one engine (numexpr in this case)?

@TomAugspurger TomAugspurger added expressions pd.eval, query Needs Discussion Requires discussion from core team before further action and removed Needs Info Clarification about behavior needed to assess issue Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 17, 2020
@Liam3851
Contributor

Liam3851 commented Mar 20, 2021

I was about to report this issue but realized [edit: I would have created] a dupe. numexpr supports the following functions missing from the definitions in pandas.core.computation.ops:

  • where
  • tan (oddly, we support sin, cos, tanh, arctan, and arctan2, but not tan)
  • conj, real, imag, complex (though I'm not sure the extent to which pandas supports complex numbers in general so this might be fine)
  • contains (for strings)

The expectation is that

import pandas as pd

data = {'a': [1, 2, 3]}

df = pd.DataFrame(data)
df.eval('where(a>2, 42, 0)')

should return the same as

pd.DataFrame({'a':[0, 0, 42]})

@Liam3851
Contributor

My apologies @mayou36, can you reopen? I meant that I would have created a dupe. This is not a dupe; I think it's a legit issue that pandas doesn't support where or tan when numexpr does.

@jonas-eschle jonas-eschle reopened this Mar 21, 2021
@jonas-eschle
Author

Of course, sorry, that was closed by mistake and it is not yet resolved.

@TomAugspurger, do you have an idea why this is not here?

@achimgaedke

achimgaedke commented Jun 20, 2022

I have "solved" my problem by using

import pandas
pandas.core.computation.ops.MATHOPS = (*pandas.core.computation.ops.MATHOPS, "where")

Works out of the box with ternary operators, 🎉

df = pandas.DataFrame({"a": [2.0, 4.0, 5.0]})
pandas.eval("where(df.a > 3.0, df.a, 1)", target=df)

results in

0    1.0
1    4.0
2    5.0
Name: done, dtype: float64

Not exactly proud of this solution, but it shows that this feature request can probably be resolved by adding the function names and writing some unit tests.
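
If the patch is only needed temporarily, the same idea can be scoped so that the module-level tuple is restored afterwards. This is a hedged sketch, not a supported pandas API: it assumes the installed pandas exposes pandas.core.computation.ops.MATHOPS as a tuple of function names (a private detail that may change), and the helper name extra_eval_functions is hypothetical.

```python
from contextlib import contextmanager

import pandas.core.computation.ops as ops  # private module; may change between releases


@contextmanager
def extra_eval_functions(*names):
    """Temporarily allow extra function names in pandas.eval (hypothetical helper)."""
    original = ops.MATHOPS
    # Append only the names that are not already allowed.
    ops.MATHOPS = (*original, *(n for n in names if n not in original))
    try:
        yield
    finally:
        # Always restore the original tuple, even if evaluation raised.
        ops.MATHOPS = original
```

Inside a `with extra_eval_functions("where"):` block, a call such as the pandas.eval one above should then be accepted, under the same caveats as the global patch.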

NB: arguments 2 and 3 have to be numbers - I'd love to have strings (the same type should be required only for arguments 2 and 3)
NB2: (df.a>3.0) * df.a + (df.a<=3) * 1 works like where(df.a > 3.0, df.a, 1)
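
The arithmetic emulation from NB2 can be checked against numpy.where without patching pandas. A minimal sketch, assuming pandas and NumPy are installed; engine="python" is chosen here only so the example does not depend on numexpr, and the names emulated and reference are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [2.0, 4.0, 5.0]})

# Boolean masks act as 0/1 multipliers, emulating where(a > 3.0, a, 1).
emulated = df.eval("(a > 3.0) * a + (a <= 3.0) * 1", engine="python")

# Reference result computed directly with NumPy.
reference = np.where(df["a"] > 3.0, df["a"], 1)

print(list(emulated))
```

Note that this emulation is not a full replacement for where: a NaN or infinite value in a branch that is not selected still propagates, because it is multiplied by 0 rather than skipped.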

@jonas-eschle
Author

@TomAugspurger any news on this? This seems to be a limitation for no apparent reason?

@TomAugspurger
Contributor

TomAugspurger commented Apr 6, 2024 via email

@jonas-eschle
Author

Okay, so what if I go ahead and remove the limitation in a PR? Maybe the tests, or some maintainer, will tell us why it is there, but without comments in the code or on this issue, we can only guess.

@Aloqeely
Member

Another issue was opened for this same topic: #55091

@jonas-eschle I suggest you add whichever functions you think are going to be useful and write unit tests that ensure every function you're adding works properly.
A PR was submitted recently to add tan (#58334), so it is probably best to wait until it gets reviewed/merged.

@domsmrz
Contributor

domsmrz commented Apr 20, 2024

FYI, loosely related: I've filed #58329, which reports that df.eval('where(...)') returns an np.array for engine="python" and a pd.Series for engine="numexpr" (if one adds "where" to the list of allowed functions). Probably not a blocker, but something to keep in mind (at least while writing tests).


6 participants