
PERF: no need to check for DataFrame in pandas.core.computation.expressions #40445


Merged


jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Mar 15, 2021

In practice, this case is only covered by specific tests using DataFrame in test_expressions.py, not by actual usage through ops, so I just updated the failing tests to not pass any DataFrame as input to those checks.
That said, our test suite might not be the best suited to check this, since many tests are small and thus don't use the numexpr path (fixing that in #40463).

@jorisvandenbossche
Member Author

cc @jreback you might be more familiar with the history behind this. I suppose at some point, DataFrames were passed directly to evaluate in expressions.py. But now we are only passing arrays from array_ops.py, so I just updated the tests.
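
As a rough illustration of the two points raised so far (only arrays reach the check, and small inputs skip the numexpr path), here is a minimal sketch of such a gate. It is not the code in expressions.py: `FakeArray`, `can_use_fast_path`, `MIN_ELEMENTS`, and `ALLOWED_DTYPES` are hypothetical names with made-up values, assumed only for this example.

```python
from dataclasses import dataclass

# Hypothetical threshold and dtype whitelist, chosen for illustration only;
# they are not the values pandas uses.
MIN_ELEMENTS = 10_000
ALLOWED_DTYPES = {"int64", "int32", "float64", "float32", "bool"}


@dataclass
class FakeArray:
    """Stand-in for an ndarray: one element count, one dtype name."""

    size: int
    dtype_name: str


def can_use_fast_path(left: FakeArray, right: FakeArray) -> bool:
    """Return True if both operands qualify for the accelerated path."""
    # Small inputs are not worth the overhead, so they take the plain path.
    if left.size <= MIN_ELEMENTS:
        return False
    # Each array carries a single dtype, so no per-column inspection
    # (as a DataFrame would require) is needed.
    dtypes = {left.dtype_name, right.dtype_name}
    return dtypes <= ALLOWED_DTYPES


small = FakeArray(size=100, dtype_name="float64")
large = FakeArray(size=100_000, dtype_name="float64")
large_obj = FakeArray(size=100_000, dtype_name="object")

print(can_use_fast_path(small, small))      # False: below the size threshold
print(can_use_fast_path(large, large))      # True: large and allowed dtype
print(can_use_fast_path(large, large_obj))  # False: dtype not allowed
```

Under these assumptions, a test that only builds small inputs never exercises the accelerated branch, which is the concern about test coverage mentioned in the description.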

@jreback jreback added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Mar 17, 2021
@jreback jreback added this to the 1.3 milestone Mar 17, 2021
_array2 = _frame2["A"].values.copy()

_array_mixed = _mixed["D"].values.copy()
_array_mixed2 = _mixed2["D"].values.copy()
Contributor

We really need to use fixtures for this module. Can you create an issue?

Member Author

Yeah, opened #40497 for this

@@ -170,16 +173,9 @@ def testit():

op = getattr(operator, opname)

result = expr._can_use_numexpr(op, op_str, left, left, "evaluate")
Contributor

So this will just raise now? If so, can you add a test?

Member Author

Actually, it seems we just return False. I could keep that as a test, but note that this is a file-internal helper function (leading underscore) that is never called with a DataFrame within pandas. So I would basically be testing something that is not expected to have any specific behaviour.

@jorisvandenbossche
Member Author

@jreback OK here?

@jreback jreback added the Performance Memory or execution speed performance label Mar 23, 2021
@jreback jreback merged commit f115360 into pandas-dev:master Mar 23, 2021
@jreback
Contributor

jreback commented Mar 23, 2021

thanks @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche deleted the expressions-dtypes-check branch March 23, 2021 13:59
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021