BUG/TST: run and fix all arithmetic tests with+without numexpr #40463

Merged
merged 14 commits into from
Apr 26, 2021

Conversation

jorisvandenbossche
Member

Closes #40462

This PR adds an auto-used fixture for pandas/tests/arithmetic that sets expressions._MIN_ELEMENTS to 0, to force taking the numexpr path even if our test data are small (small data would otherwise never exercise the numexpr code path).

Further, it makes some fixes to get the tests passing with this better coverage.
The fixes are a bit quick-and-dirty, but at least already show what is needed to get things working.
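
As a rough illustration of why small test data never reaches the numexpr path, here is a minimal sketch of a size-gated dispatcher. Only the name _MIN_ELEMENTS comes from this PR; the `expr` object is a hypothetical stand-in for pandas.core.computation.expressions, and `choose_path`, the threshold value, and the exact comparison are assumptions made for the example.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the expressions module; the threshold value here
# is arbitrary and not taken from pandas.
expr = SimpleNamespace(_MIN_ELEMENTS=1_000_000)


def choose_path(n_elements: int) -> str:
    """Return which code path a size-gated dispatcher would take (assumed logic)."""
    if n_elements > expr._MIN_ELEMENTS:
        return "numexpr"
    return "standard"


small_test_data = list(range(10))

# With a large threshold, small test data always takes the standard path.
print(choose_path(len(small_test_data)))

# Lowering the threshold to 0 sends any non-empty input down the numexpr path.
expr._MIN_ELEMENTS = 0
print(choose_path(len(small_test_data)))
```

With the threshold at 0, every test in the directory exercises the accelerated branch without needing large inputs.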

@jorisvandenbossche jorisvandenbossche added Bug Testing pandas testing functions or related to the test suite Numeric Operations Arithmetic, Comparison, and Logical operations labels Mar 16, 2021
@@ -199,7 +207,9 @@ def arithmetic_op(left: ArrayLike, right: Any, op):
rvalues = ensure_wrapped_if_datetimelike(right)
rvalues = _maybe_upcast_for_op(rvalues, lvalues.shape)

-if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta):
+if should_extension_dispatch(lvalues, rvalues) or isinstance(
+rvalues, (Timedelta, BaseOffset, Timestamp, NaTType)
Member Author

this additional check could maybe be moved into should_extension_dispatch (although it is not necessarily related to "is extension array", but rather to "don't take the numexpr path")

Member

yah, IIRC should_extension_dispatch is only used here, so might as well refactor/rename/move/whatever is most convenient.

There is a comment below about why Timedelta is included; can you update it for the others?

I think this should check rvalues is NaT rather than use an isinstance check.

Member Author

Changed to right is NaT and updated the comment.
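
A minimal sketch of the kind of check discussed in this thread, using only the standard library. The names here are stand-ins and not the pandas implementation: datetime.timedelta and datetime.datetime play the role of Timedelta and Timestamp, the NaTType class is a hypothetical singleton, BaseOffset is left out, and `needs_non_numexpr_path` is an invented helper name. It shows why an identity check is enough for a singleton such as NaT.

```python
from datetime import datetime, timedelta


class NaTType:
    """Hypothetical singleton type standing in for the NaT type."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object, so identity checks are reliable.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


NaT = NaTType()


def needs_non_numexpr_path(right) -> bool:
    """Assumed predicate: scalars that should skip the accelerated path."""
    # isinstance for the regular scalar types, identity for the singleton.
    return isinstance(right, (timedelta, datetime)) or right is NaT


print(needs_non_numexpr_path(timedelta(days=1)))
print(needs_non_numexpr_path(NaT))
print(needs_non_numexpr_path(3.0))
```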

@@ -157,8 +159,14 @@ def _na_arithmetic_op(left, right, op, is_cmp: bool = False):
"""
import pandas.core.computation.expressions as expressions

if isinstance(right, str):
# can never use numexpr
func = op
Member Author

Basically, with a string argument, numexpr will fail with a "wrong" error message. Alternatively, _can_use_numexpr in expressions.py could be updated to check for this and avoid using the numexpr path (currently that only checks objects with dtypes, not scalars).

Member

let's make an effort to keep numexpr-specific logic in _can_use_numexpr/expressions

Member Author

@jbrockmendel would you be OK with leaving the check here as is, short term? I have a next PR that moves this check inside a can_use_numexpr function inside expressions.py (#41122), so that will clean this up.
But I would like to merge this PR before #41122 since this one is adding a lot of test coverage for with/without numexpr.

Contributor

yeah ok for now, but let's for sure move later
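
For illustration, the alternative discussed in this thread (putting the string check inside the eligibility function rather than at the call site) might look roughly like the sketch below. Everything here is a simplified assumption: `fast_evaluate` is a toy stand-in for an accelerated backend that reports unsupported input with its own error, and `can_use_fast_path` and `evaluate` are hypothetical names, not the pandas functions.

```python
import operator


def fast_evaluate(op, left, right):
    """Toy stand-in for an accelerated backend that only handles numbers."""
    if not isinstance(right, (int, float)):
        # A backend-specific error that hides the real problem from the user.
        raise ValueError("backend cannot compile expression")
    return [op(x, right) for x in left]


def can_use_fast_path(right) -> bool:
    """Assumed eligibility check: a string scalar can never use the backend."""
    return not isinstance(right, str)


def evaluate(op, left, right):
    if can_use_fast_path(right):
        return fast_evaluate(op, left, right)
    # Fall back to the plain operator so the natural error surfaces.
    return [op(x, right) for x in left]


print(evaluate(operator.add, [1, 2, 3], 10))

try:
    evaluate(operator.add, [1, 2, 3], "a")
except TypeError as err:
    # The plain operator raises the usual TypeError for int + str.
    print(type(err).__name__)
```

Keeping the check in one eligibility function means callers do not need backend-specific special cases.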

_MIN_ELEMENTS = expr._MIN_ELEMENTS
expr._MIN_ELEMENTS = request.param
yield request.param
expr._MIN_ELEMENTS = _MIN_ELEMENTS
Member

can we get rid of some of the setup/teardown in test_expressions with this?

Member Author

Potentially something similar could be used there as well, yes. But this PR is focusing on the tests/arithmetic/ tests; there is #40497 as a general issue to modernize test_expressions.py.
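
The save/override/restore pattern in the fixture fragment earlier in this thread can be sketched without pytest as a context manager. This is an analogy under stated assumptions, not the code from the PR: `expr` is a hypothetical stand-in for the expressions module, `min_elements` is an invented name, and the parameter values are illustrative. A pytest yield fixture runs its teardown after the test regardless of outcome; the try/finally below mimics that for a plain context manager.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for the expressions module; the value is arbitrary.
expr = SimpleNamespace(_MIN_ELEMENTS=1_000_000)


@contextmanager
def min_elements(value):
    """Temporarily override the threshold and always restore it afterwards."""
    saved = expr._MIN_ELEMENTS
    expr._MIN_ELEMENTS = value
    try:
        yield value
    finally:
        # Restore even if the body raised, so later code sees the original.
        expr._MIN_ELEMENTS = saved


# Run the same check once per threshold, much like a parametrized fixture.
for param in (0, 1_000_000):
    with min_elements(param) as active:
        print(active, expr._MIN_ELEMENTS)

print(expr._MIN_ELEMENTS)
```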

@jreback
Contributor

jreback commented Mar 20, 2021

can you merge master

@jreback jreback added this to the 1.3 milestone Apr 2, 2021
Contributor

@jreback jreback left a comment

can you merge master again; @jbrockmendel comments addressed?

@jbrockmendel
Member

@jbrockmendel comments addressed?

Not yet

@jorisvandenbossche
Member Author

This is ready for another review now

@jorisvandenbossche
Member Author

jorisvandenbossche commented Apr 23, 2021

(although I see there are some failures about a wrong error message with the latest commit, aside from that it should already be reviewable) Edit: this should be fixed now, it was a wrong bracket alignment ;)

Contributor

@jreback jreback left a comment

can you rebase as well

@@ -246,7 +259,10 @@ def comparison_op(left: ArrayLike, right: Any, op) -> ArrayLike:
"Lengths must match to compare", lvalues.shape, rvalues.shape
)

-if should_extension_dispatch(lvalues, rvalues):
+if should_extension_dispatch(lvalues, rvalues) or (
Contributor

this is the same check as above (L212); can you put the common parts in a function?

Member Author

It's slightly different: here I need an additional "and not is_object_dtype(lvalues.dtype)", because in that case we need to take the comp_method_OBJECT_ARRAY code path below.

Contributor

sure, I see that; that's why I said common parts (meaning the datetime-likes + NaT)

Member Author

@jorisvandenbossche jorisvandenbossche Apr 26, 2021

Yes, but note the brackets: currently the checks for the scalars are first combined with is_object_dtype, before being combined with should_extension_dispatch. That means I cannot move those scalar checks inside should_extension_dispatch without changing the behaviour of the overall check.
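
To make the bracket argument concrete, here is a small truth-table sketch. The three booleans are abstractions of the conditions named in this thread (the should_extension_dispatch result, the scalar check, and is_object_dtype), and `current_check` and `folded_check` are hypothetical names. The second function models what would happen if the scalar check were simply folded into the dispatch helper.

```python
from itertools import product


def current_check(ext_dispatch: bool, scalar: bool, object_dtype: bool) -> bool:
    # Grouping described in the thread: the scalar check is combined with
    # "not object dtype" first, and only then OR-ed with the dispatch result.
    return ext_dispatch or (scalar and not object_dtype)


def folded_check(ext_dispatch: bool, scalar: bool, object_dtype: bool) -> bool:
    # Hypothetical refactor: the scalar check lives inside the dispatch helper,
    # so it applies regardless of the left-hand dtype.
    return ext_dispatch or scalar


# Enumerate all eight input combinations and keep those where the two differ.
differing = [
    combo
    for combo in product([False, True], repeat=3)
    if current_check(*combo) != folded_check(*combo)
]
print(differing)
```

The only differing case is a datetime-like scalar compared against object-dtype values with no extension dispatch, which is the case that needs the object-array comparison path.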

@@ -405,6 +406,11 @@ def test_ser_div_ser(self, dtype1, any_real_dtype):
name=None,
)
expected.iloc[0:3] = np.inf
if first.dtype == "int64" and second.dtype == "float32":
Contributor

is the reverse excluded as well?

Member Author

@jorisvandenbossche jorisvandenbossche Apr 26, 2021

The reverse (float32 + int64) is not tested, as the first.dtype is always int64/float64/uint64

(but yeah, the reverse order would also result in float32 instead of float64 when numexpr is used)

@jreback jreback merged commit 0158382 into pandas-dev:master Apr 26, 2021
@jreback
Contributor

jreback commented Apr 26, 2021

thanks @jorisvandenbossche

Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Testing pandas testing functions or related to the test suite
Development

Successfully merging this pull request may close these issues.

BUG: operations with Offset fail for large arrays when using numexpr
3 participants