Skip to content

CI: fix with new numpy nightly #50129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 10, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion pandas/tests/dtypes/test_missing.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@

from pandas._libs import missing as libmissing
from pandas._libs.tslibs import iNaT
from pandas.compat import is_numpy_dev

from pandas.core.dtypes.common import (
is_float,
Expand Down Expand Up @@ -460,7 +461,7 @@ def test_array_equivalent_series(val):
cm = (
# stacklevel is chosen to make sense when called from .equals
tm.assert_produces_warning(FutureWarning, match=msg, check_stacklevel=False)
if isinstance(val, str)
if isinstance(val, str) and not is_numpy_dev
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we had:

In [1]: np.array([1, 3]) == 'abc'
<ipython-input-1-4b7b4f567841>:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  np.array([1, 3]) == 'abc'
Out[1]: False

With the new nightly:

In [2]: np.array([1, 3]) == 'abc'
Out[2]: array([False, False])

else nullcontext()
)
with cm:
Expand Down
11 changes: 9 additions & 2 deletions pandas/tests/frame/methods/test_compare.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
import numpy as np
import pytest

from pandas.compat import is_numpy_dev

import pandas as pd
import pandas._testing as tm

Expand Down Expand Up @@ -257,8 +259,13 @@ def test_compare_ea_and_np_dtype(val1, val2):
("b", "other"): np.nan,
}
)
result = df1.compare(df2, keep_shape=True)
tm.assert_frame_equal(result, expected)
if val1 is pd.NA and is_numpy_dev:
# can't compare with numpy array if it contains pd.NA
with pytest.raises(TypeError, match="boolean value of NA is ambiguous"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm why wouldn't we allow this? The returned array should just be object dtype with pd.NA no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparing with an object ndarray which contains pd.NA now just raises #50124 (comment)

Brock had commented here about this here, saying it'd be his inclination to say that raising is expected #50129 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could putting NA in numpy arrays be a sign of a bug/design flaw in the DataFrame.compare method? Maybe there should be a special path when dealing with masked arrays to take into account that bool(NA) will fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, the column is being designed to be of dtype object with pd.NA in it:

arr = [4.0, val1]  # val1 is pd.NA from the fixture
df1 = pd.DataFrame({"a": arr, "b": [1.0, 2]})

So I think DataFrame.compare is doing the right thing - df1['a'] is an ordinary object array, not a masked one

Arguably it's the test that's doing something unsupported and should be rewritten so that df['a'] is a nullable type (which can handle pd.NA just fine)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our extension arrays are 1D, as soon as you have to operate on more than one column, we coerce to object dtype. Not saying this is the case here, but this is inevitable in some places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this: the DataFrame in the test has dtype object, eg no masked array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, though fortunately the error's not come up in other places (yet! 😨 ). For EAs, if an error comes up as a result of this, then perhaps it's up to pandas to find a workaround

But for object arrays (non-EA) which contain pd.NA, do you agree that this is unsupported?

Copy link
Member

@phofl phofl Dec 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly responding to the design flaw bit. I totally agree that users shouldn’t do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our extension arrays are 1D, as soon as you have to operate on more than one column, we coerce to object dtype. Not saying this is the case here, but this is inevitable in some places.

This is a problem when processing masked arrays in Dataframe methods/functions. Maybe there could be made an internal dataframe-wide na mask method, so the we can deal with NA's properly on a dataframe level, for example in the compare method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for object arrays (non-EA) which contain pd.NA, do you agree that this is unsupported?

Yes, it's inherently usupported, as bool(NA) fails (and should fail). Seeing pd.NA in numpy array may be code smells that we should avoid and work with the masks instead.

result = df1.compare(df2, keep_shape=True)
else:
result = df1.compare(df2, keep_shape=True)
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
Expand Down
2 changes: 2 additions & 0 deletions pandas/tests/indexes/object/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import pytest

from pandas._libs.missing import is_matching_na
from pandas.compat import is_numpy_dev

import pandas as pd
from pandas import Index
Expand Down Expand Up @@ -92,6 +93,7 @@ def test_get_indexer_non_unique_nas(self, nulls_fixture):
tm.assert_numpy_array_equal(indexer, expected_indexer)
tm.assert_numpy_array_equal(missing, expected_missing)

@pytest.mark.skipif(is_numpy_dev, reason="GH50124")
@pytest.mark.filterwarnings("ignore:elementwise comp:DeprecationWarning")
def test_get_indexer_non_unique_np_nats(self, np_nat_fixture, np_nat_fixture2):
expected_missing = np.array([], dtype=np.intp)
Expand Down
3 changes: 2 additions & 1 deletion pandas/tests/series/methods/test_equals.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
import pytest

from pandas._libs.missing import is_matching_na
from pandas.compat import is_numpy_dev

from pandas.core.dtypes.common import is_float

Expand Down Expand Up @@ -50,7 +51,7 @@ def test_equals_list_array(val):

cm = (
tm.assert_produces_warning(FutureWarning, check_stacklevel=False)
if isinstance(val, str)
if isinstance(val, str) and not is_numpy_dev
Comment on lines -53 to +54
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else nullcontext()
)
with cm:
Expand Down