Skip to content

TST: GH30999 Add match=msg to all pytest.raises in tests/indexes #38697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
19 changes: 7 additions & 12 deletions pandas/tests/indexes/interval/test_astype.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
import re

import numpy as np
import pytest

Expand Down Expand Up @@ -153,22 +155,15 @@ def test_subtype_integer(self, subtype):
with pytest.raises(ValueError, match=msg):
index.insert(0, np.nan).astype(dtype)

@pytest.mark.xfail(reason="GH#15832")
def test_subtype_integer_errors(self):
# float64 -> uint64 fails with negative values
index = interval_range(-10.0, 10.0)
dtype = IntervalDtype("uint64")
with pytest.raises(ValueError):
index.astype(dtype)

# float64 -> integer-like fails with non-integer valued floats
index = interval_range(0.0, 10.0, freq=0.25)
dtype = IntervalDtype("int64")
with pytest.raises(ValueError):
index.astype(dtype)

dtype = IntervalDtype("uint64")
with pytest.raises(ValueError):
msg = re.escape(
"Cannot convert interval[float64] to interval[uint64]; subtypes are "
"incompatible"
)
with pytest.raises(TypeError, match=msg):
index.astype(dtype)
Copy link
Member Author

@moink moink Dec 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a test that was marked as failing, with the reason being closed issue #15832 . I put a bit of discussion about this over in the issue.

I removed the xfail and I also removed the last two pytest.raises. The result of the coercion is: IntervalIndex([(0, 0], (0, 0], (0, 0], (0, 1], (1, 1] ... (8, 9], (9, 9], (9, 9], (9, 9], (9, 10]], closed='right', dtype='interval[int64]') which doesn't look particularly useful but isn't exactly wrong either. If I understand the discussion at #15832 correctly this is intended behaviour for it not to raise the error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the test for

index = interval_range(0.0, 10.0, freq=0.25)
dtype = IntervalDtype("int64")

is lost here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I removed it deliberately, as well as the equivalent one for uint64. I tried to explain that in my comment above but I guess I wasn't clear.

This test was marked as failing before. Now it is not. The test, as written before, was saying that

index = interval_range(0.0, 10.0, freq=0.25)
dtype = IntervalDtype("int64")

should raise a ValueError. It does not raise any error. The question here is: Should casting an interval range with non-integer float boundaries raise an error? This test said it should, but it was marked as failing, because right now it does not raise an error. If I understood the discussion at #15832 correctly, and it is possible that I did not, than I think that it should not raise an error, so this test is not required.

The question is, as a user of pandas, if you had an interval range such as [(0.0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1.0], (1.0, 1.25], (1.25, 1.5], (1.5, 1.75], (1.75, 2.0], (2.0, 2.25], (2.25, 2.5]], and you cast it to int, what would you want or expect? The old (failing) version of the test said you expect a ValueError. I am saying you expect [(0, 0], (0, 0], (0, 0], (0, 1], (1, 1], (1, 1], (1, 1], (1, 2], (2, 2], (2, 2]] which, as I said, looks not that useful as an IntervalIndex, but is not wrong either.

Would love to hear your thoughts.

I could also add another test for the expected behaviour of casting from float to int intervals if that's what we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining, sorry should've read your comment on that linked issue. Your logic makes sense to me, but I'd defer to someone with more experience on how we handle IntervalIndex.

If being removed from this test, we definitely want to make sure this specific casting behavior is being tested somewhere. To keep this pr simple and easier to merge, I'd suggest reverting modification of this test, then making a followup pr to clean up this test and add tests for any untested casting behavior.

Copy link
Member Author

@moink moink Dec 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but I am not quite sure how to handle it. I want to leave the code in a good state even if I plan to do another PR soon, because it's possible that one will take a while or not be accepted ever or whatever. The way I see it, these are our options:

  1. Revert it to exactly what it was before I modified it. That would make the file fail the new linting rule we want to add as part of [Good first issue] TST: Disallow bare pytest.raises #30999 , so don't add the linting rule and leave that issue open. I'm not particularly happy with this option because I have gotten the number of bare pytest.raises down from 250ish to 64 in just a few days and I was hoping to finish in a few more days. And the longer we leave the rule out of the pre-commit script the more likely people are to add new bare pytest.raises and the issue will stay open forever.
  2. Revert it, and simultaneously add to the linting script a check that the bare pytest.raises is not in a test decorated with pytest xfail. Generally, addressing [Good first issue] TST: Disallow bare pytest.raises #30999 has been a pretty mechanical process, except when it comes to tests marked as failures. Then it takes a lot of digging through the code and historical issues to understand what would be a the correct failure message to add. It's the one thing that maybe makes [Good first issue] TST: Disallow bare pytest.raises #30999 not as good as a "good first issue" as it appears. I expect to run into more similar cases as I try to finish all the modules, because there are 26 files that only have 1 or 2 bare pytest raises and I suspect it's because they are the hard ones that people didn't know what message to put in. Adding the check to the script seems like kinda a cool project to figure out the ast module better than I currently do, but probably just that and not really worth our time. Unless we really think that we want the linter to do that.
  3. Come up with, out of whole cloth, the error message that I think should be included in such a ValueError, if in the future pandas were to raise one. This is the strategy I took in previous pull requests. (see TST: GH30999 Add match=msg to all "with pytest.raises" in pandas/tests/io/pytables/test_store.py #38609 for example, where I added a new completely invented error message to test_multiple_open_close). This is also what I was trying to do that led me to read the code and the past issues and decide that I didn't think it deserved an error message at all and therefore that the test didn't make sense and I would remove it.
  4. Put in an obviously completely bogus error message, so that future developers know that it's not a real one, and whoever in the future decides to address that test either by making it pass or by removing it can write a real message in the future.

Anyway, after thinking it through and writing it out, I think I will try for strategy 3, and you can decide if you like whatever message I come up with, but let me know if there's a different strategy that you prefer.

Apologies for the wall of text; I am new to contributing to pandas and obviously spent a lot of time thinking about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok done

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #38718

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New PR #38719

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's been done in past PR's, probably fine here too. Appreciate the thoughtfulness on your approach here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah generally addressing on a new PR is the correct way, will look at that soon.


@pytest.mark.parametrize("subtype", ["datetime64[ns]", "timedelta64[ns]"])
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/indexes/multi/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -526,7 +526,7 @@ def test_get_loc_duplicates(self):
xp = 0
assert rs == xp

with pytest.raises(KeyError):
with pytest.raises(KeyError, match="2"):
index.get_loc(2)

def test_get_loc_level(self):
Expand Down
6 changes: 2 additions & 4 deletions pandas/tests/indexes/multi/test_setops.py
Original file line number Diff line number Diff line change
Expand Up @@ -221,14 +221,12 @@ def test_difference_sort_incomparable():
tm.assert_index_equal(result, idx)


@pytest.mark.xfail(reason="Not implemented.")
def test_difference_sort_incomparable_true():
# TODO decide on True behaviour
# # sort=True, raises
idx = pd.MultiIndex.from_product([[1, pd.Timestamp("2000"), 2], ["a", "b"]])
other = pd.MultiIndex.from_product([[3, pd.Timestamp("2000"), 4], ["c", "d"]])

with pytest.raises(TypeError):
msg = "The 'sort' keyword only takes the values of None or False; True was passed."
with pytest.raises(ValueError, match=msg):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was marked as failing. But when I switched it to checking that a ValueError was raised, it passed. I think this works as intended now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine, a lot of things have been cleaned up recently I guess this was missed. cc @jbrockmendel

idx.difference(other, sort=True)


Expand Down
11 changes: 9 additions & 2 deletions pandas/tests/indexes/period/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -487,8 +487,15 @@ def test_get_indexer_mismatched_dtype_with_method(self, non_comparable_idx, meth
other2 = other.astype(dtype)
if dtype == "object" and isinstance(other, PeriodIndex):
continue
# For object dtype we are liable to get a different exception message
with pytest.raises(TypeError):
# Two different error message patterns depending on dtypes
msg = "|".join(
re.escape(msg)
for msg in (
f"Cannot compare dtypes {pi.dtype} and {other.dtype}",
" not supported between instances of ",
)
)
with pytest.raises(TypeError, match=msg):
pi.get_indexer(other2, method=method)

def test_get_indexer_non_unique(self):
Expand Down
15 changes: 12 additions & 3 deletions pandas/tests/indexes/test_common.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,11 @@ def test_droplevel(self, index):
if isinstance(index.name, tuple) and level is index.name:
# GH 21121 : droplevel with tuple name
continue
with pytest.raises(ValueError):
msg = (
"Cannot remove 1 levels from an index with 1 levels: at least one "
"level must be left."
)
with pytest.raises(ValueError, match=msg):
index.droplevel(level)

for level in "wrong", ["wrong"]:
Expand Down Expand Up @@ -77,7 +81,11 @@ def test_constructor_unwraps_index(self, index):
# FutureWarning from non-tuple sequence of nd indexing
@pytest.mark.filterwarnings("ignore::FutureWarning")
def test_getitem_error(self, index, itm):
with pytest.raises(IndexError):
msg = r"index 101 is out of bounds for axis 0 with size [\d]+|" + re.escape(
"only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) "
"and integer or boolean arrays are valid indices"
)
with pytest.raises(IndexError, match=msg):
index[itm]

def test_to_flat_index(self, index):
Expand Down Expand Up @@ -249,7 +257,8 @@ def test_searchsorted_monotonic(self, index):
assert expected_right == ssm_right
else:
# non-monotonic should raise.
with pytest.raises(ValueError):
msg = "index must be monotonic increasing or decreasing"
with pytest.raises(ValueError, match=msg):
index._searchsorted_monotonic(value, side="left")

def test_pickle(self, index):
Expand Down
18 changes: 10 additions & 8 deletions pandas/tests/indexes/test_numpy_compat.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,8 +49,7 @@ def test_numpy_ufuncs_basic(index, func):
# https://numpy.org/doc/stable/reference/ufuncs.html

if isinstance(index, DatetimeIndexOpsMixin):
# raise TypeError or ValueError (PeriodIndex)
with pytest.raises(Exception):
with tm.external_error_raised((TypeError, AttributeError)):
with np.errstate(all="ignore"):
func(index)
elif isinstance(index, (Float64Index, Int64Index, UInt64Index)):
Expand All @@ -66,7 +65,7 @@ def test_numpy_ufuncs_basic(index, func):
if len(index) == 0:
pass
else:
with pytest.raises(Exception):
with tm.external_error_raised((TypeError, AttributeError)):
with np.errstate(all="ignore"):
func(index)

Expand All @@ -77,6 +76,11 @@ def test_numpy_ufuncs_basic(index, func):
def test_numpy_ufuncs_other(index, func, request):
# test ufuncs of numpy, see:
# https://numpy.org/doc/stable/reference/ufuncs.html
msg = (
f"ufunc '{func.__name__}' not supported for the input types, and the "
"inputs could not be safely coerced to any supported types according "
"to the casting rule ''safe''"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one's also an external error from numpy. Even though this works, probably better to be consistent and use tm.external_error_raised.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if isinstance(index, (DatetimeIndex, TimedeltaIndex)):
if isinstance(index, DatetimeIndex) and index.tz is not None:
Expand All @@ -96,13 +100,11 @@ def test_numpy_ufuncs_other(index, func, request):
result = func(index)
assert isinstance(result, np.ndarray)
else:
# raise TypeError or ValueError (PeriodIndex)
with pytest.raises(Exception):
with pytest.raises(TypeError, match=msg):
func(index)

elif isinstance(index, PeriodIndex):
# raise TypeError or ValueError (PeriodIndex)
with pytest.raises(Exception):
with pytest.raises(TypeError, match=msg):
func(index)

elif isinstance(index, (Float64Index, Int64Index, UInt64Index)):
Expand All @@ -114,5 +116,5 @@ def test_numpy_ufuncs_other(index, func, request):
if len(index) == 0:
pass
else:
with pytest.raises(Exception):
with pytest.raises(TypeError, match=msg):
func(index)