TST: insert 'match' to bare pytest raises in pandas/tests/internals/ #30998

Merged
merged 17 commits into from
Jan 20, 2020
50 changes: 41 additions & 9 deletions pandas/tests/internals/test_internals.py
@@ -297,7 +297,10 @@ def test_delete(self):
         assert (newb.values[1] == 1).all()

         newb = self.fblock.copy()
-        with pytest.raises(Exception):
+
+        msg = "index 3 is out of bounds for axis 0 with size 3"
+
+        with pytest.raises(Exception, match=msg):
             newb.delete(3)


@@ -321,7 +324,12 @@ def test_can_hold_element(self):

         val = date(2010, 10, 10)
         assert not block._can_hold_element(val)
-        with pytest.raises(TypeError):
+
+        msg = (
+            "'value' should be a 'Timestamp', 'NaT', "
+            "or array of those. Got 'date' instead."
+        )
+        with pytest.raises(TypeError, match=msg):
             arr[0] = val


@@ -350,7 +358,10 @@ def test_duplicate_ref_loc_failure(self):
         blocks[1].mgr_locs = np.array([0])

         # test trying to create block manager with overlapping ref locs
-        with pytest.raises(AssertionError):
+
+        msg = "Gaps in blk ref_locs"
Member

I don't think we should test this, this is an internal error message

(and it's also not a very good error message)

Member

> (and it's also not a very good error message)

Isn't that exactly the reason why we should test them in the first place? Puts more light on the quality of the error messages that we raise, internal or not.

Member

> Isn't that exactly the reason why we should test them in the first place?

If we improve the error message in one go, yes, that is fine, but otherwise I don't think it is worth the effort.

Member

@gfyoung, Jan 16, 2020

> Yes, that is fine, but otherwise I don't think it is worth the effort.

Debatable. If someone has just started working on this codebase, I wouldn't want them to also modify the error message when they don't fully understand what is going on. You can easily spend another PR figuring out what the proper error message is.

Member

I agree with @jorisvandenbossche here. I don't think it is worth testing against error messages for AssertionError, since that is purely internal.

Member

@gfyoung, Jan 17, 2020

> since that is purely internal

I still don't find that to be a valid justification. Internal or not, we use error messages for a reason: because they provide some utility in development. If these error messages (not the Exception class itself) have no use and won't see the light of day simply because they're internal, we should just get rid of them. To borrow from @jorisvandenbossche, they aren't worth the effort then.

Error messages can also serve to differentiate between which error is raised. The managers.py file is littered with AssertionError lines. It would be good to know that the correct one is being raised.
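As a minimal sketch of that last point: `pytest.raises(..., match=...)` applies `re.search` to the string of the raised exception, so a message pattern can tell two `AssertionError` sites apart. The example below uses plain `re.search` so that it runs without pytest. `build_manager`, `raised_message`, and the first assertion message are hypothetical stand-ins, not pandas code; only "Gaps in blk ref_locs" comes from the diff.

```python
import re


def build_manager(ref_locs, n_items):
    # Hypothetical stand-in for a constructor with several internal asserts.
    # The first message is invented for illustration.
    assert len(ref_locs) == n_items, "Number of ref_locs must match number of items"
    # The second message is the one quoted in the diff.
    assert sorted(ref_locs) == list(range(n_items)), "Gaps in blk ref_locs"
    return ref_locs


def raised_message(func, *args):
    # Return the AssertionError message, or None if nothing was raised.
    try:
        func(*args)
    except AssertionError as err:
        return str(err)
    return None


pattern = "Gaps in blk ref_locs"

# Overlapping ref locs trip the second assert, so the pattern matches.
overlap_matches = re.search(pattern, raised_message(build_manager, [0, 0], 2)) is not None

# A length mismatch trips the first assert; the same pattern no longer matches,
# which is what a bare check on the exception class alone would miss.
length_matches = re.search(pattern, raised_message(build_manager, [0], 2)) is not None
```

With only the exception class checked, both calls would look identical to the test; the pattern is what separates them.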

Contributor

I am actually OK with testing these asserts; they are the current behavior. If things change, then this test should fail. That said, I wouldn't spend too much effort on this.

Member

> because they provide some utility in development.

Yes, they are useful for that. But precisely because they are only used in development, they can also easily change when refactoring some internals (without any user-facing consequences).

But every test (and certainly a test with a match) also takes effort to write, review and maintain (e.g. it needs updating when you change something internally). It's a trade-off. So I am not saying those internal error messages are not useful at all, but I still think they are not worth the effort to properly test with full coverage.


+        with pytest.raises(AssertionError, match=msg):
             BlockManager(blocks, axes)

         blocks[0].mgr_locs = np.array([0])
@@ -807,8 +818,15 @@ def test_validate_bool_args(self):
         invalid_values = [1, "True", [1, 2, 3], 5.0]
         bm1 = create_mgr("a,b,c: i8-1; d,e,f: i8-2")

+        msg = (
+            r'For argument "inplace" expected type bool, received type int.|'
+            r'For argument "inplace" expected type bool, received type str.|'
+            r'For argument "inplace" expected type bool, received type list.|'
+            r'For argument "inplace" expected type bool, received type float.'
Member

Maybe leave out the last word to make this less repetitive?

Member Author

On second look, I think I will just set msg to:

msg = 'For argument "inplace" expected type bool, received type (int|str|list|float).'

Would this be better?

Member

I think that should work. Give it a shot.
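A quick way to check the proposed pattern before pushing: since `match` is applied with `re.search`, the alternation group can be tried directly against the four messages. This is a sketch, not the test itself. `expected_message` is a hypothetical helper that rebuilds the message text quoted in the diff, and the trailing period is escaped here (an addition of this sketch) so that it only matches a literal ".".

```python
import re

# Pattern from the suggestion above, with the final "." escaped.
msg = r'For argument "inplace" expected type bool, received type (int|str|list|float)\.'

invalid_values = [1, "True", [1, 2, 3], 5.0]


def expected_message(value):
    # Hypothetical helper: rebuilds the message text quoted in the diff
    # from the type name of the offending value.
    return (
        'For argument "inplace" expected type bool, '
        f"received type {type(value).__name__}."
    )


# One pattern covers all four invalid values.
matches = [re.search(msg, expected_message(v)) is not None for v in invalid_values]

# A type outside the group (dict) is not accepted by the pattern.
rejects_dict = re.search(msg, expected_message({"a": 1})) is None
```

Compared with four full alternatives joined by `|`, the grouped form keeps the shared prefix in one place, so a change to the wording only needs one edit.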

+        )
+
         for value in invalid_values:
-            with pytest.raises(ValueError):
+            with pytest.raises(ValueError, match=msg):
                 bm1.replace_list([1], [2], inplace=value)


@@ -1027,9 +1045,11 @@ def test_slice_len(self):
         assert len(BlockPlacement(slice(1, 0, -1))) == 1

     def test_zero_step_raises(self):
-        with pytest.raises(ValueError):
+        msg = "slice step cannot be zero"
+
+        with pytest.raises(ValueError, match=msg):
             BlockPlacement(slice(1, 1, 0))
-        with pytest.raises(ValueError):
+        with pytest.raises(ValueError, match=msg):
             BlockPlacement(slice(1, 2, 0))

     def test_unbounded_slice_raises(self):
@@ -1132,9 +1152,11 @@ def assert_add_equals(val, inc, result):
         assert_add_equals(slice(1, 4), -1, [0, 1, 2])
         assert_add_equals([1, 2, 4], -1, [0, 1, 3])

-        with pytest.raises(ValueError):
+        msg = "iadd causes length change"
+
+        with pytest.raises(ValueError, match=msg):
             BlockPlacement(slice(1, 4)).add(-10)
-        with pytest.raises(ValueError):
+        with pytest.raises(ValueError, match=msg):
             BlockPlacement([1, 2, 4]).add(-10)


@@ -1216,7 +1238,17 @@ def test_binop_other(self, op, value, dtype):
         }

         if (op, dtype) in invalid:
-            with pytest.raises(TypeError):
+            msg = (
+                "cannot perform __pow__ with this index type: DatetimeArray|"
+                "cannot perform __mod__ with this index type: DatetimeArray|"
+                "cannot perform __truediv__ with this index type: DatetimeArray|"
+                "cannot perform __mul__ with this index type: DatetimeArray|"
+                "cannot perform __pow__ with this index type: TimedeltaArray|"
+                "ufunc 'multiply' cannot use operands with types dtype"
+                r"\('<m8\[ns\]'\) and dtype\('<m8\[ns\]'\)|"
+                "cannot add DatetimeArray and Timestamp"
+            )
+            with pytest.raises(TypeError, match=msg):
Member

In light of the discussion in #30999: personally, I find that the above code change makes the test much more complicated and more difficult to read and interpret.

There is always a trade-off between testing every exact detail and the practicality, readability, and effort of doing so, and personally, for a case like the above, I don't think it is worth it.

Member

It looks hard to read because it's so monolithic.

This test could benefit from a TON of refactoring, including pytest parameterization.

Member Author

@ShaharNaveh, Jan 16, 2020

> This test could benefit from a TON of refactoring, including pytest parameterization.

@gfyoung I am looking into refactoring this test from the ground up. What would you use as the pytest parameterization(s)?

I hope I can rebuild the test from the ground up, depending on the given parameterization(s).

Member

@gfyoung, Jan 16, 2020

> I am looking into refactoring this test from the ground up, what would you put as the pytest parameterization(s)?

I might have jumped the gun there in saying we should do pytest parameterization. If there is a way to parameterize so we can remove the logic for the skip tests (and put it as part of the decorator), that would be good.

A similar point could be made about the combinations we expect to fail, because I'm uncertain as to feasibility. If we could somehow parameterize properly, we could easily break that part out as a separate test.
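One possible shape for that split, offered only as a sketch: compute the failing and passing combinations up front, pairing each failing one with its own message, so that each list could be handed to `pytest.mark.parametrize` in a separate test. The `ops` and `dtypes` lists are a hypothetical subset, and the pairing of each message with an `(op, dtype)` key is guessed from the message text in the diff, not taken from the actual test.

```python
import itertools
import operator

# Hypothetical subset of the operators and dtypes the test iterates over.
ops = [operator.add, operator.mul, operator.mod, operator.pow]
dtypes = ["int64", "M8[ns]", "m8[ns]"]

# Assumed pairing of failing combinations with messages from the diff.
expected_errors = {
    (operator.pow, "M8[ns]"): "cannot perform __pow__ with this index type: DatetimeArray",
    (operator.mod, "M8[ns]"): "cannot perform __mod__ with this index type: DatetimeArray",
    (operator.mul, "M8[ns]"): "cannot perform __mul__ with this index type: DatetimeArray",
    (operator.pow, "m8[ns]"): "cannot perform __pow__ with this index type: TimedeltaArray",
}

all_cases = list(itertools.product(ops, dtypes))

# Each failing case carries exactly one expected message instead of sharing
# one long alternation pattern.
failing_cases = [
    (op, dtype, expected_errors[(op, dtype)])
    for op, dtype in all_cases
    if (op, dtype) in expected_errors
]

# Everything else would go to the test that checks the computed result.
passing_cases = [case for case in all_cases if case not in expected_errors]
```

A failing-case test could then take `("op", "dtype", "msg")` from `failing_cases`, which keeps each message next to the combination that produces it.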

                 op(s, e.value)
         else:
             # FIXME: Since dispatching to Series, this test no longer