Skip to content

TST: Remove even more uses np.array_equal in tests #18087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 16, 2017

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Nov 3, 2017

Implement assert_not as a way to check that assertions should fail (for methods more sophisticated than a simple bare assert). Also takes the opportunity to remove several more np.array_equal assertions.

Follow-up to #18047

@gfyoung gfyoung added the Testing pandas testing functions or related to the test suite label Nov 3, 2017
@gfyoung gfyoung added this to the 0.21.1 milestone Nov 3, 2017
@codecov
Copy link

codecov bot commented Nov 3, 2017

Codecov Report

Merging #18087 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18087      +/-   ##
==========================================
- Coverage   91.27%   91.23%   -0.05%     
==========================================
  Files         163      163              
  Lines       50120    50120              
==========================================
- Hits        45749    45728      -21     
- Misses       4371     4392      +21
Flag Coverage Δ
#multiple 89.04% <ø> (-0.03%) ⬇️
#single 40.32% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/util/testing.py 100% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4375bd...ac69b46. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 3, 2017

Codecov Report

Merging #18087 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18087      +/-   ##
==========================================
- Coverage    91.4%   91.38%   -0.02%     
==========================================
  Files         164      164              
  Lines       49880    49880              
==========================================
- Hits        45592    45583       -9     
- Misses       4288     4297       +9
Flag Coverage Δ
#multiple 89.19% <ø> (ø) ⬆️
#single 39.42% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 148ed63...85ba491. Read the comment docs.

@@ -237,7 +237,7 @@ def test_modulo(self):
s = p[0]
res = s % p
res2 = p % s
assert not np.array_equal(res.fillna(0), res2.fillna(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so another way to do this and maybe cleaner is to have a parameter compare='equal|'not_equal'
The main issue I have with this method is the error messages are not as good (though maybe it doesn't matter)

Copy link
Member Author

@gfyoung gfyoung Nov 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, wouldn't I have to propagate that parameter across every single assert_* function? My change allow for any assert_* function to be passed in.

Also, I'm not sure I follow what error message you would expect. If the assert_* function passes when it isn't supposed to, you can't specify what was "different" in this case.

Copy link
Member Author

@gfyoung gfyoung Nov 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : As I explained why I wasn't really a fan passing in a compare parameter to all of these assert_* functions, if you have any other thoughts on this point, let me know. Otherwise, I think this should be good to go.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback : Any thoughts?

@jreback
Copy link
Contributor

jreback commented Nov 6, 2017

@gfyoung
Copy link
Member Author

gfyoung commented Nov 10, 2017

@jorisvandenbossche @TomAugspurger : Any thoughts?

@TomAugspurger
Copy link
Contributor

Yeah, this seems good.

@jorisvandenbossche
Copy link
Member

Is there a reason we can't use assert not s.equals(s2) ?

@gfyoung
Copy link
Member Author

gfyoung commented Nov 13, 2017

@jorisvandenbossche : Hmm...that's a good point. How comprehensive is it vs. tm.assert_series_equal ?

@gfyoung
Copy link
Member Author

gfyoung commented Nov 13, 2017

In any case, I'm still in favor of using the method I have developed in cases where our equality checking isn't as strict (e.g. check_dtype=False)

@jorisvandenbossche
Copy link
Member

equals only looks if the data are equal, not the attributes (eg not the Series name), so it less strict than assert_series_equal.
But I don't think that this is a problem in this case. As because if a different name is the reason you want assert_series_equal to fail, you shouldn't use this, but just explicitly assert that the names are not equal.

So personally I don't find that an argument to have assert_not(assert_series_equal(..)) (but there might be other arguments)

@gfyoung
Copy link
Member Author

gfyoung commented Nov 13, 2017

@jorisvandenbossche : Fair point. This function is meant to be catch-all for any of our in-house assert_* functions, and initially it was meant to address our not having a counter-part for assert_numpy_array_equal.

That being said, assert_not(assert_series_equal(...)) could indeed be simplified as you suggested, and I can make that change later today.

@gfyoung gfyoung force-pushed the not-numpy-array-equal branch from ac69b46 to f49afb9 Compare November 14, 2017 07:32
@jreback jreback modified the milestones: 0.21.1, 0.22.0 Nov 14, 2017
@jreback
Copy link
Contributor

jreback commented Nov 14, 2017

lgtm. @jorisvandenbossche ?

@jorisvandenbossche
Copy link
Member

I am still not sure adding an assert_not to our own testing addons is worth it for the current single use case (and even for that one, I personally find assert not np.array_equal(df.values, b_c) much more natural to read than tm.assert_not(tm.assert_numpy_array_equal, df.values, b_c))

@gfyoung
Copy link
Member Author

gfyoung commented Nov 14, 2017

@jorisvandenbossche : The whole point is not to use np.array_equal in the first place, as our testing methods with assert_numpy_array_equal are richer. This was a more generalized solution rather than writing a one-off solution of assert_not_numpy_array_equal.

In any case, it just so happened that Series equality checking was sufficient in many of the cases where we were using np.array_equal, but this construct that I created can be used for those cases (albeit not so frequent) where we might want to check that a certain assert should fail.

@gfyoung
Copy link
Member Author

gfyoung commented Nov 14, 2017

@jorisvandenbossche : If you haven't done so already, have a look at #18047, where this whole discussion started around array_equal.

@jorisvandenbossche
Copy link
Member

I didn't see that, but that issue does not seem to give a reasoning for the need to remove array_equal?
And if that is the goal, I would use np.allclose

@gfyoung
Copy link
Member Author

gfyoung commented Nov 14, 2017

I didn't see that, but that issue does not seem to give a reasoning for the need to remove array_equal?

I think the idea (and @jreback feel free to jump in) is that we were avoiding explicitly using the numpy API for our testing purposes and only using in-house comparisons during testing. That's the emphasis that I have seen (and received feedback on over the course of previous PR's).

As written, np.array_equal works to get the tests passing, but we would rather check with a different function than that.

@jorisvandenbossche
Copy link
Member

I am not sure if that is the general direction, eg with the move to pytest we actually moved a bit more away from our in-house testing functionality (and you did some of those PRs to replace in-house methods with standard pytest functionality).
So I don't see why we are adding here new in-house functionality to maintain while there is a perfectly sensible solution with standard tools of numpy and pytest (unless there is a clear benefit of an in-house method, which is sometimes the case, but I don't see that here)

@gfyoung
Copy link
Member Author

gfyoung commented Nov 14, 2017

I'm fine with incorporating pytest tools in-place of in-house. However, numpy tools in-place of in-house, not so much. More often than not, we can use something better than array_equal when testing, as you can see in this PR and the one preceding it.

So if you have a suggestion for writing this in pytest idiom, I'm all for it. However, if we can't, then I'd rather do this in-house than rely on numpy tools.

@jorisvandenbossche
Copy link
Member

More often than not, we can use something better than array_equal when testing, as you can see in this PR and the one preceding it.

Can you then be more specific? What is 'bad' about array_equal in this specific use case?

So if you have a suggestion for writing this in pytest idiom, I'm all for it. However, if we can't, then I'd rather do this in-house than rely on numpy tools.

I think assert not np.array_equal(..) or assert not np.allclose(..) is a perfect pytest idiom

@gfyoung
Copy link
Member Author

gfyoung commented Nov 14, 2017

@jorisvandenbossche : In this specific case, I'll give it to you that we actually don't need this paradigm at all, as the shape of the arrays are completely different (just ran through the test manually), making this array_equal check kind of ridiculous.

So in that case, no need to implement assert_not for this PR, and we can table that discussion for another time if need be.

That being said, array_equal, similar to Series.equal only checks values and shape, but not dtype, which is what assert_numpy_array_equal does.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Nov 14, 2017

That being said, array_equal, similar to Series.equal only checks values and shape, but not dtype, which is what assert_numpy_array_equal does.

Yes, but as I said above, IMO if the reason that you want assert_series/numpy_array_equal to fail is not because the values differ but because the dtype (or name, ..) differs, I find assert_not(assert_...()) very obscure and I think we should explicitly test that attribute.

@jorisvandenbossche
Copy link
Member

So my reasoning: either the values differ -> simply use .equals / array_equal, or either the values are equal but the dtype, name, ... differ -> test that explicitly

@gfyoung gfyoung force-pushed the not-numpy-array-equal branch from f49afb9 to 85ba491 Compare November 15, 2017 05:18
@gfyoung gfyoung changed the title TST: Implement assert_not in testing.py TST: Remove even more uses np.array_equal in tests Nov 15, 2017
@gfyoung
Copy link
Member Author

gfyoung commented Nov 15, 2017

@jorisvandenbossche : I just renamed the PR to just remove instances of "array_equal" from tests. Everything is green, so PTAL.

@jreback jreback merged commit 2a6023e into pandas-dev:master Nov 16, 2017
@jreback
Copy link
Contributor

jreback commented Nov 16, 2017

thanks @gfyoung

@jreback
Copy link
Contributor

jreback commented Nov 16, 2017

just for edification. we don't like np.array_equal because it doesn't work with NaNs; this is why we have array_equivalent, which is you can use as well in tests.

@gfyoung gfyoung deleted the not-numpy-array-equal branch November 16, 2017 04:55
@gfyoung
Copy link
Member Author

gfyoung commented Nov 16, 2017

this is why we have array_equivalent

All this time, and I didn't know that existed. 😄 Should mention that from now on in case we have this situation down the road.

gfyoung added a commit to forking-repos/pandas that referenced this pull request Nov 16, 2017
@jorisvandenbossche
Copy link
Member

Thanks @gfyoung

we don't like np.array_equal because it doesn't work with NaNs; this is why we have array_equivalent, which is you can use as well in tests.

and on the numpy side, I think allclose is the 'replacement' for array_equal that also can handle NaNs

gfyoung added a commit to forking-repos/pandas that referenced this pull request Nov 16, 2017
gfyoung added a commit that referenced this pull request Nov 16, 2017
Follow-up to gh-18087.

The remaining test that uses the function call uses
functionality that doesn't exist anymore in pandas.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants