-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST/CLN: Fixturize frame/test_analytics #22733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST/CLN: Fixturize frame/test_analytics #22733
Conversation
Hello @h-vetinari! Thanks for submitting the PR.
|
expected = self.frame.corr() | ||
expected.loc['A', 'B'] = expected.loc['B', 'A'] = nan | ||
tm.assert_frame_equal(result, expected) | ||
def _check_method(self, frame, method='pearson'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a typo in the else-branch of check_minp
, and all tests still ran through, so I'm suggesting to remove this unused branch.
assert lcd_dtype == result0.dtype | ||
assert lcd_dtype == result1.dtype | ||
|
||
# result = f(axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncommenting these lines leads to some failures in test_sum
, test_median
, and test_sem
, but they seem to be tested more thoroughly directly above (with the correct wrappers and kwargs), so I think they should be removed.
pandas/tests/frame/test_analytics.py
Outdated
# bad axis | ||
tm.assert_raises_regex(ValueError, 'No axis named 2', f, axis=2) | ||
# make sure works on mixed-type frame | ||
getattr(self.mixed_frame, name)(axis=0) | ||
getattr(self.mixed_frame, name)(axis=1) | ||
getattr(float_string_frame, name)(axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From here on I'm thinking these tests should live in a separate class method, or be broken out some other way.
pandas/tests/frame/test_analytics.py
Outdated
self._check_stat_op('max', np.max, float_frame_with_na, | ||
float_frame, float_string_frame, | ||
check_dates=True) | ||
self._check_stat_op('max', np.max, int_frame, float_frame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In tests like this one, all the parts of _check_stat_op
that depend on float_frame
and float_string_frame
get tested twice unnecessarily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, can you have a look at the git history and see if you can figure out why. otherwise ok to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comments to #22730 - if you can keep PR focused on fixtures and keep changes to name spacing / comments for another PR would be very helpful for review process
@WillAyd Reverted all the offending orthogonal changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking much better - thanks!
pandas/tests/frame/test_analytics.py
Outdated
def _check_method(self, frame, method='pearson'): | ||
correls = frame.corr(method=method) | ||
exp = frame['A'].corr(frame['C'], method=method) | ||
tm.assert_almost_equal(correls['A']['C'], exp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you can use result
and expected
in spite of the existing code using exp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, first you ask me to remove all orthogonal changes, and then you ask me to add... orthogonal changes? Same happened with the import clean-up other PRs, but when I preempted them here, you ask me to remove them - can you understand how exasperating this back-and-forth is?
pandas/tests/frame/test_analytics.py
Outdated
@@ -2078,9 +2079,6 @@ def test_n_error(self, df_main_dtypes, nselect_method, columns): | |||
col = columns[1] | |||
error_msg = self.dtype_error_msg_template.format( | |||
column=col, method=nselect_method, dtype=df[col].dtype) | |||
# escape some characters that may be in the repr | |||
error_msg = (error_msg.replace('(', '\\(').replace(")", "\\)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it as part of trying to hunt down some DeprecationWarnings for unescaped brackets (which I believe are somehow coming from pytest), and the tests still ran through. It is IMO an unnecessary misdirection that makes the test harder to read, but I reverted it.
pandas/tests/frame/test_analytics.py
Outdated
@@ -784,17 +799,13 @@ def alt(x): | |||
assert kurt.name is None | |||
assert kurt2.name == 'bar' | |||
|
|||
def _check_stat_op(self, name, alternative, frame=None, has_skipna=True, | |||
def _check_stat_op(self, name, alternative, main_frame, float_frame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any risk here in using arguments identical to the fixture names? Assuming since this method itself is not a test that pytest won't try to inject the fixture here but not sure if that will always be the case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. Did you see my comment about breaking up this function?
Codecov Report
@@ Coverage Diff @@
## master #22733 +/- ##
=======================================
Coverage 92.19% 92.19%
=======================================
Files 169 169
Lines 50835 50835
=======================================
Hits 46868 46868
Misses 3967 3967
Continue to review full report at Codecov.
|
@gfyoung @jreback @WillAyd @TomAugspurger
|
@WillAyd Green :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - @jreback
@pytest.mark.parametrize( | ||
"method", ['sum', 'mean', 'prod', 'var', | ||
'std', 'skew', 'min', 'max']) | ||
@pytest.mark.parametrize('method', ['sum', 'mean', 'prod', 'var', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in follow, can use these fixtures (may need to make small changes) when #22762 is merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, a couple of comments
pandas/tests/frame/test_analytics.py
Outdated
@@ -788,17 +803,14 @@ def alt(x): | |||
assert kurt.name is None | |||
assert kurt2.name == 'bar' | |||
|
|||
def _check_stat_op(self, name, alternative, frame=None, has_skipna=True, | |||
# underscores added to distinguish argument names from fixture names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this just be a module level function now? maybe rename to assert_stat_ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this a module-level function, and broke it up into assert_stat_op_calc
and assert_stat_op_api
. This also removes the redundancy mentioned above.
def test_any_all(self): | ||
self._check_bool_op('any', np.any, has_skipna=True, has_bool_only=True) | ||
self._check_bool_op('all', np.all, has_skipna=True, has_bool_only=True) | ||
def test_any_all(self, bool_frame_with_na, float_string_frame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can parameterize on ['all', 'any'] (use getattr(np, name)
in side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broke up _check_stat_op
and _check_bool_op
into two module-level functions each, which should be much cleaner in terms of needed fixtures/kwargs.
There is still some redundancy between the new assert_stat_op_calc
and assert_bool_op_calc
, but that's also true of the current state and therefore something for a follow-up -- this PR should (as far as possible) just be about fixturization.
pandas/tests/frame/test_analytics.py
Outdated
@@ -788,17 +803,14 @@ def alt(x): | |||
assert kurt.name is None | |||
assert kurt2.name == 'bar' | |||
|
|||
def _check_stat_op(self, name, alternative, frame=None, has_skipna=True, | |||
# underscores added to distinguish argument names from fixture names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this a module-level function, and broke it up into assert_stat_op_calc
and assert_stat_op_api
. This also removes the redundancy mentioned above.
391d122
to
717a12a
Compare
Reverted the underscores for the function parameters again after cross-review in #22730:
|
pandas/tests/frame/test_analytics.py
Outdated
|
||
|
||
class TestDataFrameAnalytics(TestData): | ||
def assert_stat_op_calc(opname, alternative, main_frame, has_skipna=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you split up these to calc/api tests now? its very hard to tell if anything actually changed. I am concerned that something DID change even accidently. can you just do this as a copy-paste change of the original function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, because they perform different checks, need different fixtures, and introduced the redundancy mentioned further up the thread (of checking the api-part twice in e.g. test_max
).
I did not remove any single line of the functions.
can you just do this as a copy-paste change of the original function.
you mean as a separate commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i mean do a pre-cursor PR that moves the functions, but doesn't change them. Then a followon to show those changes. moving & changing need to be separate steps. It should be easy to see what actually changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i mean do a pre-cursor PR that moves the functions, but doesn't change them. Then a followon to show those changes. moving & changing need to be separate steps. It should be easy to see what actually changed.
Why not a separate commit...? You can review commits individually. Plus, this is essentially linked with the whole fixturization effort, which would have to come first (and your request to make _check_stat_op
a module-level function is strictly speaking orthogonal to this PR).
I'll make some nice modular commits for you to review in sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can try, but its much easier to review a separate PR to be honest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed some commits that just go until moving _check_stat_op
and _check_bool_op
and making them run. I'll push the rest of the commits once the CI passes.
5e8b130
to
a5a44b3
Compare
a5a44b3
to
6c4a702
Compare
Kindly asking you to peruse the copy/paste and unindent commits here, if you can follow as required. The rest of the changes (coming after CI passes; or on your OK) will be trivial. |
@jreback |
@jreback I know you wanted separation (just moving the functions to module-level), but that's not possible without the fixturization that's carried out in this PR. However, I've really rolled out the red carpet here, the commits are super modular and easy to digest... |
If you start here ( |
@h-vetinari as I said above multiple times. I really prefer to do the diff using the tools which git provides. Its not easy to diff commits like this, pls separate this in to multiple PR's. Moving things first, then change. It will make everyone's life easier and speed time to acceptance of PRs. Ultimately things get squashed into a single commit anyhow, so putting in multiple commits doesn't really help anyone. Multiple PR's on the other hand, are merged separately. |
If you don't like the github-tools that allow comparing separate commits (which I linked; and where you can step through one by one), there's always Your request of moving the function was orthogonal to this PR - I still accomodated it. If anything could be split off, it's that (i.e. revert the last 4 commits). Similarly, the whole fixturization effort was your request in #22236 (orthogonal there as well) - it's not something I'm gonna fight for. Feel free to reopen if you wanna have a look again. |
Alright, now that I have a bit more time again, I'm less reluctant to do some hoop-jumping. @jreback, I've reverted back to the original purpose of this PR - fixturization. I'll leave making |
# GH #15390 | ||
original = self.simple.copy(deep=True) | ||
original = simple_frame.copy(deep=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note for later to clean up the names of these fixtures to be meaningful (e.g. float_frame is, simple_frame not so much, prob need to wait to see how much more things ike this are used)
@h-vetinari ok this looks ok thanks. |
@jreback
OK, this was part of the process in #22236. Originally it was just called |
@jreback |
git diff upstream/master -u -- "*.py" | flake8 --diff
This module is much harder to fixturize than e.g. #22236 or #22730, mainly due to the class methods
_check_stat_op
and_check_bool_op
, which, despite having an argument for the frame they're testing, are also testing on other quasi-fixtures ofTestData
. Since I can't import directly fromframe/conftest
without gettingRemovedInPytest4Warnings
, I made theses fixtures explicit arguments of the respective methods.Furthermore, I extracted two fixtures that basically correspond to those methods being called without a
frame
argument, and added them toconftest
.The larger question is how to avoid all these redundant calls being made (e.g. in
test_max
), and how_check_stat_op
/_check_stat_op
should be properly split up into several tests/parametrizations. So I don't view this PR as ready, but needing discussion regarding how to best proceed.