Skip to content

TST/CLN: Fixturize frame/test_analytics #22733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 6, 2018

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Sep 17, 2018

This module is much harder to fixturize than e.g. #22236 or #22730, mainly due to the class methods _check_stat_op and _check_bool_op, which, despite having an argument for the frame they're testing, are also testing on other quasi-fixtures of TestData. Since I can't import directly from frame/conftest without getting RemovedInPytest4Warnings, I made theses fixtures explicit arguments of the respective methods.

Furthermore, I extracted two fixtures that basically correspond to those methods being called without a frame argument, and added them to conftest.

The larger question is how to avoid all these redundant calls being made (e.g. in test_max), and how _check_stat_op / _check_stat_op should be properly split up into several tests/parametrizations. So I don't view this PR as ready, but needing discussion regarding how to best proceed.

@pep8speaks
Copy link

Hello @h-vetinari! Thanks for submitting the PR.

expected = self.frame.corr()
expected.loc['A', 'B'] = expected.loc['B', 'A'] = nan
tm.assert_frame_equal(result, expected)
def _check_method(self, frame, method='pearson'):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a typo in the else-branch of check_minp, and all tests still ran through, so I'm suggesting to remove this unused branch.

assert lcd_dtype == result0.dtype
assert lcd_dtype == result1.dtype

# result = f(axis=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uncommenting these lines leads to some failures in test_sum, test_median, and test_sem, but they seem to be tested more thoroughly directly above (with the correct wrappers and kwargs), so I think they should be removed.

# bad axis
tm.assert_raises_regex(ValueError, 'No axis named 2', f, axis=2)
# make sure works on mixed-type frame
getattr(self.mixed_frame, name)(axis=0)
getattr(self.mixed_frame, name)(axis=1)
getattr(float_string_frame, name)(axis=0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From here on I'm thinking these tests should live in a separate class method, or be broken out some other way.

self._check_stat_op('max', np.max, float_frame_with_na,
float_frame, float_string_frame,
check_dates=True)
self._check_stat_op('max', np.max, int_frame, float_frame,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In tests like this one, all the parts of _check_stat_op that depend on float_frame and float_string_frame get tested twice unnecessarily

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, can you have a look at the git history and see if you can figure out why. otherwise ok to remove.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comments to #22730 - if you can keep PR focused on fixtures and keep changes to name spacing / comments for another PR would be very helpful for review process

@WillAyd WillAyd added Testing pandas testing functions or related to the test suite Clean labels Sep 17, 2018
@h-vetinari
Copy link
Contributor Author

@WillAyd Reverted all the offending orthogonal changes

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking much better - thanks!

def _check_method(self, frame, method='pearson'):
correls = frame.corr(method=method)
exp = frame['A'].corr(frame['C'], method=method)
tm.assert_almost_equal(correls['A']['C'], exp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can use result and expected in spite of the existing code using exp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, first you ask me to remove all orthogonal changes, and then you ask me to add... orthogonal changes? Same happened with the import clean-up other PRs, but when I preempted them here, you ask me to remove them - can you understand how exasperating this back-and-forth is?

@@ -2078,9 +2079,6 @@ def test_n_error(self, df_main_dtypes, nselect_method, columns):
col = columns[1]
error_msg = self.dtype_error_msg_template.format(
column=col, method=nselect_method, dtype=df[col].dtype)
# escape some characters that may be in the repr
error_msg = (error_msg.replace('(', '\\(').replace(")", "\\)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it as part of trying to hunt down some DeprecationWarnings for unescaped brackets (which I believe are somehow coming from pytest), and the tests still ran through. It is IMO an unnecessary misdirection that makes the test harder to read, but I reverted it.

@@ -784,17 +799,13 @@ def alt(x):
assert kurt.name is None
assert kurt2.name == 'bar'

def _check_stat_op(self, name, alternative, frame=None, has_skipna=True,
def _check_stat_op(self, name, alternative, main_frame, float_frame,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any risk here in using arguments identical to the fixture names? Assuming since this method itself is not a test that pytest won't try to inject the fixture here but not sure if that will always be the case

Copy link
Contributor Author

@h-vetinari h-vetinari Sep 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Did you see my comment about breaking up this function?

@codecov
Copy link

codecov bot commented Sep 18, 2018

Codecov Report

Merging #22733 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22733   +/-   ##
=======================================
  Coverage   92.19%   92.19%           
=======================================
  Files         169      169           
  Lines       50835    50835           
=======================================
  Hits        46868    46868           
  Misses       3967     3967
Flag Coverage Δ
#multiple 90.61% <ø> (ø) ⬆️
#single 42.35% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d523d9f...c227fa2. Read the comment docs.

@h-vetinari
Copy link
Contributor Author

@gfyoung @jreback @WillAyd @TomAugspurger
Weird error on Appveyor, could some one please restart?

Build started
git clone -q https://github.com/pandas-dev/pandas.git C:\projects\pandas
git fetch -q origin +refs/pull/22733/merge:
git checkout -qf FETCH_HEAD
Running Install scripts
if ($env:APPVEYOR_PULL_REQUEST_NUMBER -and $env:APPVEYOR_BUILD_NUMBER -ne ((Invoke-RestMethod ` https://ci.appveyor.com/api/projects/$env:APPVEYOR_ACCOUNT_NAME/$env:APPVEYOR_PROJECT_SLUG/history?recordsNumber=50).builds | ` Where-Object pullRequestId -eq $env:APPVEYOR_PULL_REQUEST_NUMBER)[0].buildNumber) { ` throw "There are newer queued builds for this pull request, failing early." }
Cannot index into a null array.
At line:1 char:5
+ if ($env:APPVEYOR_PULL_REQUEST_NUMBER -and $env:APPVEYOR_BUILD_NUMBER ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : NullArray
 
Command executed with exception: Cannot index into a null array.

@h-vetinari
Copy link
Contributor Author

@WillAyd Green :)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - @jreback

@pytest.mark.parametrize(
"method", ['sum', 'mean', 'prod', 'var',
'std', 'skew', 'min', 'max'])
@pytest.mark.parametrize('method', ['sum', 'mean', 'prod', 'var',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in follow, can use these fixtures (may need to make small changes) when #22762 is merged

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, a couple of comments

@@ -788,17 +803,14 @@ def alt(x):
assert kurt.name is None
assert kurt2.name == 'bar'

def _check_stat_op(self, name, alternative, frame=None, has_skipna=True,
# underscores added to distinguish argument names from fixture names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this just be a module level function now? maybe rename to assert_stat_ops

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this a module-level function, and broke it up into assert_stat_op_calc and assert_stat_op_api. This also removes the redundancy mentioned above.

def test_any_all(self):
self._check_bool_op('any', np.any, has_skipna=True, has_bool_only=True)
self._check_bool_op('all', np.all, has_skipna=True, has_bool_only=True)
def test_any_all(self, bool_frame_with_na, float_string_frame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can parameterize on ['all', 'any'] (use getattr(np, name) in side

@jreback jreback added this to the 0.24.0 milestone Sep 23, 2018
Copy link
Contributor Author

@h-vetinari h-vetinari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke up _check_stat_op and _check_bool_op into two module-level functions each, which should be much cleaner in terms of needed fixtures/kwargs.

There is still some redundancy between the new assert_stat_op_calc and assert_bool_op_calc, but that's also true of the current state and therefore something for a follow-up -- this PR should (as far as possible) just be about fixturization.

@@ -788,17 +803,14 @@ def alt(x):
assert kurt.name is None
assert kurt2.name == 'bar'

def _check_stat_op(self, name, alternative, frame=None, has_skipna=True,
# underscores added to distinguish argument names from fixture names
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this a module-level function, and broke it up into assert_stat_op_calc and assert_stat_op_api. This also removes the redundancy mentioned above.

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Sep 23, 2018

@WillAyd @jreback @gfyoung

Reverted the underscores for the function parameters again after cross-review in #22730:
@jreback:

actually don't think you need to do this as pytest doesn't care

@h-vetinari:

This is from #22733, where this was an review request from @WillAyd.

Is there any risk here in using arguments identical to the fixture names? Assuming since this method itself is not a test that pytest won't try to inject the fixture here but not sure if that will always be the case

@gfyoung:

Fixture injection should only happen when we have a function called test_*. pytest wouldn't even collect that function at all. Thus, I agree with @jreback to drop underscores.
If we're wrong, IMO that's a design flaw.



class TestDataFrameAnalytics(TestData):
def assert_stat_op_calc(opname, alternative, main_frame, has_skipna=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you split up these to calc/api tests now? its very hard to tell if anything actually changed. I am concerned that something DID change even accidently. can you just do this as a copy-paste change of the original function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because they perform different checks, need different fixtures, and introduced the redundancy mentioned further up the thread (of checking the api-part twice in e.g. test_max).

I did not remove any single line of the functions.

can you just do this as a copy-paste change of the original function.

you mean as a separate commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i mean do a pre-cursor PR that moves the functions, but doesn't change them. Then a followon to show those changes. moving & changing need to be separate steps. It should be easy to see what actually changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i mean do a pre-cursor PR that moves the functions, but doesn't change them. Then a followon to show those changes. moving & changing need to be separate steps. It should be easy to see what actually changed.

Why not a separate commit...? You can review commits individually. Plus, this is essentially linked with the whole fixturization effort, which would have to come first (and your request to make _check_stat_op a module-level function is strictly speaking orthogonal to this PR).

I'll make some nice modular commits for you to review in sequence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can try, but its much easier to review a separate PR to be honest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some commits that just go until moving _check_stat_op and _check_bool_op and making them run. I'll push the rest of the commits once the CI passes.

@h-vetinari h-vetinari force-pushed the fixturize_frame_analytics branch from 5e8b130 to a5a44b3 Compare September 25, 2018 16:57
@h-vetinari h-vetinari force-pushed the fixturize_frame_analytics branch from a5a44b3 to 6c4a702 Compare September 25, 2018 17:03
@h-vetinari
Copy link
Contributor Author

h-vetinari commented Sep 25, 2018

@jreback

I pushed some commits that just go until moving _check_stat_op and _check_bool_op and making them run. I'll push the rest of the commits once the CI passes.

Kindly asking you to peruse the copy/paste and unindent commits here, if you can follow as required. The rest of the changes (coming after CI passes; or on your OK) will be trivial.

@h-vetinari
Copy link
Contributor Author

@jreback
So the CI was successful except a CondaHTTPError: HTTP 504 GATEWAY TIME-OUT for py36_locale_slow. Pushing the rest of the commits.

@h-vetinari
Copy link
Contributor Author

@jreback
PTAL

I know you wanted separation (just moving the functions to module-level), but that's not possible without the fixturization that's carried out in this PR. However, I've really rolled out the red carpet here, the commits are super modular and easy to digest...

@h-vetinari h-vetinari mentioned this pull request Oct 2, 2018
1 task
@h-vetinari
Copy link
Contributor Author

@jreback

If you start here (https://github.com/pandas-dev/pandas/pull/22733/commits/7ac476ec50d3a9b44112c078a61f9455efe93c07), the commits are very easy to follow, I promise. :)

@jreback
Copy link
Contributor

jreback commented Oct 2, 2018

@h-vetinari as I said above multiple times. I really prefer to do the diff using the tools which git provides. Its not easy to diff commits like this, pls separate this in to multiple PR's. Moving things first, then change. It will make everyone's life easier and speed time to acceptance of PRs.

Ultimately things get squashed into a single commit anyhow, so putting in multiple commits doesn't really help anyone. Multiple PR's on the other hand, are merged separately.

@h-vetinari
Copy link
Contributor Author

@jreback

I really prefer to do the diff using the tools which git provides

If you don't like the github-tools that allow comparing separate commits (which I linked; and where you can step through one by one), there's always git diff <hash1> <hash2>.

Your request of moving the function was orthogonal to this PR - I still accomodated it. If anything could be split off, it's that (i.e. revert the last 4 commits). Similarly, the whole fixturization effort was your request in #22236 (orthogonal there as well) - it's not something I'm gonna fight for. Feel free to reopen if you wanna have a look again.

@h-vetinari h-vetinari closed this Oct 2, 2018
@h-vetinari
Copy link
Contributor Author

Alright, now that I have a bit more time again, I'm less reluctant to do some hoop-jumping.

@jreback, I've reverted back to the original purpose of this PR - fixturization. I'll leave making _check_stat_op a module-level function for a follow-up.

@h-vetinari h-vetinari reopened this Oct 5, 2018
# GH #15390
original = self.simple.copy(deep=True)
original = simple_frame.copy(deep=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for later to clean up the names of these fixtures to be meaningful (e.g. float_frame is, simple_frame not so much, prob need to wait to see how much more things ike this are used)

@jreback
Copy link
Contributor

jreback commented Oct 6, 2018

@h-vetinari ok this looks ok thanks.

@jreback jreback merged commit 5551bcf into pandas-dev:master Oct 6, 2018
@h-vetinari
Copy link
Contributor Author

@jreback
Thanks.

note for later to clean up the names of these fixtures to be meaningful (e.g. float_frame is, simple_frame not so much, prob need to wait to see how much more things ike this are used)

OK, this was part of the process in #22236. Originally it was just called simple - simple_frame was the closest in spirit. I'd argue it should be renamed ASAP (and update the "translation guide" in #22471), before more modules get translated. What name would you like to have?

@h-vetinari h-vetinari deleted the fixturize_frame_analytics branch October 6, 2018 16:21
@h-vetinari
Copy link
Contributor Author

@jreback
What name would you like to have for simple_frame? (see comment above)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants