TST/CLN: Fixturize frame/test_analytics #22733

h-vetinari · 2018-09-17T01:03:20Z

1 step closer towards TST/CLN: remove TestData from frame-tests; replace with fixtures #22471
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

This module is much harder to fixturize than e.g. #22236 or #22730, mainly due to the class methods _check_stat_op and _check_bool_op, which, despite having an argument for the frame they're testing, also test on other quasi-fixtures of TestData. Since I can't import directly from frame/conftest without getting RemovedInPytest4Warnings, I made these fixtures explicit arguments of the respective methods.

Furthermore, I extracted two fixtures that basically correspond to those methods being called without a frame argument, and added them to conftest.

The larger question is how to avoid all these redundant calls (e.g. in test_max), and how _check_stat_op / _check_bool_op should be properly split up into several tests/parametrizations. So I don't view this PR as ready; it needs discussion regarding how best to proceed.

pep8speaks · 2018-09-17T01:03:23Z

Hello @h-vetinari! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/tests/frame/conftest.py !
There are no PEP8 issues in the file pandas/tests/frame/test_analytics.py !

h-vetinari · 2018-09-17T09:45:09Z

pandas/tests/frame/test_analytics.py

-            expected = self.frame.corr()
-            expected.loc['A', 'B'] = expected.loc['B', 'A'] = nan
-            tm.assert_frame_equal(result, expected)
+    def _check_method(self, frame, method='pearson'):


I had a typo in the else-branch of check_minp, and all tests still ran through, so I'm suggesting to remove this unused branch.

h-vetinari · 2018-09-17T09:47:12Z

pandas/tests/frame/test_analytics.py

            assert lcd_dtype == result0.dtype
            assert lcd_dtype == result1.dtype

-        # result = f(axis=1)


Uncommenting these lines leads to some failures in test_sum, test_median, and test_sem, but they seem to be tested more thoroughly directly above (with the correct wrappers and kwargs), so I think they should be removed.

h-vetinari · 2018-09-17T09:49:08Z

pandas/tests/frame/test_analytics.py

        # bad axis
        tm.assert_raises_regex(ValueError, 'No axis named 2', f, axis=2)
        # make sure works on mixed-type frame
-        getattr(self.mixed_frame, name)(axis=0)
-        getattr(self.mixed_frame, name)(axis=1)
+        getattr(float_string_frame, name)(axis=0)


From here on I'm thinking these tests should live in a separate class method, or be broken out some other way.

h-vetinari · 2018-09-17T09:52:41Z

pandas/tests/frame/test_analytics.py

+            self._check_stat_op('max', np.max, float_frame_with_na,
+                                float_frame, float_string_frame,
+                                check_dates=True)
+        self._check_stat_op('max', np.max, int_frame, float_frame,


In tests like this one, all the parts of _check_stat_op that depend on float_frame and float_string_frame get tested twice unnecessarily.

hmm, can you have a look at the git history and see if you can figure out why. otherwise ok to remove.

WillAyd

Similar comments to #22730 - if you can keep PR focused on fixtures and keep changes to name spacing / comments for another PR would be very helpful for review process

h-vetinari · 2018-09-18T00:22:15Z

@WillAyd Reverted all the offending orthogonal changes

WillAyd

Looking much better - thanks!

WillAyd · 2018-09-18T06:01:06Z

pandas/tests/frame/test_analytics.py

+    def _check_method(self, frame, method='pearson'):
+        correls = frame.corr(method=method)
+        exp = frame['A'].corr(frame['C'], method=method)
+        tm.assert_almost_equal(correls['A']['C'], exp)


Here you can use result and expected in spite of the existing code using exp

Um, first you ask me to remove all orthogonal changes, and then you ask me to add... orthogonal changes? The same happened with the import clean-up in other PRs, but when I preempted them here, you asked me to remove them. Can you understand how exasperating this back-and-forth is?
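The result/expected naming suggested in the review can be sketched as a standalone helper. This is a minimal illustration that assumes a frame with numeric columns `A` and `C`. It uses `numpy.testing.assert_allclose` in place of pandas' internal `tm.assert_almost_equal` so that it runs outside the pandas test suite, and the name `check_corr_method` is hypothetical.

```python
import numpy as np
import pandas as pd


def check_corr_method(frame, method="pearson"):
    # result: the (A, C) entry of the full correlation matrix
    result = frame.corr(method=method).loc["A", "C"]
    # expected: the same coefficient computed pairwise on two Series
    expected = frame["A"].corr(frame["C"], method=method)
    np.testing.assert_allclose(result, expected)
    return result
```

`Series.corr` relies on SciPy for the `kendall` and `spearman` methods, so only `pearson` works without it.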

WillAyd · 2018-09-18T06:08:43Z

pandas/tests/frame/test_analytics.py

@@ -2078,9 +2079,6 @@ def test_n_error(self, df_main_dtypes, nselect_method, columns):
        col = columns[1]
        error_msg = self.dtype_error_msg_template.format(
            column=col, method=nselect_method, dtype=df[col].dtype)
-        # escape some characters that may be in the repr
-        error_msg = (error_msg.replace('(', '\\(').replace(")", "\\)")


Why was this removed?

I removed it as part of trying to hunt down some DeprecationWarnings for unescaped brackets (which I believe are somehow coming from pytest), and the tests still ran through. It is IMO an unnecessary misdirection that makes the test harder to read, but I reverted it.
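The escaping exists for a specific reason. `pytest.raises(..., match=...)`, like `tm.assert_raises_regex`, treats the message as a regular expression, so literal parentheses or brackets coming from a dtype repr would act as regex syntax. The following minimal sketch uses a hypothetical message rather than the actual `dtype_error_msg_template`. It shows that `re.escape` would cover every metacharacter, not just `(` and `)`. This is an alternative shown for illustration, not what the PR did.

```python
import re

import pytest

# Hypothetical error message containing regex metacharacters
MSG = "Cannot use method 'nlargest' with dtype object (mixed) [demo]"


def raise_type_error():
    raise TypeError(MSG)


def matches_unescaped():
    # As a pattern, "(mixed)" is a group and "[demo]" a character class,
    # so the raw message does not match its own literal text.
    return re.search(MSG, MSG) is not None


def check_escaped():
    # re.escape turns every metacharacter into a literal character.
    with pytest.raises(TypeError, match=re.escape(MSG)):
        raise_type_error()
    return True
```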

WillAyd · 2018-09-18T06:13:16Z

pandas/tests/frame/test_analytics.py

@@ -784,17 +799,13 @@ def alt(x):
        assert kurt.name is None
        assert kurt2.name == 'bar'

-    def _check_stat_op(self, name, alternative, frame=None, has_skipna=True,
+    def _check_stat_op(self, name, alternative, main_frame, float_frame,


Is there any risk here in using arguments identical to the fixture names? Assuming since this method itself is not a test that pytest won't try to inject the fixture here but not sure if that will always be the case

Changed. Did you see my comment about breaking up this function?

codecov · 2018-09-18T20:23:06Z

Codecov Report

Merging #22733 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22733   +/-   ##
=======================================
  Coverage   92.19%   92.19%           
=======================================
  Files         169      169           
  Lines       50835    50835           
=======================================
  Hits        46868    46868           
  Misses       3967     3967

Flag	Coverage Δ
#multiple	`90.61% <ø> (ø)`	⬆️
#single	`42.35% <ø> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d523d9f...c227fa2.

h-vetinari · 2018-09-19T13:34:46Z

@gfyoung @jreback @WillAyd @TomAugspurger
Weird error on Appveyor; could someone please restart it?

Build started
git clone -q https://github.com/pandas-dev/pandas.git C:\projects\pandas
git fetch -q origin +refs/pull/22733/merge:
git checkout -qf FETCH_HEAD
Running Install scripts
if ($env:APPVEYOR_PULL_REQUEST_NUMBER -and $env:APPVEYOR_BUILD_NUMBER -ne ((Invoke-RestMethod ` https://ci.appveyor.com/api/projects/$env:APPVEYOR_ACCOUNT_NAME/$env:APPVEYOR_PROJECT_SLUG/history?recordsNumber=50).builds | ` Where-Object pullRequestId -eq $env:APPVEYOR_PULL_REQUEST_NUMBER)[0].buildNumber) { ` throw "There are newer queued builds for this pull request, failing early." }
Cannot index into a null array.
At line:1 char:5
+ if ($env:APPVEYOR_PULL_REQUEST_NUMBER -and $env:APPVEYOR_BUILD_NUMBER ...
+     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : NullArray
 
Command executed with exception: Cannot index into a null array.

h-vetinari · 2018-09-20T16:06:14Z

@WillAyd Green :)

WillAyd

lgtm - @jreback

jreback · 2018-09-23T13:37:28Z

pandas/tests/frame/test_analytics.py

-    @pytest.mark.parametrize(
-        "method", ['sum', 'mean', 'prod', 'var',
-                   'std', 'skew', 'min', 'max'])
+    @pytest.mark.parametrize('method', ['sum', 'mean', 'prod', 'var',


in a follow-up, can use these fixtures (may need to make small changes) when #22762 is merged

jreback

looks good, a couple of comments

jreback · 2018-09-23T13:39:32Z

pandas/tests/frame/test_analytics.py

@@ -788,17 +803,14 @@ def alt(x):
        assert kurt.name is None
        assert kurt2.name == 'bar'

-    def _check_stat_op(self, name, alternative, frame=None, has_skipna=True,
+    # underscores added to distinguish argument names from fixture names


can this just be a module level function now? maybe rename to assert_stat_ops

I made this a module-level function, and broke it up into assert_stat_op_calc and assert_stat_op_api. This also removes the redundancy mentioned above.

jreback · 2018-09-23T13:40:46Z

pandas/tests/frame/test_analytics.py

-    def test_any_all(self):
-        self._check_bool_op('any', np.any, has_skipna=True, has_bool_only=True)
-        self._check_bool_op('all', np.all, has_skipna=True, has_bool_only=True)
+    def test_any_all(self, bool_frame_with_na, float_string_frame):


can parametrize on ['all', 'any'] (use getattr(np, name) inside)
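Here is a minimal sketch of the suggested parametrization. It assumes a small hypothetical boolean frame in place of the `bool_frame_with_na` fixture and a simplified helper `assert_bool_op_calc`. Neither is the actual pandas code. The numpy counterpart is looked up with `getattr(np, opname)`, as suggested.

```python
import numpy as np
import pandas as pd
import pytest


def assert_bool_op_calc(opname, frame):
    # Compare DataFrame.any/all column-wise against np.any/np.all
    alternative = getattr(np, opname)
    result = getattr(frame, opname)(axis=0)
    expected = frame.apply(lambda col: alternative(col.to_numpy()))
    pd.testing.assert_series_equal(result, expected, check_dtype=False)


@pytest.mark.parametrize("opname", ["all", "any"])
def test_any_all(opname):
    # Hypothetical stand-in for a boolean fixture
    frame = pd.DataFrame({"a": [True, False, True], "b": [True, True, True]})
    assert_bool_op_calc(opname, frame)
```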

h-vetinari

Broke up _check_stat_op and _check_bool_op into two module-level functions each, which should be much cleaner in terms of needed fixtures/kwargs.

There is still some redundancy between the new assert_stat_op_calc and assert_bool_op_calc, but that's also true of the current state and therefore something for a follow-up -- this PR should (as far as possible) just be about fixturization.
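Here is a heavily simplified sketch of such a split. The signatures are reduced and should be treated as assumptions: the real helpers take further keyword arguments (e.g. `has_skipna`, `check_dtype`, `check_dates`), which are omitted here. The calc part checks numbers against a numpy alternative. The api part checks output labels, the bad-axis error, and mixed-type frames, so it only needs to run once per reduction rather than once per input frame.

```python
import numpy as np
import pandas as pd
import pytest


def assert_stat_op_calc(opname, alternative, frame):
    # Numerical check: compare the reduction along both axes with
    # `alternative` applied to the non-missing values of each column/row.
    f = getattr(frame, opname)

    def skipna_wrapper(x):
        nona = x.dropna()
        if len(nona) == 0:
            return np.nan
        return alternative(nona.to_numpy())

    pd.testing.assert_series_equal(
        f(axis=0), frame.apply(skipna_wrapper), check_dtype=False
    )
    pd.testing.assert_series_equal(
        f(axis=1), frame.apply(skipna_wrapper, axis=1), check_dtype=False
    )


def assert_stat_op_api(opname, float_frame, float_string_frame):
    # API check: output labels, invalid axis, and mixed-type input.
    f = getattr(float_frame, opname)
    assert f(axis=0).index.equals(float_frame.columns)
    assert f(axis=1).index.equals(float_frame.index)
    with pytest.raises(ValueError, match="No axis named 2"):
        f(axis=2)
    # must not raise on a frame that also holds a string column
    getattr(float_string_frame, opname)(axis=0, numeric_only=True)
    getattr(float_string_frame, opname)(axis=1, numeric_only=True)
```

In a test like test_max, the calc check would then run once per input frame (e.g. float_frame_with_na, int_frame), while the api check runs only once.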

h-vetinari · 2018-09-23T15:33:19Z

pandas/tests/frame/test_analytics.py

@@ -788,17 +803,14 @@ def alt(x):
        assert kurt.name is None
        assert kurt2.name == 'bar'

-    def _check_stat_op(self, name, alternative, frame=None, has_skipna=True,
+    # underscores added to distinguish argument names from fixture names


I made this a module-level function, and broke it up into assert_stat_op_calc and assert_stat_op_api. This also removes the redundancy mentioned above.

h-vetinari · 2018-09-23T21:51:51Z

@WillAyd @jreback @gfyoung

Reverted the underscores for the function parameters again after cross-review in #22730:
@jreback:

actually don't think you need to do this as pytest doesn't care

@h-vetinari:

This is from #22733, where this was a review request from @WillAyd.

Is there any risk here in using arguments identical to the fixture names? Assuming since this method itself is not a test that pytest won't try to inject the fixture here but not sure if that will always be the case

@gfyoung:

Fixture injection should only happen when we have a function called test_*. pytest wouldn't even collect that function at all. Thus, I agree with @jreback to drop underscores.
If we're wrong, IMO that's a design flaw.
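Here is a minimal sketch of the behaviour described above, using hypothetical names. pytest only resolves fixtures for the test functions (and fixtures) it collects. A helper whose parameter happens to share a fixture's name just receives whatever its caller passes.

```python
import pandas as pd
import pytest


@pytest.fixture
def float_frame():
    return pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})


def assert_column_sums_positive(float_frame):
    # Plain helper (no test_ prefix): pytest does not collect it, so
    # `float_frame` here is an ordinary parameter. Nothing is injected;
    # the caller decides what to pass.
    assert (float_frame.sum() > 0).all()


def test_column_sums(float_frame):
    # Collected test: pytest resolves `float_frame` from the fixture above,
    # and the test hands it on to the helper explicitly.
    assert_column_sums_positive(float_frame)
```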

jreback · 2018-09-25T13:06:36Z

pandas/tests/frame/test_analytics.py



-class TestDataFrameAnalytics(TestData):
+def assert_stat_op_calc(opname, alternative, main_frame, has_skipna=True,


is there a reason you split these up into calc/api tests now? it's very hard to tell if anything actually changed. I am concerned that something DID change, even accidentally. can you just do this as a copy-paste change of the original function.

Yes, because they perform different checks, need different fixtures, and introduced the redundancy mentioned further up the thread (of checking the api-part twice in e.g. test_max).

I did not remove any single line of the functions.

can you just do this as a copy-paste change of the original function.

you mean as a separate commit?

no, I mean do a precursor PR that moves the functions but doesn't change them, then a follow-on to show those changes. Moving and changing need to be separate steps. It should be easy to see what actually changed.

no, I mean do a precursor PR that moves the functions but doesn't change them, then a follow-on to show those changes. Moving and changing need to be separate steps. It should be easy to see what actually changed.

Why not a separate commit...? You can review commits individually. Plus, this is essentially linked with the whole fixturization effort, which would have to come first (and your request to make _check_stat_op a module-level function is strictly speaking orthogonal to this PR).

I'll make some nice modular commits for you to review in sequence.

you can try, but it's much easier to review a separate PR, to be honest

I pushed some commits that only go as far as moving _check_stat_op and _check_bool_op and making them run. I'll push the rest of the commits once the CI passes.

h-vetinari · 2018-09-25T19:25:10Z

@jreback

I pushed some commits that only go as far as moving _check_stat_op and _check_bool_op and making them run. I'll push the rest of the commits once the CI passes.

Kindly asking you to peruse the copy/paste and unindent commits here, if you can follow as required. The rest of the changes (coming after CI passes; or on your OK) will be trivial.

h-vetinari · 2018-09-25T23:16:31Z

@jreback
So the CI was successful except a CondaHTTPError: HTTP 504 GATEWAY TIME-OUT for py36_locale_slow. Pushing the rest of the commits.

h-vetinari · 2018-09-26T20:43:46Z

@jreback
PTAL

I know you wanted separation (just moving the functions to module-level), but that's not possible without the fixturization that's carried out in this PR. However, I've really rolled out the red carpet here, the commits are super modular and easy to digest...

h-vetinari · 2018-10-02T06:50:28Z

@jreback

If you start here (https://github.com/pandas-dev/pandas/pull/22733/commits/7ac476ec50d3a9b44112c078a61f9455efe93c07), the commits are very easy to follow, I promise. :)

jreback · 2018-10-02T21:26:15Z

@h-vetinari as I said above multiple times, I really prefer to do the diff using the tools which git provides. It's not easy to diff commits like this; please separate this into multiple PRs. Move things first, then change them. It will make everyone's life easier and speed up acceptance of PRs.

Ultimately things get squashed into a single commit anyhow, so putting in multiple commits doesn't really help anyone. Multiple PRs, on the other hand, are merged separately.

h-vetinari · 2018-10-02T22:31:37Z

@jreback

I really prefer to do the diff using the tools which git provides

If you don't like the github-tools that allow comparing separate commits (which I linked; and where you can step through one by one), there's always git diff <hash1> <hash2>.

Your request of moving the function was orthogonal to this PR - I still accommodated it. If anything could be split off, it's that (i.e. revert the last 4 commits). Similarly, the whole fixturization effort was your request in #22236 (orthogonal there as well) - it's not something I'm gonna fight for. Feel free to reopen if you wanna have a look again.

h-vetinari · 2018-10-05T16:26:45Z

Alright, now that I have a bit more time again, I'm less reluctant to do some hoop-jumping.

@jreback, I've reverted back to the original purpose of this PR - fixturization. I'll leave making _check_stat_op a module-level function for a follow-up.


jreback · 2018-10-06T15:48:04Z

pandas/tests/frame/test_analytics.py

        # GH #15390
-        original = self.simple.copy(deep=True)
+        original = simple_frame.copy(deep=True)


note for later: clean up the names of these fixtures to be meaningful (e.g. float_frame is, simple_frame not so much; probably need to wait to see how much more things like this are used)

jreback · 2018-10-06T15:48:30Z

@h-vetinari ok this looks ok thanks.

h-vetinari · 2018-10-06T15:55:57Z

@jreback
Thanks.

note for later: clean up the names of these fixtures to be meaningful (e.g. float_frame is, simple_frame not so much; probably need to wait to see how much more things like this are used)

OK, this was part of the process in #22236. Originally it was just called simple - simple_frame was the closest in spirit. I'd argue it should be renamed ASAP (and update the "translation guide" in #22471), before more modules get translated. What name would you like to have?

h-vetinari · 2018-10-09T16:26:30Z

@jreback
What name would you like to have for simple_frame? (see comment above)

h-vetinari mentioned this pull request Sep 17, 2018

TST/CLN: remove TestData from frame-tests; replace with fixtures #22471

Closed


h-vetinari commented Sep 17, 2018


WillAyd requested changes Sep 17, 2018


WillAyd added Testing pandas testing functions or related to the test suite Clean labels Sep 17, 2018

WillAyd requested changes Sep 18, 2018


WillAyd approved these changes Sep 20, 2018


jreback reviewed Sep 23, 2018


jreback requested changes Sep 23, 2018


jreback added this to the 0.24.0 milestone Sep 23, 2018

h-vetinari commented Sep 23, 2018


h-vetinari force-pushed the fixturize_frame_analytics branch from 391d122 to 717a12a September 23, 2018 16:10

h-vetinari mentioned this pull request Sep 23, 2018

CLN: res/exp and GH references in frame tests #22730

Merged


jreback requested changes Sep 25, 2018


h-vetinari force-pushed the fixturize_frame_analytics branch from 5e8b130 to a5a44b3 September 25, 2018 16:57

h-vetinari added 9 commits September 25, 2018 19:02

Fixturize frame/test_analytics.py

d23ac16

Review (WillAyd)

485e0d8

Revert disambiguating underscores

f1a394a

Pure copy/paste of _check_stat_op and _check_bool_op

7ac476e

Pure unindent of _check_stat_op and _check_bool_op

e1a8c5a

Make _check_stat_op and _check_bool_op run

6c4a702

Correctly group tests within _check_[stat/bool]_op

98f3243

Consistent naming of parameters

b043bb4

Break up _check_[stat/bool]_op

4a2adeb

Final touches

56020d7

h-vetinari force-pushed the fixturize_frame_analytics branch from a5a44b3 to 6c4a702 September 25, 2018 17:03

h-vetinari mentioned this pull request Oct 2, 2018

Analytics.py fixtures added #22940

Closed


h-vetinari closed this Oct 2, 2018

h-vetinari reopened this Oct 5, 2018

h-vetinari added 5 commits October 5, 2018 18:34

Merge remote-tracking branch 'upstream/master' into fixturize_frame_analytics

c07c5ed

Revert "Final touches"

7a56cfb

This reverts commit 56020d7.

Revert "Break up _check_[stat/bool]_op"

e197fe7

This reverts commit 4a2adeb.

Revert "Consistent naming of parameters"

48272d9

This reverts commit b043bb4.

Revert "Correctly group tests within _check_[stat/bool]_op"

c227fa2

This reverts commit 98f3243.

jreback reviewed Oct 6, 2018


jreback approved these changes Oct 6, 2018


jreback merged commit 5551bcf into pandas-dev:master Oct 6, 2018

h-vetinari mentioned this pull request Oct 6, 2018

TST: further clean up of frame/test_analytics #23016

Merged

h-vetinari deleted the fixturize_frame_analytics branch October 6, 2018 16:21


