TST: Test sorting levels not aligned with index (#25775) #26492

mahepe · 2019-05-22T19:04:53Z

closes sort_index does not work with levels not aligned with index #25775
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry (empty since no user-facing changs)

With master at 6d2398a, issue #25775 seems resolved. Added validation tests.

jschendel

Thanks! A couple comments on simplifying the implementation.

jschendel · 2019-05-22T20:06:13Z

pandas/tests/frame/test_sorting.py

+    @pytest.mark.parametrize('seed', list(range(100)))
+    def test_sort_multi_index(self, seed):
+        # GH 25775
+        pd.np.random.seed(seed)


Using a single deterministic example is probably easier to read and maintain.

I've found that the following example reproduces the issue on 0.23.0:

In [1]: import pandas as pd; pd.__version__ Out[1]: '0.23.0' In [2]: df = pd.DataFrame({'a': [0.4, 0.1, 0.3, 0.2], 'b': [0, 1, 1, 1], 'c': range(4), 'd': list('abcd')}) In [3]: df.set_index(list('abc')).sort_index(level=list('ba')) Out[3]: d a b c 0.4 0 0 a 0.1 1 1 b 0.3 1 2 c 0.2 1 3 d

And works on master:

In [1]: import pandas as pd; pd.__version__ Out[1]: '0.25.0.dev0+596.g20d0ad159a' In [2]: df = pd.DataFrame({'a': [0.4, 0.1, 0.3, 0.2], 'b': [0, 1, 1, 1], 'c': range(4), 'd': list('abcd')}) In [3]: df.set_index(list('abc')).sort_index(level=list('ba')) Out[3]: d a b c 0.4 0 0 a 0.1 1 1 b 0.2 1 3 d 0.3 1 2 c

Ok! I updated to a single example. I'm using the one given in #25775.

jschendel · 2019-05-22T20:06:54Z

pandas/tests/frame/test_sorting.py

+    def test_sort_multi_index(self, seed):
+        # GH 25775
+        pd.np.random.seed(seed)
+        df1 = (pd.DataFrame(pd.np.random.rand(10, 5),


DataFrame is already imported, so you don't need the pd. prefix.

thanks, updated

codecov · 2019-05-22T20:55:13Z

Codecov Report

Merging #26492 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26492      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50692              
==========================================
- Hits        46576    46571       -5     
- Misses       4116     4121       +5

Flag	Coverage Δ
#multiple	`90.41% <ø> (ø)`	⬆️
#single	`41.78% <ø> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97% <0%> (-0.12%)`	⬇️
pandas/util/testing.py	`90.81% <0%> (-0.11%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d124ea...5eb692b. Read the comment docs.

pep8speaks · 2019-05-25T06:16:10Z

Hello @mahepe! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-03 18:32:17 UTC

jreback · 2019-05-26T14:48:46Z

pandas/tests/frame/test_sorting.py

@@ -227,6 +228,30 @@ def test_stable_descending_multicolumn_sort(self):
                                   kind='mergesort')
        assert_frame_equal(sorted_df, expected)

+    def test_sort_multi_index(self):
+        # GH 25775


i am not clear by this example what you are even trying to test here. can you add some comments on the test and make the example a little more fluid (maybe add some comments).

updated to the simple example by the other reviewer

can you add a comment here on what this is testing, example is nice

jreback · 2019-05-26T14:49:17Z

pandas/tests/frame/test_sorting.py

+            '5,0.639,1,0.944,0.521,0.414' + \
+            '6,0.264,7,0.456,0.568,0.018' + \
+            '7,0.617,6,0.616,0.943,0.681' + \
+            '8,0.359,4,0.697,0.060,0.666' + \


is there a reason we are using a fixed string here rather than a generated frame with a seed?

The other reviewer requested "a single deterministic example" in contrast to iterating through a range of seeds. I wasn't sure what they exactly meant by "deterministic" i.e. whether fixing a seed is good enough or should the dataframe be explicitly given

Yeah, I meant a single explicit DataFrame that minimally reproduces the error. Setting a random seed and the generating random floats had nothing to do with the actual error in question, so IMO it made it less obvious as to what's going on. It also resulted in there be a lot more data than necessary to reproduce the error, which makes it harder to follow what's being tested.

So basically something like:

df = DataFrame({'a': [3, 1, 2], 'b': [0, 0, 0], 'c': [0, 1, 2], 'd': list('abc')}) result = df.set_index(list('abc')).sort_index(level=list('ba')) expected = DataFrame({'a': [1, 2, 3], 'b': [0, 0, 0], 'c': [1, 2, 0], 'd': list('bca')}) expected = expected.set_index(list('abc')) tm.assert_frame_equal(result, expected)

Note that df as defined above does indeed not sort correctly on 0.23.4:

In [1]: import numpy as np; import pandas as pd; pd.__version__ Out[1]: '0.23.4' In [2]: df = pd.DataFrame({'a' : [3, 1, 2], 'b': [0, 0, 0,], 'c': [0, 1, 2], 'd': list('abc')}) In [3]: df.set_index(list('abc')).sort_index(level=list('ba')) Out[3]: d a b c 3 0 0 a 1 0 1 b 2 0 2 c

I suppose you could parametrize over level if you wanted to, as adding 'c' still results in invalid sorting:

In [4]: df.set_index(list('abc')).sort_index(level=list('bac')) Out[4]: d a b c 3 0 0 a 1 0 1 b 2 0 2 c

Thanks @jschendel, switched to your simpler example.

jreback · 2019-05-26T14:49:36Z

pandas/tests/frame/test_sorting.py

+            '9,0.670,2,0.128,0.315,0.363'
+        data = StringIO(s)
+        df1 = (pd.read_csv(data)
+               .assign(b=lambda df: (df.b * 10).astype(int)))


these df1, df2 are not clear at all can you rename to something more descriptive.

updated to the simple example by the other reviewer

jreback

pls merge master and ping on green.

jreback · 2019-05-30T01:38:26Z

pandas/tests/frame/test_sorting.py

@@ -227,6 +228,30 @@ def test_stable_descending_multicolumn_sort(self):
                                   kind='mergesort')
        assert_frame_equal(sorted_df, expected)

+    def test_sort_multi_index(self):
+        # GH 25775


can you add a comment here on what this is testing, example is nice

mahepe · 2019-06-04T05:01:12Z

all green @jreback

jreback · 2019-06-05T12:54:40Z

thanks @mahepe

…andas-dev#26492)

gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves Testing pandas testing functions or related to the test suite labels May 22, 2019

jschendel reviewed May 22, 2019

View reviewed changes

mahepe force-pushed the validation-test-for-sorting branch from 7efc4b8 to 58d8782 Compare May 25, 2019 06:16

mahepe force-pushed the validation-test-for-sorting branch 4 times, most recently from a89216a to 2a3bf94 Compare May 26, 2019 12:37

jreback requested changes May 26, 2019

View reviewed changes

mahepe force-pushed the validation-test-for-sorting branch from 2a3bf94 to 1763b74 Compare May 29, 2019 04:47

jreback requested changes May 30, 2019

View reviewed changes

Matias Heikkilä added 2 commits June 3, 2019 21:29

TST: Test sorting levels not aligned with index (pandas-dev#25775)

e40c5a9

Merge branch 'master' into validation-test-for-sorting

5eb692b

mahepe force-pushed the validation-test-for-sorting branch from 1763b74 to 5eb692b Compare June 3, 2019 18:32

jreback added this to the 0.25.0 milestone Jun 5, 2019

jreback approved these changes Jun 5, 2019

View reviewed changes

jreback merged commit 8ef9a63 into pandas-dev:master Jun 5, 2019

vaibhavhrt pushed a commit to vaibhavhrt/pandas that referenced this pull request Jun 6, 2019

TST: Test sorting levels not aligned with index (pandas-dev#25775) (p…

b5535dd

…andas-dev#26492)

Uh oh!

TST: Test sorting levels not aligned with index (#25775) #26492

TST: Test sorting levels not aligned with index (#25775) #26492

Uh oh!

Conversation

mahepe commented May 22, 2019

Uh oh!

jschendel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented May 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pep8speaks commented May 25, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Comment last updated at 2019-06-03 18:32:17 UTC

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jreback left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mahepe commented Jun 4, 2019

Uh oh!

jreback commented Jun 5, 2019

Uh oh!

Uh oh!

codecov bot commented May 22, 2019 •

edited

Loading

pep8speaks commented May 25, 2019 •

edited

Loading