-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TST: Test sorting levels not aligned with index (#25775) #26492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A couple comments on simplifying the implementation.
pandas/tests/frame/test_sorting.py
Outdated
@pytest.mark.parametrize('seed', list(range(100))) | ||
def test_sort_multi_index(self, seed): | ||
# GH 25775 | ||
pd.np.random.seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a single deterministic example is probably easier to read and maintain.
I've found that the following example reproduces the issue on 0.23.0:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.23.0'
In [2]: df = pd.DataFrame({'a': [0.4, 0.1, 0.3, 0.2], 'b': [0, 1, 1, 1], 'c': range(4), 'd': list('abcd')})
In [3]: df.set_index(list('abc')).sort_index(level=list('ba'))
Out[3]:
d
a b c
0.4 0 0 a
0.1 1 1 b
0.3 1 2 c
0.2 1 3 d
And works on master:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.25.0.dev0+596.g20d0ad159a'
In [2]: df = pd.DataFrame({'a': [0.4, 0.1, 0.3, 0.2], 'b': [0, 1, 1, 1], 'c': range(4), 'd': list('abcd')})
In [3]: df.set_index(list('abc')).sort_index(level=list('ba'))
Out[3]:
d
a b c
0.4 0 0 a
0.1 1 1 b
0.2 1 3 d
0.3 1 2 c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! I updated to a single example. I'm using the one given in #25775.
pandas/tests/frame/test_sorting.py
Outdated
def test_sort_multi_index(self, seed): | ||
# GH 25775 | ||
pd.np.random.seed(seed) | ||
df1 = (pd.DataFrame(pd.np.random.rand(10, 5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame
is already imported, so you don't need the pd.
prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated
Codecov Report
@@ Coverage Diff @@
## master #26492 +/- ##
==========================================
- Coverage 91.88% 91.87% -0.01%
==========================================
Files 174 174
Lines 50692 50692
==========================================
- Hits 46576 46571 -5
- Misses 4116 4121 +5
Continue to review full report at Codecov.
|
7efc4b8
to
58d8782
Compare
a89216a
to
2a3bf94
Compare
pandas/tests/frame/test_sorting.py
Outdated
@@ -227,6 +228,30 @@ def test_stable_descending_multicolumn_sort(self): | |||
kind='mergesort') | |||
assert_frame_equal(sorted_df, expected) | |||
|
|||
def test_sort_multi_index(self): | |||
# GH 25775 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not clear by this example what you are even trying to test here. can you add some comments on the test and make the example a little more fluid (maybe add some comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to the simple example by the other reviewer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here on what this is testing, example is nice
pandas/tests/frame/test_sorting.py
Outdated
'5,0.639,1,0.944,0.521,0.414' + \ | ||
'6,0.264,7,0.456,0.568,0.018' + \ | ||
'7,0.617,6,0.616,0.943,0.681' + \ | ||
'8,0.359,4,0.697,0.060,0.666' + \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we are using a fixed string here rather than a generated frame with a seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other reviewer requested "a single deterministic example" in contrast to iterating through a range of seeds. I wasn't sure what they exactly meant by "deterministic" i.e. whether fixing a seed is good enough or should the dataframe be explicitly given
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant a single explicit DataFrame
that minimally reproduces the error. Setting a random seed and the generating random floats had nothing to do with the actual error in question, so IMO it made it less obvious as to what's going on. It also resulted in there be a lot more data than necessary to reproduce the error, which makes it harder to follow what's being tested.
So basically something like:
df = DataFrame({'a': [3, 1, 2], 'b': [0, 0, 0],
'c': [0, 1, 2], 'd': list('abc')})
result = df.set_index(list('abc')).sort_index(level=list('ba'))
expected = DataFrame({'a': [1, 2, 3], 'b': [0, 0, 0],
'c': [1, 2, 0], 'd': list('bca')})
expected = expected.set_index(list('abc'))
tm.assert_frame_equal(result, expected)
Note that df
as defined above does indeed not sort correctly on 0.23.4:
In [1]: import numpy as np; import pandas as pd; pd.__version__
Out[1]: '0.23.4'
In [2]: df = pd.DataFrame({'a' : [3, 1, 2], 'b': [0, 0, 0,], 'c': [0, 1, 2], 'd': list('abc')})
In [3]: df.set_index(list('abc')).sort_index(level=list('ba'))
Out[3]:
d
a b c
3 0 0 a
1 0 1 b
2 0 2 c
I suppose you could parametrize over level
if you wanted to, as adding 'c'
still results in invalid sorting:
In [4]: df.set_index(list('abc')).sort_index(level=list('bac'))
Out[4]:
d
a b c
3 0 0 a
1 0 1 b
2 0 2 c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jschendel, switched to your simpler example.
pandas/tests/frame/test_sorting.py
Outdated
'9,0.670,2,0.128,0.315,0.363' | ||
data = StringIO(s) | ||
df1 = (pd.read_csv(data) | ||
.assign(b=lambda df: (df.b * 10).astype(int))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these df1, df2 are not clear at all can you rename to something more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to the simple example by the other reviewer
2a3bf94
to
1763b74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls merge master and ping on green.
pandas/tests/frame/test_sorting.py
Outdated
@@ -227,6 +228,30 @@ def test_stable_descending_multicolumn_sort(self): | |||
kind='mergesort') | |||
assert_frame_equal(sorted_df, expected) | |||
|
|||
def test_sort_multi_index(self): | |||
# GH 25775 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here on what this is testing, example is nice
1763b74
to
5eb692b
Compare
all green @jreback |
thanks @mahepe |
sort_index
does not work with levels not aligned with index #25775git diff upstream/master -u -- "*.py" | flake8 --diff
With
master
at 6d2398a, issue #25775 seems resolved. Added validation tests.