-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Handle unsortable Periods correctly in set_index, MultiIndex #18208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/sorting.py
Outdated
@@ -431,21 +431,32 @@ def safe_sort(values, labels=None, na_sentinel=-1, assume_unique=False): | |||
|
|||
def sort_mixed(values): | |||
# order ints before strings, safe in py3 | |||
from pandas import Period |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you going thru all this trouble? not to mention this is going to be less performant.
let's just check that things are strings or ints and be done (anything else can just return as is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So don't bother sorting non-(str or int)? OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 4 tests that fail once we start ignoring non-(str or int); they are specifically checking for TypeErrors being raised here.
@@ -11,6 +11,36 @@ | |||
from ..datetimelike import DatetimeLike | |||
|
|||
|
|||
class TestPeriodLevelMultiIndex(object): | |||
# TODO: Is there a more appropriate place for these? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's figure that out in this PR (at first look, this location might actually be fine).
BTW, your class name isn't entirely accurate IIUC. Your first test doesn't involve a multi-index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. The class originally had several other tests before this PR was broken up into 3 pieces. Have a suggested name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 2nd should go in test_multi.py
@@ -11,6 +11,36 @@ | |||
from ..datetimelike import DatetimeLike | |||
|
|||
|
|||
class TestPeriodLevelMultiIndex(object): | |||
# TODO: Is there a more appropriate place for these? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 2nd should go in test_multi.py
ser = Series(data, index=index, name='Period') | ||
df = ser.to_frame() | ||
|
||
res = df.set_index('Period', append=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could go in tests/frame where we do .set_index tests
don’t add commentary
Will revisit this later. Closing. |
3rd of 3 to address bugs in #17112
set_index
goes through MultiIndex.from_arrays, which callsfactorize_from_iterables
... which tries to sort the inputs. In cases likePeriod
, sometimes the inputs can't be sorted. But for the purposes ofset_index
, we don't actually care about the order. So we impose a reasonable fallback.New tests are likely not in the correct place. Pls advise.