Skip to content

Handle unsortable Periods correctly in set_index, MultiIndex #18208

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jbrockmendel
Copy link
Member

3rd of 3 to address bugs in #17112

set_index goes through MultiIndex.from_arrays, which calls factorize_from_iterables... which tries to sort the inputs. In cases like Period, sometimes the inputs can't be sorted. But for the purposes of set_index, we don't actually care about the order. So we impose a reasonable fallback.

New tests are likely not in the correct place. Pls advise.

@@ -431,21 +431,32 @@ def safe_sort(values, labels=None, na_sentinel=-1, assume_unique=False):

def sort_mixed(values):
# order ints before strings, safe in py3
from pandas import Period
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you going thru all this trouble? not to mention this is going to be less performant.

let's just check that things are strings or ints and be done (anything else can just return as is)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So don't bother sorting non-(str or int)? OK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 4 tests that fail once we start ignoring non-(str or int); they are specifically checking for TypeErrors being raised here.

@@ -11,6 +11,36 @@
from ..datetimelike import DatetimeLike


class TestPeriodLevelMultiIndex(object):
# TODO: Is there a more appropriate place for these?
Copy link
Member

@gfyoung gfyoung Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's figure that out in this PR (at first look, this location might actually be fine).

BTW, your class name isn't entirely accurate IIUC. Your first test doesn't involve a multi-index.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. The class originally had several other tests before this PR was broken up into 3 pieces. Have a suggested name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 2nd should go in test_multi.py

@gfyoung gfyoung added Error Reporting Incorrect or improved errors from pandas Period Period data type labels Nov 10, 2017
@@ -11,6 +11,36 @@
from ..datetimelike import DatetimeLike


class TestPeriodLevelMultiIndex(object):
# TODO: Is there a more appropriate place for these?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 2nd should go in test_multi.py

ser = Series(data, index=index, name='Period')
df = ser.to_frame()

res = df.set_index('Period', append=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could go in tests/frame where we do .set_index tests

don’t add commentary

@jbrockmendel
Copy link
Member Author

Will revisit this later. Closing.

@jbrockmendel jbrockmendel deleted the period_bugs3 branch April 5, 2020 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants