API: Use different exception type when MultiIndex indexing fails due to index not being lexsorted #11897

Closed
Dr-Irv opened this issue Dec 24, 2015 · 7 comments · Fixed by #14762
Labels
API Design Error Reporting Incorrect or improved errors from pandas
Milestone

Comments

@Dr-Irv
Contributor

Dr-Irv commented Dec 24, 2015

I'd like to be able to separately trap indexing errors that occur because the MultiIndex is not lexsorted. Right now, a KeyError is raised, and the only way to tell whether the error was due to an unsorted index or a missing key is to look at the text of the message.

So I'd like to avoid code like this:

try:
    subdf = df.loc['A':'D','Vals']
except KeyError as ker:
    if ker.args[0].startswith("MultiIndex Slicing requires the index to be fully lexsorted"):
        print("Need to handle fact index not sorted")
    else:
        print("Need to handle fact that key was missing")

I'd rather write something like:

try:
    subdf = df.loc['A':'D','Vals']
except UnsortedIndexError:
    print("Need to handle fact index not sorted")
except KeyError:
    print("Need to handle fact that key was missing")

So could the designers accept a change where a different error is raised in case the index is not lexsorted? If so, I'll implement it.

@jreback
Contributor

jreback commented Dec 24, 2015

pls show a complete example uptop

@Dr-Irv
Contributor Author

Dr-Irv commented Dec 24, 2015

Here is an example:

import pandas as pd

# Build a MultiIndex whose tuples are deliberately out of lexicographic order
mi = pd.MultiIndex.from_tuples([('z','a'),('x','a'),('y','b'),('x','b'),('y','a'),('z','b')], names=['one','two'])
df = pd.DataFrame([[i,10*i] for i in range(6)], index=mi, columns=['one','two'])
print(df.loc(axis=0)['z',:])

The last line fails with:
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'

Now consider where we do the sorting:

df.sort_index(inplace=True)
print(df.loc(axis=0)['z',:])
print(df.loc(axis=0)['q',:])

The second line works because things are now sorted, but the last line fails with:
KeyError: 'q'

So I'd like to be able to surround all of the above code with ways of detecting the 2 different errors, which will make debugging easier.
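
To make "fully lexsorted" concrete, here is a minimal pure-Python sketch that checks whether the index tuples from the example above are in lexicographic order. The helper name `is_lexsorted` is hypothetical and this is only an illustration of the idea, not the check pandas performs internally.

```python
# The same tuples used to build the MultiIndex in the example above
tuples = [('z', 'a'), ('x', 'a'), ('y', 'b'), ('x', 'b'), ('y', 'a'), ('z', 'b')]


def is_lexsorted(keys):
    """Hypothetical helper: True if every tuple is >= the tuple before it."""
    return all(a <= b for a, b in zip(keys, keys[1:]))


unsorted_ok = is_lexsorted(tuples)        # the original order
sorted_ok = is_lexsorted(sorted(tuples))  # the order after sorting
print(unsorted_ok, sorted_ok)
```

The first check fails because `('z', 'a')` precedes `('x', 'a')`, which corresponds to the situation before `sort_index` is called.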

@shoyer
Member

shoyer commented Dec 25, 2015

I like this idea. To preserve backwards compatibility, we might make it a KeyError subclass. Note that there are also some cases where indexing non-MultiIndexes can raise a KeyError because the index is unsorted (notably, when slicing).
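
A minimal sketch of the subclass idea, assuming a locally defined `UnsortedIndexError` and a hypothetical `lookup` function standing in for the indexing call (neither is taken from pandas here). It shows that new code can catch the specific error first, while existing code that only catches KeyError keeps working.

```python
class UnsortedIndexError(KeyError):
    """Illustrative exception: indexing failed because the index is not sorted."""


def lookup(data, key, index_is_sorted):
    # Hypothetical stand-in for an indexing operation
    if not index_is_sorted:
        raise UnsortedIndexError("index must be lexsorted")
    return data[key]  # a missing key raises a plain KeyError


def classify(data, key, index_is_sorted):
    # New-style handling: the subclass must be caught before KeyError
    try:
        lookup(data, key, index_is_sorted)
    except UnsortedIndexError:
        return "unsorted"
    except KeyError:
        return "missing"
    return "ok"


def legacy(data, key, index_is_sorted):
    # Old-style handling: a bare KeyError handler still catches the subclass
    try:
        lookup(data, key, index_is_sorted)
    except KeyError:
        return "keyerror"
    return "ok"
```

The handler order matters: if `except KeyError` came first in `classify`, it would also swallow the subclass and the specific branch would never run.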

@jreback jreback added API Design Error Reporting Incorrect or improved errors from pandas labels Dec 26, 2015
@jreback jreback added this to the Next Major Release milestone Dec 26, 2015
@jreback
Contributor

jreback commented Dec 26, 2015

It would for sure have to be a KeyError subclass for back-compat (ideally this should be a ValueError, but that ship has sailed).

@Dr-Irv
Contributor Author

Dr-Irv commented Dec 28, 2015

@jreback So should I create the UnsortedIndexError code, test it, and do the pull request?

@jreback
Contributor

jreback commented Mar 17, 2016

@Dr-Irv interested in working on this?

@Dr-Irv
Contributor Author

Dr-Irv commented Mar 17, 2016

Yes, but it might be a while, as I've got some other pressing issues.

@jreback jreback modified the milestones: 0.18.1, 0.18.2 Apr 26, 2016
@jorisvandenbossche jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 21, 2016
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Nov 29, 2016
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Dec 6, 2016
ERR: Change error message   pandas-dev#14491

ENH: pandas-dev#11897 make lint work.  ERR: pandas-dev#14491 change error message

ERR: pandas-dev#14491 fix test for error message

fixes based on jreback feedback

fix indent issue

Doc fixes

Fixes per jreback comments
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Dec 7, 2016
Dr-Irv added a commit to Dr-Irv/pandas that referenced this issue Dec 9, 2016

4 participants