ENH: Introduce UnsortedIndexError GH11897 #14762

Merged
merged 3 commits into from
Dec 9, 2016
Changes from 1 commit
4 changes: 3 additions & 1 deletion doc/source/advanced.rst
@@ -528,12 +528,14 @@ return a copy of the data rather than a view:
     jim joe
 1 z 0.64094

+.. _advanced.unsorted:
+
 Furthermore if you try to index something that is not fully lexsorted, this can raise:

 .. code-block:: ipython

     In [5]: dfm.loc[(0,'y'):(1, 'z')]
-    KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
+    UnsortedIndexError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'

 The ``is_lexsorted()`` method on an ``Index`` show if the index is sorted, and the ``lexsort_depth`` property returns the sort depth:

7 changes: 7 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -41,6 +41,10 @@ Other enhancements

 - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

+- New ``UnsortedIndexError`` (subclass of ``KeyError``) raised when indexing/slicing into an
+  unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack
+  of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
+

 .. _whatsnew_0200.api_breaking:

@@ -58,6 +62,9 @@ Backwards incompatible API changes
 Other API Changes
 ^^^^^^^^^^^^^^^^^

+- Change error message text when indexing via a
+  boolean ``Series`` that has an incompatible index (:issue:`14491`)
+
 .. _whatsnew_0200.deprecations:

 Deprecations
11 changes: 11 additions & 0 deletions pandas/core/common.py
@@ -97,6 +97,17 @@ class UnsupportedFunctionCall(ValueError):
     pass


+class UnsortedIndexError(KeyError):
+    """ Error raised when attempting to get a slice of a MultiIndex
+    and the index has not been lexsorted. Subclass of `KeyError`.
+
+    .. versionadded:: 0.20.0
+
+    """
Contributor

add a versionadded tag here

Member

"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

Contributor Author

done

+    pass
Contributor

can you add a doc-string

Contributor Author

done




 class AbstractMethodError(NotImplementedError):
     """Raise this error instead of NotImplementedError for abstract methods
     while keeping compatibility with Python 2 and Python 3.
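
The docstring above notes that the new exception subclasses `KeyError`. A minimal, pandas-free sketch of what that design choice buys is given here; the `lookup`, `classify` and `legacy_handler` helpers and the locally defined `UnsortedIndexError` are hypothetical stand-ins for illustration, not the library's code.

```python
class UnsortedIndexError(KeyError):
    """Stand-in for the exception added in this PR (illustration only)."""


def lookup(table, key, is_sorted):
    # Hypothetical helper: mimics a slicing call that needs a sorted index.
    if not is_sorted:
        raise UnsortedIndexError("index must be lexsorted before slicing")
    return table[key]  # raises a plain KeyError if the key is missing


def classify(table, key, is_sorted):
    # The subclass has to be caught first, because KeyError also matches it.
    try:
        lookup(table, key, is_sorted)
    except UnsortedIndexError:
        return "unsorted"
    except KeyError:
        return "missing key"
    return "ok"


def legacy_handler(table, key, is_sorted):
    # Older calling code that only knows about KeyError keeps working.
    try:
        lookup(table, key, is_sorted)
    except KeyError:
        return "handled"
    return "ok"


table = {"x": 1}
print(classify(table, "x", is_sorted=False))
print(classify(table, "q", is_sorted=True))
print(legacy_handler(table, "x", is_sorted=False))
```

New callers can tell a sorting problem apart from a bad key, while handlers written against `KeyError` are not broken by the change.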
4 changes: 3 additions & 1 deletion pandas/core/indexing.py
@@ -1814,7 +1814,9 @@ def check_bool_indexer(ax, key):
         result = result.reindex(ax)
         mask = isnull(result._values)
         if mask.any():
-            raise IndexingError('Unalignable boolean Series key provided')
+            raise IndexingError('Unalignable boolean Series provided as indexer '
+                                '(index of the boolean Series and of the indexed '
+                                'object do not match')
         result = result.astype(bool)._values
     elif is_sparse(result):
         result = result.to_dense()
10 changes: 6 additions & 4 deletions pandas/indexes/multi.py
@@ -25,7 +25,8 @@
 from pandas.core.common import (_values_from_object,
                                 is_bool_indexer,
                                 is_null_slice,
-                                PerformanceWarning)
+                                PerformanceWarning,
+                                UnsortedIndexError)


 from pandas.core.base import FrozenList
@@ -1936,9 +1937,10 @@ def get_locs(self, tup):

         # must be lexsorted to at least as many levels
         if not self.is_lexsorted_for_tuple(tup):
-            raise KeyError('MultiIndex Slicing requires the index to be fully '
-                           'lexsorted tuple len ({0}), lexsort depth '
-                           '({1})'.format(len(tup), self.lexsort_depth))
+            raise UnsortedIndexError('MultiIndex Slicing requires the index '
+                                     'to be fully lexsorted tuple len ({0}), '
+                                     'lexsort depth ({1})'
+                                     .format(len(tup), self.lexsort_depth))

         # indexer
         # this is the list of all values that we want to select
18 changes: 17 additions & 1 deletion pandas/tests/indexes/test_multi.py
@@ -8,7 +8,7 @@

 from pandas import (DataFrame, date_range, period_range, MultiIndex, Index,
                     CategoricalIndex, compat)
-from pandas.core.common import PerformanceWarning
+from pandas.core.common import PerformanceWarning, UnsortedIndexError
 from pandas.indexes.base import InvalidIndexError
 from pandas.compat import range, lrange, u, PY3, long, lzip

@@ -2535,3 +2535,19 @@ def test_dropna(self):
         msg = "invalid how option: xxx"
         with tm.assertRaisesRegexp(ValueError, msg):
             idx.dropna(how='xxx')
+
+    def test_unsortedindex(self):
+        # GH 11897
+        mi = pd.MultiIndex.from_tuples([('z', 'a'), ('x', 'a'), ('y', 'b'),
Contributor

we want to find places where currently a KeyError is raised and change them in the tests.

Contributor Author

I've now done that. My way of doing that was to make UnsortedIndexError a subclass of ValueError, run nosetests to see where it complained, and then change those tests accordingly. Then I switched UnsortedIndexError back to being a subclass of KeyError.

+                                        ('x', 'b'), ('y', 'a'), ('z', 'b')],
+                                       names=['one', 'two'])
+        df = pd.DataFrame([[i, 10 * i] for i in lrange(6)], index=mi,
+                          columns=['one', 'two'])
+
+        with assertRaises(UnsortedIndexError):
+            df.loc(axis=0)['z', :]
+        df.sort_index(inplace=True)
+        self.assertEqual(len(df.loc(axis=0)['z', :]), 2)
+
+        with assertRaises(KeyError):
+            df.loc(axis=0)['q', :]
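
The review thread in this file describes temporarily re-basing the new exception on `ValueError` to find tests that still expect `KeyError`. A rough, self-contained sketch of why that probe works follows; `make_error`, `slice_unsorted` and `legacy_test` are hypothetical names used only for illustration and do not appear in the PR.

```python
def make_error(base):
    # Build a stand-in exception class with the chosen base class.
    return type("UnsortedIndexError", (base,), {})


def slice_unsorted(error_cls):
    # Hypothetical code under test: always fails as an unsorted index would.
    raise error_cls("index is not lexsorted")


def legacy_test(error_cls):
    # An existing test written against the old behaviour (expects KeyError).
    try:
        slice_unsorted(error_cls)
    except KeyError:
        return "pass"
    except Exception as exc:
        return "error: " + type(exc).__name__
    return "fail: nothing raised"


# With the real base class the old test keeps passing, so it stays hidden.
normal = legacy_test(make_error(KeyError))
# With the temporary base class the same test errors out and is exposed.
probe = legacy_test(make_error(ValueError))
print(normal)
print(probe)
```

Because a `KeyError` subclass satisfies every existing `KeyError` assertion, affected tests only surface once the base class is swapped for an unrelated one.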
14 changes: 9 additions & 5 deletions pandas/tests/indexing/test_indexing.py
@@ -23,7 +23,7 @@
                     MultiIndex, Timestamp, Timedelta)
 from pandas.formats.printing import pprint_thing
 from pandas import concat
-from pandas.core.common import PerformanceWarning
+from pandas.core.common import PerformanceWarning, UnsortedIndexError

 import pandas.util.testing as tm
 from pandas import date_range
@@ -2230,7 +2230,7 @@ def f():
         df = df.sortlevel(level=1, axis=0)
         self.assertEqual(df.index.lexsort_depth, 0)
         with tm.assertRaisesRegexp(
-                KeyError,
+                UnsortedIndexError,
Contributor

hmm only these 2 occurrences of the original KeyError.....?

Contributor Author

Via my methodology described above, that's all I found.

                 'MultiIndex Slicing requires the index to be fully '
                 r'lexsorted tuple len \(2\), lexsort depth \(0\)'):
             df.loc[(slice(None), df.loc[:, ('a', 'bar')] > 5), :]
@@ -2417,7 +2417,7 @@ def test_per_axis_per_level_doc_examples(self):
         def f():
             df.loc['A1', (slice(None), 'foo')]

-        self.assertRaises(KeyError, f)
+        self.assertRaises(UnsortedIndexError, f)
         df = df.sortlevel(axis=1)

         # slicing
@@ -3480,8 +3480,12 @@ def test_iloc_mask(self):
             ('index', '.loc'): '0b11',
             ('index', '.iloc'): ('iLocation based boolean indexing '
                                  'cannot use an indexable as a mask'),
-            ('locs', ''): 'Unalignable boolean Series key provided',
-            ('locs', '.loc'): 'Unalignable boolean Series key provided',
+            ('locs', ''): 'Unalignable boolean Series provided as indexer '
+                          '(index of the boolean Series and of the indexed '
+                          'object do not match',
+            ('locs', '.loc'): 'Unalignable boolean Series provided as indexer '
+                              '(index of the boolean Series and of the indexed '
+                              'object do not match',
             ('locs', '.iloc'): ('iLocation based boolean indexing on an '
                                 'integer type is not available'),
         }