ENH: Introduce UnsortedIndexError GH11897 #14762

Dr-Irv · 2016-11-29T15:18:34Z

tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

codecov-io · 2016-11-29T19:14:55Z

Current coverage is 85.28% (diff: 100%)

Merging #14762 into master will increase coverage by <.01%

@@             master     #14762   diff @@
==========================================
  Files           144        144          
  Lines         50968      50970     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43466      43469     +3   
+ Misses         7502       7501     -1   
  Partials          0          0

Powered by Codecov. Last update 2466ecb...c4e7a2a

jreback · 2016-11-29T21:48:00Z

doc/source/whatsnew/v0.20.0.txt

@@ -41,6 +41,9 @@ Other enhancements

 - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

+- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an
+ unsorted index (:issue:'11897`)


specificaly on MultiIndex? you are closing 2 issues, is there a second note (or you can list 2 for this one if applicable)

OK, I've fixed that. When I did the second issue, I forgot to update whatsnew.

jreback · 2016-11-29T21:48:30Z

pandas/core/indexing.py

@@ -1814,7 +1814,7 @@ def check_bool_indexer(ax, key):
        result = result.reindex(ax)
        mask = isnull(result._values)
        if mask.any():
-            raise IndexingError('Unalignable boolean Series key provided')
+            raise IndexingError('Unalignable labels in boolean Series index')


add something like "{}".format(type(key)) (e.g. make more useful)

Added '{}'.format(type(key.index)) to print the index instead

jreback · 2016-11-29T21:48:48Z

pandas/indexes/multi.py

@@ -45,6 +45,10 @@
 import pandas.indexes.base as ibase


+class UnsortedIndexError(KeyError):


more to pandas.core.common

jreback · 2016-11-29T21:49:19Z

pandas/tests/indexes/test_multi.py

+
+    def test_unsortedindex(self):
+        # GH 11897
+        mi = pd.MultiIndex.from_tuples([('z', 'a'), ('x', 'a'), ('y', 'b'),


we want to find places where currently a KeyError is raised and change them in the tests.

I've now done that. My way of doing that was to make UnsortedIndexError a subclass of ValueError, run nosetests to see where it complained, and then change those tests accordingly. Then I switched UnsortedIndexError back to being a subclass of KeyError.

jreback · 2016-12-01T15:26:44Z

pandas/core/common.py

@@ -97,6 +97,10 @@ class UnsupportedFunctionCall(ValueError):
    pass


+class UnsortedIndexError(KeyError):
+    pass


can you add a doc-string

jreback · 2016-12-01T15:27:20Z

pandas/tests/indexing/test_indexing.py

@@ -2230,7 +2230,7 @@ def f():
        df = df.sortlevel(level=1, axis=0)
        self.assertEqual(df.index.lexsort_depth, 0)
        with tm.assertRaisesRegexp(
-                KeyError,
+                UnsortedIndexError,


hmm only these 2 occurences of the original KeyError.....?

Via my methodology described above, that's all I found.

jreback · 2016-12-01T15:28:20Z

doc/source/whatsnew/v0.20.0.txt

@@ -41,6 +41,11 @@ Other enhancements

 - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

+- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an


I would give a mini-example of this (or put a ref-link back to the docs). speaking of the docs, I think we need some updating to say what happens when this is raised (e.g. its prob still KeyError )

I put a link to the doc section, and fixed that part of the docs.

jreback · 2016-12-01T15:28:47Z

doc/source/whatsnew/v0.20.0.txt

+ unsorted MultiIndex (:issue:'11897`)
+
+- Change message when indexing via a boolean ``Series`` that has an incompatible index (:issue:`14491`)
+


this is not clear what you are fixing / changing. be a bit more verbose

I've done that in my latest commit.

jreback · 2016-12-04T19:33:17Z

doc/source/whatsnew.rst

@@ -18,6 +18,8 @@ What's New

 These are new features and improvements of note in each release.

+.. include:: whatsnew/v0.20.0.txt


pls revert this. we will update this after 0.19.2 comes out.

jreback · 2016-12-04T19:33:42Z

doc/source/whatsnew/v0.20.0.txt

+  of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
+
+- Change error message text corresponding to error raised when indexing via a
+  boolean ``Series`` that has an incompatible index (:issue:`14491`)


this is a bit confusing

I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.

I shortened the item. I think it should be in whatsnew to refer back to the issue. But you guys can decide!

jreback · 2016-12-04T19:33:49Z

doc/source/whatsnew/v0.20.0.txt

@@ -41,6 +41,13 @@ Other enhancements

 - ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

+- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing/slicing into an


thrown -> raise

jreback · 2016-12-04T19:34:06Z

pandas/core/common.py

+class UnsortedIndexError(KeyError):
+    """ This error is raised when attempting to get a slice of a MultiIndex
+    and the index has not been lexsorted.
+    """


add a versiononadded tag here

"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

jreback · 2016-12-04T19:35:17Z

@Dr-Irv lgtm. just a couple of minor points. ping when green.

@jorisvandenbossche

jorisvandenbossche

Looks good.

Main remaining question: since we announce this as useful to catch this as a different error as KeyError, we need a public API for this.
Currently, it is located in pandas.core.common. Is that where we want to expose this?

jorisvandenbossche · 2016-12-04T20:44:57Z

doc/source/whatsnew/v0.20.0.txt

+  of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
+
+- Change error message text corresponding to error raised when indexing via a
+  boolean ``Series`` that has an incompatible index (:issue:`14491`)


I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.

jorisvandenbossche · 2016-12-04T20:46:21Z

pandas/core/common.py

+class UnsortedIndexError(KeyError):
+    """ This error is raised when attempting to get a slice of a MultiIndex
+    and the index has not been lexsorted.
+    """


"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

jorisvandenbossche · 2016-12-04T20:48:18Z

pandas/core/indexing.py

@@ -1814,7 +1814,8 @@ def check_bool_indexer(ax, key):
        result = result.reindex(ax)
        mask = isnull(result._values)
        if mask.any():
-            raise IndexingError('Unalignable boolean Series key provided')
+            raise IndexingError('Unalignable labels in boolean Series index '
+                                '{}'.format(key.index))


I am not sure if this will give a nice result (adding the key's index in the error message, the rewording is certainly an improvement!)

Can you show an example and its resulting error message?

move this to the Other API Changes section

@jorisvandenbossche Where do you want me to put the example? In the whatsnew document?

@jreback I've moved the text to the Other API Changes section.

No, sorry, I just meant here in the comments. To see how it looks, as I am not sure if I find it a good idea to include the full index.

@jorisvandenbossche Here is an example:

Unalignable labels in boolean Series index Int64Index([3, 2, 1, 0], dtype='int64')

Note that I added the key's index because of a suggestion @jreback made earlier.

I think the suggestion of @jreback what to include type(key.index) instead of key.index. But anyway, I personally think that this full index will obfuscate the error message a bit, as the index can be typically much longer.
Other proposal to make the message more clear:

Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

What do you think of that?

@jorisvandenbossche I've used your text. Committed change after doing a rebase. I think @jreback can merge now.

jreback · 2016-12-04T20:59:07Z

i think we can expose errors in

pandas.api.errors

but will need to do in another PR (need tests i think)

and a deprecation message if importerd
from core.common (which i think will work)

jorisvandenbossche · 2016-12-04T21:05:51Z

and a deprecation message if importerd

Well, there was a PR from gfyoung couple weeks ago about deprecating an exception, and turned out to be not that easy.
That's also the reason that I bring it up here, so we can do it right from the start. But for me it's OK to do in separate PR, as long as it is slated for 0.20

jreback · 2016-12-04T23:27:18Z

I think we can simply use pandas.util.depr_module with specific exceptions listed, no?

You basically deprecate them on first use, IOW simply importing them.

cc @gfyoung

gfyoung · 2016-12-04T23:42:16Z

@jreback: we had a similar issue with CParserError, and unfortunately, I don't think my depr_module applies in either case because it doesn't deprecate a class.

jreback · 2016-12-04T23:44:01Z

@gfyoung how so? CParserError a class. In fact I would assert it doesn't matter what it deprecates, all it cares about is getting a name from a namespace.

gfyoung · 2016-12-05T00:31:03Z

@jreback : But you can't really deprecate KeyError in this case, so I'm actually not sure what you're trying to achieve with depr_module in the first place.

jreback · 2016-12-05T00:35:59Z

@gfyoung pls read the issue. I am going to deprecate the exceptions in pandas.core.common and move them to pandas.api.exceptions.

gfyoung · 2016-12-05T00:37:01Z

@jreback : That's not in the original issue IINM, but in case, sure that should work what you're describing then.

jreback · 2016-12-05T23:37:12Z

@Dr-Irv can you fix up and rebase.

Dr-Irv · 2016-12-06T16:50:07Z

@jreback Last commit is the rebase and fixes the conflicting file, so not sure what to do next

ERR: Change error message pandas-dev#14491

jorisvandenbossche · 2016-12-07T22:54:58Z

@Dr-Irv Can you just rebase? Then it should be good to merge

Dr-Irv · 2016-12-07T23:10:45Z

@jorisvandenbossche I think my last commit includes the rebase and error message change.

jorisvandenbossche · 2016-12-08T08:04:09Z

@Dr-Irv As you can see in the github interface, this cannot be merged due to conflicts (but it is possible this was only introduced since your last commit).

Dr-Irv · 2016-12-08T15:13:41Z

@jorisvandenbossche I'm confused. I can't find the conflicts it is reporting. I did the following:

git checkout master
git pull
git checkout <branch>
git merge master

git reported that everything was up to date. So there were no conflicts reported. I'm not sure what to do.

jorisvandenbossche · 2016-12-08T15:39:59Z

@Dr-Irv depending on your settings, I think git pull will fetch your latest 'origin' (your fork), where you want to fetch the latest upstream.
What I typically do is

git fetch upstream
git checkout <branch>
git rebase upstream/master

(but the last line can also be git merge upstream/master if you want, that doesn't matter too much)

jorisvandenbossche · 2016-12-08T23:09:45Z

@Dr-Irv There are still some linting errors (hence the failing travis, see at the bottom of this log: https://travis-ci.org/pandas-dev/pandas/jobs/182331460):

pandas/core/common.py:108:1: W293 blank line contains whitespace
pandas/core/common.py:111:1: E303 too many blank lines (3)
pandas/core/indexing.py:1817:80: E501 line too long (81 > 79 characters)
pandas/core/indexing.py:1818:80: E501 line too long (82 > 79 characters)
pandas/tests/indexing/test_indexing.py:3487:80: E501 line too long (80 > 79 characters)

ERR: Change error message pandas-dev#14491

Dr-Irv · 2016-12-09T14:59:16Z

@jorisvandenbossche I've now fixed those. For some reason flake8 --diff didn't pick up those errors.

Dr-Irv · 2016-12-09T16:24:35Z

@jorisvandenbossche So I resubmitted, but now the appveyor is failing, but looking at the logs, I can't figure out why.

jorisvandenbossche · 2016-12-09T16:36:19Z

Don't worry about appveyor, it's not related.

Merging, thanks a lot!

* commit 'v0.19.0-174-g81a2f79': (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...

release 0.19.1 was from release branch * releases: (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...

jreback added Error Reporting Incorrect or improved errors from pandas MultiIndex labels Nov 29, 2016

jreback reviewed Nov 29, 2016

View reviewed changes

jreback requested changes Nov 29, 2016

View reviewed changes

jreback reviewed Dec 1, 2016

View reviewed changes

jreback reviewed Dec 4, 2016

View reviewed changes

jreback added this to the 0.20.0 milestone Dec 4, 2016

jreback approved these changes Dec 4, 2016

View reviewed changes

jorisvandenbossche reviewed Dec 4, 2016

View reviewed changes

Dr-Irv force-pushed the KeyError branch from 891be85 to f33093b Compare December 6, 2016 16:38

ENH: Introduce UnsortedIndexError pandas-dev#11897

0947982

ERR: Change error message pandas-dev#14491

Dr-Irv force-pushed the KeyError branch from f33093b to 0947982 Compare December 7, 2016 17:13

jorisvandenbossche approved these changes Dec 7, 2016

View reviewed changes

Merge remote-tracking branch 'upstream/master' into KeyError

c4e7a2a

ENH: Introduce UnsortedIndexError pandas-dev#11897

76b6434

ERR: Change error message pandas-dev#14491

jorisvandenbossche merged commit 36bb8af into pandas-dev:master Dec 9, 2016

Dr-Irv deleted the KeyError branch December 9, 2016 16:40

ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016

ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762)

e28427a

		@@ -45,6 +45,10 @@
		import pandas.indexes.base as ibase


		class UnsortedIndexError(KeyError):

		@@ -41,6 +41,11 @@ Other enhancements

		- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

		- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an

		unsorted MultiIndex (:issue:'11897`)

		- Change message when indexing via a boolean ``Series`` that has an incompatible index (:issue:`14491`)

		@@ -18,6 +18,8 @@ What's New

		These are new features and improvements of note in each release.

		.. include:: whatsnew/v0.20.0.txt

		@@ -41,6 +41,13 @@ Other enhancements

		- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

		- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing/slicing into an

ENH: Introduce UnsortedIndexError GH11897 #14762

ENH: Introduce UnsortedIndexError GH11897 #14762

Conversation

Dr-Irv commented Nov 29, 2016 • edited by jreback Loading

codecov-io commented Nov 29, 2016 • edited Loading

Current coverage is 85.28% (diff: 100%)

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Dec 4, 2016

jorisvandenbossche left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Dec 4, 2016

jorisvandenbossche commented Dec 4, 2016

jreback commented Dec 4, 2016

gfyoung commented Dec 4, 2016

jreback commented Dec 4, 2016

gfyoung commented Dec 5, 2016

jreback commented Dec 5, 2016

gfyoung commented Dec 5, 2016

jreback commented Dec 5, 2016

Dr-Irv commented Dec 6, 2016

jorisvandenbossche commented Dec 7, 2016

Dr-Irv commented Dec 7, 2016

jorisvandenbossche commented Dec 8, 2016

Dr-Irv commented Dec 8, 2016

jorisvandenbossche commented Dec 8, 2016 • edited Loading

jorisvandenbossche commented Dec 8, 2016

Dr-Irv commented Dec 9, 2016

Dr-Irv commented Dec 9, 2016

jorisvandenbossche commented Dec 9, 2016

Dr-Irv commented Nov 29, 2016 •

edited by jreback

Loading

codecov-io commented Nov 29, 2016 •

edited

Loading

jorisvandenbossche commented Dec 8, 2016 •

edited

Loading