-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Introduce UnsortedIndexError GH11897 #14762
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 85.28% (diff: 100%)@@ master #14762 diff @@
==========================================
Files 144 144
Lines 50968 50970 +2
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43466 43469 +3
+ Misses 7502 7501 -1
Partials 0 0
|
@@ -41,6 +41,9 @@ Other enhancements | |||
|
|||
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) | |||
|
|||
- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an | |||
unsorted index (:issue:'11897`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specificaly on MultiIndex? you are closing 2 issues, is there a second note (or you can list 2 for this one if applicable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've fixed that. When I did the second issue, I forgot to update whatsnew.
@@ -1814,7 +1814,7 @@ def check_bool_indexer(ax, key): | |||
result = result.reindex(ax) | |||
mask = isnull(result._values) | |||
if mask.any(): | |||
raise IndexingError('Unalignable boolean Series key provided') | |||
raise IndexingError('Unalignable labels in boolean Series index') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add something like "{}".format(type(key)) (e.g. make more useful)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added '{}'.format(type(key.index)) to print the index instead
@@ -45,6 +45,10 @@ | |||
import pandas.indexes.base as ibase | |||
|
|||
|
|||
class UnsortedIndexError(KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more to pandas.core.common
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
def test_unsortedindex(self): | ||
# GH 11897 | ||
mi = pd.MultiIndex.from_tuples([('z', 'a'), ('x', 'a'), ('y', 'b'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we want to find places where currently a KeyError is raised and change them in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now done that. My way of doing that was to make UnsortedIndexError a subclass of ValueError, run nosetests to see where it complained, and then change those tests accordingly. Then I switched UnsortedIndexError back to being a subclass of KeyError.
@@ -97,6 +97,10 @@ class UnsupportedFunctionCall(ValueError): | |||
pass | |||
|
|||
|
|||
class UnsortedIndexError(KeyError): | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -2230,7 +2230,7 @@ def f(): | |||
df = df.sortlevel(level=1, axis=0) | |||
self.assertEqual(df.index.lexsort_depth, 0) | |||
with tm.assertRaisesRegexp( | |||
KeyError, | |||
UnsortedIndexError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm only these 2 occurences of the original KeyError.....?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Via my methodology described above, that's all I found.
@@ -41,6 +41,11 @@ Other enhancements | |||
|
|||
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) | |||
|
|||
- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would give a mini-example of this (or put a ref-link back to the docs). speaking of the docs, I think we need some updating to say what happens when this is raised (e.g. its prob still KeyError )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a link to the doc section, and fixed that part of the docs.
unsorted MultiIndex (:issue:'11897`) | ||
|
||
- Change message when indexing via a boolean ``Series`` that has an incompatible index (:issue:`14491`) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not clear what you are fixing / changing. be a bit more verbose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done that in my latest commit.
@@ -18,6 +18,8 @@ What's New | |||
|
|||
These are new features and improvements of note in each release. | |||
|
|||
.. include:: whatsnew/v0.20.0.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls revert this. we will update this after 0.19.2 comes out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
of sorting or an incorrect key. See :ref:`here <advanced.unsorted>` | ||
|
||
- Change error message text corresponding to error raised when indexing via a | ||
boolean ``Series`` that has an incompatible index (:issue:`14491`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I shortened the item. I think it should be in whatsnew to refer back to the issue. But you guys can decide!
@@ -41,6 +41,13 @@ Other enhancements | |||
|
|||
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`) | |||
|
|||
- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing/slicing into an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thrown -> raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
class UnsortedIndexError(KeyError): | ||
""" This error is raised when attempting to get a slice of a MultiIndex | ||
and the index has not been lexsorted. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a versiononadded tag here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This error is raised when" -> "Error raised when"
Can you also add that it is a subclass of a KeyError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@Dr-Irv lgtm. just a couple of minor points. ping when green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Main remaining question: since we announce this as useful to catch this as a different error as KeyError, we need a public API for this.
Currently, it is located in pandas.core.common
. Is that where we want to expose this?
of sorting or an incorrect key. See :ref:`here <advanced.unsorted>` | ||
|
||
- Change error message text corresponding to error raised when indexing via a | ||
boolean ``Series`` that has an incompatible index (:issue:`14491`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.
class UnsortedIndexError(KeyError): | ||
""" This error is raised when attempting to get a slice of a MultiIndex | ||
and the index has not been lexsorted. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This error is raised when" -> "Error raised when"
Can you also add that it is a subclass of a KeyError?
@@ -1814,7 +1814,8 @@ def check_bool_indexer(ax, key): | |||
result = result.reindex(ax) | |||
mask = isnull(result._values) | |||
if mask.any(): | |||
raise IndexingError('Unalignable boolean Series key provided') | |||
raise IndexingError('Unalignable labels in boolean Series index ' | |||
'{}'.format(key.index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this will give a nice result (adding the key's index in the error message, the rewording is certainly an improvement!)
Can you show an example and its resulting error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to the Other API Changes section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche Where do you want me to put the example? In the whatsnew document?
@jreback I've moved the text to the Other API Changes section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, sorry, I just meant here in the comments. To see how it looks, as I am not sure if I find it a good idea to include the full index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche Here is an example:
Unalignable labels in boolean Series index Int64Index([3, 2, 1, 0], dtype='int64')
Note that I added the key's index because of a suggestion @jreback made earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the suggestion of @jreback what to include type(key.index)
instead of key.index
. But anyway, I personally think that this full index will obfuscate the error message a bit, as the index can be typically much longer.
Other proposal to make the message more clear:
Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)
What do you think of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche I've used your text. Committed change after doing a rebase. I think @jreback can merge now.
i think we can expose errors in pandas.api.errors but will need to do in another PR (need tests i think) and a deprecation message if importerd |
Well, there was a PR from gfyoung couple weeks ago about deprecating an exception, and turned out to be not that easy. |
I think we can simply use You basically deprecate them on first use, IOW simply importing them. cc @gfyoung |
@jreback: we had a similar issue with CParserError, and unfortunately, I don't think my depr_module applies in either case because it doesn't deprecate a class. |
@gfyoung how so? |
@jreback : But you can't really deprecate |
@gfyoung pls read the issue. I am going to deprecate the exceptions in pandas.core.common and move them to pandas.api.exceptions. |
@jreback : That's not in the original issue IINM, but in case, sure that should work what you're describing then. |
@Dr-Irv can you fix up and rebase. |
@jreback Last commit is the rebase and fixes the conflicting file, so not sure what to do next |
ERR: Change error message pandas-dev#14491
@Dr-Irv Can you just rebase? Then it should be good to merge |
@jorisvandenbossche I think my last commit includes the rebase and error message change. |
@Dr-Irv As you can see in the github interface, this cannot be merged due to conflicts (but it is possible this was only introduced since your last commit). |
@jorisvandenbossche I'm confused. I can't find the conflicts it is reporting. I did the following:
git reported that everything was up to date. So there were no conflicts reported. I'm not sure what to do. |
@Dr-Irv depending on your settings, I think
(but the last line can also be |
@Dr-Irv There are still some linting errors (hence the failing travis, see at the bottom of this log: https://travis-ci.org/pandas-dev/pandas/jobs/182331460):
|
ERR: Change error message pandas-dev#14491
@jorisvandenbossche I've now fixed those. For some reason flake8 --diff didn't pick up those errors. |
@jorisvandenbossche So I resubmitted, but now the appveyor is failing, but looking at the logs, I can't figure out why. |
Don't worry about appveyor, it's not related. Merging, thanks a lot! |
* commit 'v0.19.0-174-g81a2f79': (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...
release 0.19.1 was from release branch * releases: (156 commits) BLD: escape GH_TOKEN in build_docs TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755) Frame benchmarking sum instead of mean (pandas-dev#14824) CLN: lint of test_base.py BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844) BUG: GH11847 Unstack with mixed dtypes coerces everything to object TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851) BLD: try new gh token for pandas-docs CLN/PERF: clean-up of the benchmarks (pandas-dev#14099) ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799) DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801) TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752) BLD: use org name in build-docs.sh BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832) ENH: Introduce UnsortedIndexError GH11897 (pandas-dev#14762) ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753) DOC: Fix grammar and formatting typos (pandas-dev#14803) BLD: try new build credentials for pandas-docs TST: Test pivot with categorical data MAINT: Cleanup pandas/src/parser (pandas-dev#14740) ...
closes #11897
closes #14491
git diff upstream/master | flake8 --diff