Skip to content

ENH: Introduce UnsortedIndexError GH11897 #14762

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 9, 2016

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Nov 29, 2016

closes #11897
closes #14491

  • tests added / passed
  • passes git diff upstream/master | flake8 --diff
  • whatsnew entry

@codecov-io
Copy link

codecov-io commented Nov 29, 2016

Current coverage is 85.28% (diff: 100%)

Merging #14762 into master will increase coverage by <.01%

@@             master     #14762   diff @@
==========================================
  Files           144        144          
  Lines         50968      50970     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43466      43469     +3   
+ Misses         7502       7501     -1   
  Partials          0          0          

Powered by Codecov. Last update 2466ecb...c4e7a2a

@jreback jreback added Error Reporting Incorrect or improved errors from pandas MultiIndex labels Nov 29, 2016
@@ -41,6 +41,9 @@ Other enhancements

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an
unsorted index (:issue:'11897`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specificaly on MultiIndex? you are closing 2 issues, is there a second note (or you can list 2 for this one if applicable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've fixed that. When I did the second issue, I forgot to update whatsnew.

@@ -1814,7 +1814,7 @@ def check_bool_indexer(ax, key):
result = result.reindex(ax)
mask = isnull(result._values)
if mask.any():
raise IndexingError('Unalignable boolean Series key provided')
raise IndexingError('Unalignable labels in boolean Series index')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add something like "{}".format(type(key)) (e.g. make more useful)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added '{}'.format(type(key.index)) to print the index instead

@@ -45,6 +45,10 @@
import pandas.indexes.base as ibase


class UnsortedIndexError(KeyError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more to pandas.core.common

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


def test_unsortedindex(self):
# GH 11897
mi = pd.MultiIndex.from_tuples([('z', 'a'), ('x', 'a'), ('y', 'b'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to find places where currently a KeyError is raised and change them in the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now done that. My way of doing that was to make UnsortedIndexError a subclass of ValueError, run nosetests to see where it complained, and then change those tests accordingly. Then I switched UnsortedIndexError back to being a subclass of KeyError.

@@ -97,6 +97,10 @@ class UnsupportedFunctionCall(ValueError):
pass


class UnsortedIndexError(KeyError):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -2230,7 +2230,7 @@ def f():
df = df.sortlevel(level=1, axis=0)
self.assertEqual(df.index.lexsort_depth, 0)
with tm.assertRaisesRegexp(
KeyError,
UnsortedIndexError,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm only these 2 occurences of the original KeyError.....?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Via my methodology described above, that's all I found.

@@ -41,6 +41,11 @@ Other enhancements

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing into an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would give a mini-example of this (or put a ref-link back to the docs). speaking of the docs, I think we need some updating to say what happens when this is raised (e.g. its prob still KeyError )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a link to the doc section, and fixed that part of the docs.

unsorted MultiIndex (:issue:'11897`)

- Change message when indexing via a boolean ``Series`` that has an incompatible index (:issue:`14491`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not clear what you are fixing / changing. be a bit more verbose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done that in my latest commit.

@@ -18,6 +18,8 @@ What's New

These are new features and improvements of note in each release.

.. include:: whatsnew/v0.20.0.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls revert this. we will update this after 0.19.2 comes out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`

- Change error message text corresponding to error raised when indexing via a
boolean ``Series`` that has an incompatible index (:issue:`14491`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shortened the item. I think it should be in whatsnew to refer back to the issue. But you guys can decide!

@@ -41,6 +41,13 @@ Other enhancements

- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)

- New ``UnsortedIndexError`` (subclass of ``KeyError``) thrown when indexing/slicing into an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thrown -> raise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

class UnsortedIndexError(KeyError):
""" This error is raised when attempting to get a slice of a MultiIndex
and the index has not been lexsorted.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versiononadded tag here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jreback jreback added this to the 0.20.0 milestone Dec 4, 2016
@jreback
Copy link
Contributor

jreback commented Dec 4, 2016

@Dr-Irv lgtm. just a couple of minor points. ping when green.

@jorisvandenbossche

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Main remaining question: since we announce this as useful to catch this as a different error as KeyError, we need a public API for this.
Currently, it is located in pandas.core.common. Is that where we want to expose this?

of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`

- Change error message text corresponding to error raised when indexing via a
boolean ``Series`` that has an incompatible index (:issue:`14491`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't think it is per se needed to have a whatsnew notice about this. It is not really a new feature, just a minor rewording of an existing correct error message.

class UnsortedIndexError(KeyError):
""" This error is raised when attempting to get a slice of a MultiIndex
and the index has not been lexsorted.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This error is raised when" -> "Error raised when"

Can you also add that it is a subclass of a KeyError?

@@ -1814,7 +1814,8 @@ def check_bool_indexer(ax, key):
result = result.reindex(ax)
mask = isnull(result._values)
if mask.any():
raise IndexingError('Unalignable boolean Series key provided')
raise IndexingError('Unalignable labels in boolean Series index '
'{}'.format(key.index))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this will give a nice result (adding the key's index in the error message, the rewording is certainly an improvement!)

Can you show an example and its resulting error message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to the Other API Changes section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Where do you want me to put the example? In the whatsnew document?

@jreback I've moved the text to the Other API Changes section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, sorry, I just meant here in the comments. To see how it looks, as I am not sure if I find it a good idea to include the full index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Here is an example:

Unalignable labels in boolean Series index Int64Index([3, 2, 1, 0], dtype='int64')

Note that I added the key's index because of a suggestion @jreback made earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the suggestion of @jreback what to include type(key.index) instead of key.index. But anyway, I personally think that this full index will obfuscate the error message a bit, as the index can be typically much longer.
Other proposal to make the message more clear:

Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

What do you think of that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I've used your text. Committed change after doing a rebase. I think @jreback can merge now.

@jreback
Copy link
Contributor

jreback commented Dec 4, 2016

i think we can expose errors in

pandas.api.errors

but will need to do in another PR (need tests i think)

and a deprecation message if importerd
from core.common (which i think will work)

@jorisvandenbossche
Copy link
Member

and a deprecation message if importerd

Well, there was a PR from gfyoung couple weeks ago about deprecating an exception, and turned out to be not that easy.
That's also the reason that I bring it up here, so we can do it right from the start. But for me it's OK to do in separate PR, as long as it is slated for 0.20

@jreback
Copy link
Contributor

jreback commented Dec 4, 2016

I think we can simply use pandas.util.depr_module with specific exceptions listed, no?

You basically deprecate them on first use, IOW simply importing them.

cc @gfyoung

@gfyoung
Copy link
Member

gfyoung commented Dec 4, 2016

@jreback: we had a similar issue with CParserError, and unfortunately, I don't think my depr_module applies in either case because it doesn't deprecate a class.

@jreback
Copy link
Contributor

jreback commented Dec 4, 2016

@gfyoung how so? CParserError a class. In fact I would assert it doesn't matter what it deprecates, all it cares about is getting a name from a namespace.

@gfyoung
Copy link
Member

gfyoung commented Dec 5, 2016

@jreback : But you can't really deprecate KeyError in this case, so I'm actually not sure what you're trying to achieve with depr_module in the first place.

@jreback
Copy link
Contributor

jreback commented Dec 5, 2016

@gfyoung pls read the issue. I am going to deprecate the exceptions in pandas.core.common and move them to pandas.api.exceptions.

@gfyoung
Copy link
Member

gfyoung commented Dec 5, 2016

@jreback : That's not in the original issue IINM, but in case, sure that should work what you're describing then.

@jreback
Copy link
Contributor

jreback commented Dec 5, 2016

@Dr-Irv can you fix up and rebase.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 6, 2016

@jreback Last commit is the rebase and fixes the conflicting file, so not sure what to do next

@jorisvandenbossche
Copy link
Member

@Dr-Irv Can you just rebase? Then it should be good to merge

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 7, 2016

@jorisvandenbossche I think my last commit includes the rebase and error message change.

@jorisvandenbossche
Copy link
Member

@Dr-Irv As you can see in the github interface, this cannot be merged due to conflicts (but it is possible this was only introduced since your last commit).

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 8, 2016

@jorisvandenbossche I'm confused. I can't find the conflicts it is reporting. I did the following:

git checkout master
git pull
git checkout <branch>
git merge master

git reported that everything was up to date. So there were no conflicts reported. I'm not sure what to do.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Dec 8, 2016

@Dr-Irv depending on your settings, I think git pull will fetch your latest 'origin' (your fork), where you want to fetch the latest upstream.
What I typically do is

git fetch upstream
git checkout <branch>
git rebase upstream/master

(but the last line can also be git merge upstream/master if you want, that doesn't matter too much)

@jorisvandenbossche
Copy link
Member

@Dr-Irv There are still some linting errors (hence the failing travis, see at the bottom of this log: https://travis-ci.org/pandas-dev/pandas/jobs/182331460):

pandas/core/common.py:108:1: W293 blank line contains whitespace
pandas/core/common.py:111:1: E303 too many blank lines (3)
pandas/core/indexing.py:1817:80: E501 line too long (81 > 79 characters)
pandas/core/indexing.py:1818:80: E501 line too long (82 > 79 characters)
pandas/tests/indexing/test_indexing.py:3487:80: E501 line too long (80 > 79 characters)

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 9, 2016

@jorisvandenbossche I've now fixed those. For some reason flake8 --diff didn't pick up those errors.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Dec 9, 2016

@jorisvandenbossche So I resubmitted, but now the appveyor is failing, but looking at the logs, I can't figure out why.

@jorisvandenbossche
Copy link
Member

Don't worry about appveyor, it's not related.

Merging, thanks a lot!

@jorisvandenbossche jorisvandenbossche merged commit 36bb8af into pandas-dev:master Dec 9, 2016
@Dr-Irv Dr-Irv deleted the KeyError branch December 9, 2016 16:40
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
* commit 'v0.19.0-174-g81a2f79': (156 commits)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ENH: Introduce UnsortedIndexError  GH11897 (pandas-dev#14762)
  ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753)
  DOC: Fix grammar and formatting typos (pandas-dev#14803)
  BLD: try new build credentials for pandas-docs
  TST: Test pivot with categorical data
  MAINT: Cleanup pandas/src/parser (pandas-dev#14740)
  ...
yarikoptic added a commit to neurodebian/pandas that referenced this pull request Dec 12, 2016
release 0.19.1 was from release branch

* releases: (156 commits)
  BLD: escape GH_TOKEN in build_docs
  TST: Correct results with np.size and crosstab (pandas-dev#4003) (pandas-dev#14755)
  Frame benchmarking sum instead of mean (pandas-dev#14824)
  CLN: lint of test_base.py
  BUG: Allow TZ-aware DatetimeIndex in merge_asof() (pandas-dev#14844)
  BUG: GH11847 Unstack with mixed dtypes coerces everything to object
  TST: skip testing on windows for specific formatting which sometimes hangs (pandas-dev#14851)
  BLD: try new gh token for pandas-docs
  CLN/PERF: clean-up of the benchmarks (pandas-dev#14099)
  ENH: add timedelta as valid type for interpolate with method='time' (pandas-dev#14799)
  DOC: add section on groupby().rolling/expanding/resample (pandas-dev#14801)
  TST: add test to confirm GH14606 (specify category dtype for empty) (pandas-dev#14752)
  BLD: use org name in build-docs.sh
  BF(TST): use = (native) instead of < (little endian) for target data types (pandas-dev#14832)
  ENH: Introduce UnsortedIndexError  GH11897 (pandas-dev#14762)
  ENH: Add the ability to have a separate title for each subplot when plotting (pandas-dev#14753)
  DOC: Fix grammar and formatting typos (pandas-dev#14803)
  BLD: try new build credentials for pandas-docs
  TST: Test pivot with categorical data
  MAINT: Cleanup pandas/src/parser (pandas-dev#14740)
  ...
ischurov pushed a commit to ischurov/pandas that referenced this pull request Dec 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas MultiIndex
Projects
None yet
5 participants