BUG: GH17464 MultiIndex now raises an error when levels aren't unique, tests changed #17971

Merged: 46 commits, Dec 3, 2017

Commits
a9a5177
BUG: GH17464 Add error checking for duplicate levels in MultiIndex
cmazzullo Oct 24, 2017
91c8388
Remove duplicate levels from `test_is_`
cmazzullo Oct 25, 2017
392ce8a
Remove duplicate levels from `test_level_setting_resets_attributes`
cmazzullo Oct 25, 2017
3c0812e
Remove duplicate levels from `test_frame_describe_multikey`
cmazzullo Oct 25, 2017
0983453
Add comments about change
cmazzullo Oct 25, 2017
b39421f
Added comments changes w/ bug number
cmazzullo Oct 25, 2017
5771426
PEP8 compliance
cmazzullo Oct 25, 2017
48f9429
whatsnew entry
cmazzullo Oct 25, 2017
9ffc8ad
Remove duplicate levels from `test_is_`
cmazzullo Oct 25, 2017
e70a2be
Remove duplicate levels from `test_level_setting_resets_attributes`
cmazzullo Oct 25, 2017
c28d3df
Remove duplicate levels from `test_frame_describe_multikey`
cmazzullo Oct 25, 2017
03f2da8
Add comments about change
cmazzullo Oct 25, 2017
1d45ab6
Added comments changes w/ bug number
cmazzullo Oct 25, 2017
015af48
PEP8 compliance
cmazzullo Oct 25, 2017
9dc7eb5
whatsnew entry
cmazzullo Oct 25, 2017
9aa2bcd
Whatsnew backticks
cmazzullo Oct 25, 2017
e52460e
whatsnew merging
cmazzullo Oct 25, 2017
48d509d
Removed comments about this issue from other tests
cmazzullo Oct 27, 2017
d75e1de
Added test to make sure a ValueError is thrown
cmazzullo Oct 27, 2017
0943d19
PEP8 compliance
cmazzullo Oct 27, 2017
6f2efc6
move whatsnew to 0.22.0
jreback Oct 28, 2017
840fe56
Remove duplicate levels from `test_is_`
cmazzullo Oct 25, 2017
137bc16
Remove duplicate levels from `test_frame_describe_multikey`
cmazzullo Oct 25, 2017
0129ee7
Add comments about change
cmazzullo Oct 25, 2017
8b400dc
Added comments changes w/ bug number
cmazzullo Oct 25, 2017
0a8e9f2
PEP8 compliance
cmazzullo Oct 25, 2017
cc7ebc7
whatsnew entry
cmazzullo Oct 25, 2017
b02114f
Remove duplicate levels from `test_is_`
cmazzullo Oct 25, 2017
b56eca0
Remove duplicate levels from `test_level_setting_resets_attributes`
cmazzullo Oct 25, 2017
9f179e6
Remove duplicate levels from `test_frame_describe_multikey`
cmazzullo Oct 25, 2017
2af9aba
Add comments about change
cmazzullo Oct 25, 2017
3e56aba
Added comments changes w/ bug number
cmazzullo Oct 25, 2017
85d6379
PEP8 compliance
cmazzullo Oct 25, 2017
49b731d
whatsnew entry
cmazzullo Oct 25, 2017
ec4f971
Whatsnew backticks
cmazzullo Oct 25, 2017
2684855
whatsnew merging
cmazzullo Oct 25, 2017
c36c236
Removed comments about this issue from other tests
cmazzullo Oct 27, 2017
2b3f4d4
Added test to make sure a ValueError is thrown
cmazzullo Oct 27, 2017
386daaf
move whatsnew to 0.22.0
jreback Oct 28, 2017
c169645
Updated `test_frame_describe_multikey` to remove duplicate MultiIndex…
cmazzullo Nov 29, 2017
073e629
Fixed linting issue
cmazzullo Dec 1, 2017
869157d
whatsnew changes
cmazzullo Dec 2, 2017
44e4552
Kwargs in error message
cmazzullo Dec 2, 2017
297216b
Removed unnecessary comment
cmazzullo Dec 2, 2017
fead79f
Added blank line
cmazzullo Dec 2, 2017
703ff1e
Got rid of duplicated tests
cmazzullo Dec 2, 2017
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.22.0.txt
@@ -81,6 +81,7 @@ Other API Changes
 - Inserting missing values into indexes will work for all types of indexes and automatically insert the correct type of missing value (``NaN``, ``NaT``, etc.) regardless of the type passed in (:issue:`18295`)
 - Restricted ``DateOffset`` keyword arguments. Previously, ``DateOffset`` subclasses allowed arbitrary keyword arguments which could lead to unexpected behavior. Now, only valid arguments will be accepted. (:issue:`17176`, :issue:`18226`).
 - :func:`DataFrame.from_items` provides a more informative error message when passed scalar values (:issue:`17312`)
+- When created with duplicate labels, ``MultiIndex`` now raises a ``ValueError``. (:issue:`17464`)
 
 .. _whatsnew_0220.deprecations:
 
9 changes: 7 additions & 2 deletions pandas/core/indexes/multi.py
@@ -177,8 +177,8 @@ def _verify_integrity(self, labels=None, levels=None):
         Raises
         ------
         ValueError
-            * if length of levels and labels don't match or any label would
-            exceed level bounds
+            If length of levels and labels don't match, if any label would
+            exceed level bounds, or there are any duplicate levels.
         """
         # NOTE: Currently does not check, among other things, that cached
         # nlevels matches nor that sortorder matches actually sortorder.
@@ -198,6 +198,11 @@ def _verify_integrity(self, labels=None, levels=None):
                                  " level (%d). NOTE: this index is in an"
                                  " inconsistent state" % (i, label.max(),
                                                           len(level)))
+            if not level.is_unique:
+                raise ValueError("Level values must be unique: {values} on "
+                                 "level {level}".format(
+                                     values=[value for value in level],
+                                     level=i))
 
     @property
     def levels(self):
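The three conditions listed in the updated `_verify_integrity` docstring can be sketched without pandas. The block below is a stand-alone illustration, not the library's implementation: `verify_integrity` is a hypothetical free function, it takes plain Python sequences instead of `Index` objects, and it assumes level values are hashable so that `set` can detect repeats. The argument is called `labels` to match the diff.

```python
def verify_integrity(levels, labels):
    """Hypothetical stand-alone check mirroring the docstring's three conditions."""
    if len(levels) != len(labels):
        raise ValueError("Length of levels and labels must match.")
    for i, (level, label) in enumerate(zip(levels, labels)):
        level = list(level)
        label = list(label)
        # Every label must point at an existing position in its level.
        if label and max(label) >= len(level):
            raise ValueError(
                "On level {i}, label max ({m}) >= length of level ({n})".format(
                    i=i, m=max(label), n=len(level)))
        # The condition this PR adds: level values may not repeat.
        if len(set(level)) != len(level):
            raise ValueError(
                "Level values must be unique: {values} on level {level}".format(
                    values=level, level=i))


# Duplicate level values are rejected.
try:
    verify_integrity([["A"] * 3, range(3)], [[0] * 3, range(3)])
except ValueError as err:
    print(err)

# The same index expressed with a unique first level passes.
verify_integrity([["A"], range(3)], [[0] * 3, range(3)])
```

The second call describes the same three rows as the first; only the first level has been collapsed to its single distinct value, which is the kind of rewrite the test changes in this PR make.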
13 changes: 7 additions & 6 deletions pandas/tests/groupby/test_functional.py
@@ -52,10 +52,10 @@ def test_frame_describe_multikey(self):
         desc_groups = []
         for col in self.tsframe:
             group = grouped[col].describe()
-            group_col = pd.MultiIndex([[col] * len(group.columns),
-                                       group.columns],
-                                      [[0] * len(group.columns),
-                                       range(len(group.columns))])
+            # GH 17464 - Remove duplicate MultiIndex levels
+            group_col = pd.MultiIndex(
+                levels=[[col], group.columns],
+                labels=[[0] * len(group.columns), range(len(group.columns))])
             group = pd.DataFrame(group.values,
                                  columns=group_col,
                                  index=group.index)
@@ -67,8 +67,9 @@ def test_frame_describe_multikey(self):
                                          'C': 1, 'D': 1}, axis=1)
         result = groupedT.describe()
         expected = self.tsframe.describe().T
-        expected.index = pd.MultiIndex([[0, 0, 1, 1], expected.index],
-                                       [range(4), range(len(expected.index))])
+        expected.index = pd.MultiIndex(
+            levels=[[0, 1], expected.index],
+            labels=[[0, 0, 1, 1], range(len(expected.index))])
         tm.assert_frame_equal(result, expected)
 
     def test_frame_describe_tupleindex(self):
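Both hunks in this file apply the same rewrite: a level that repeated its values is collapsed to its distinct values, and the labels are adjusted so each row still refers to the same value. A minimal sketch of that transformation in plain Python follows. `dedupe_level` is a hypothetical helper written for illustration and is not part of pandas; it assumes the level is an indexable sequence of hashable values and leaves negative labels (used for missing entries) untouched.

```python
def dedupe_level(level, labels):
    """Hypothetical helper: collapse repeated level values and remap labels."""
    unique = []
    position = {}  # level value -> its index in `unique`
    for value in level:
        if value not in position:
            position[value] = len(unique)
            unique.append(value)
    # Each old label indexed into `level`; point it at the deduplicated slot.
    remapped = [position[level[label]] if label >= 0 else label
                for label in labels]
    return unique, remapped


# The second hunk: levels [0, 0, 1, 1] with labels range(4).
print(dedupe_level([0, 0, 1, 1], range(4)))

# The first hunk's pattern: one column name repeated for every row.
print(dedupe_level(["A"] * 3, [0] * 3))
```

For the inputs taken from the second hunk, the helper returns levels `[0, 1]` and labels `[0, 0, 1, 1]`, which are the values the updated test passes to `pd.MultiIndex`.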
23 changes: 18 additions & 5 deletions pandas/tests/indexes/test_multi.py
@@ -1618,7 +1618,9 @@ def test_is_(self):
         # shouldn't change
         assert mi2.is_(mi)
         mi4 = mi3.view()
-        mi4.set_levels([[1 for _ in range(10)], lrange(10)], inplace=True)
+
+        # GH 17464 - Remove duplicate MultiIndex levels
+        mi4.set_levels([lrange(10), lrange(10)], inplace=True)
         assert not mi4.is_(mi3)
         mi5 = mi.view()
         mi5.set_levels(mi5.levels, inplace=True)

Review comment (Contributor), inline after `mi4 = mi3.view()`:
Can you add an explicit test for the issue, e.g. that you are raising (use the example from the original issue)?

@@ -2450,13 +2452,11 @@ def test_isna_behavior(self):
             pd.isna(self.index)
 
     def test_level_setting_resets_attributes(self):
-        ind = MultiIndex.from_arrays([
+        ind = pd.MultiIndex.from_arrays([
             ['A', 'A', 'B', 'B', 'B'], [1, 2, 1, 2, 3]
         ])
         assert ind.is_monotonic
-        ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]],
-                       inplace=True)
-
+        ind.set_levels([['A', 'B'], [1, 3, 2]], inplace=True)
         # if this fails, probably didn't reset the cache correctly.
         assert not ind.is_monotonic
 
@@ -3083,3 +3083,16 @@ def test_million_record_attribute_error(self):
         with tm.assert_raises_regex(AttributeError,
                                     "'Series' object has no attribute 'foo'"):
             df['a'].foo()
+
+    def test_duplicate_multiindex_labels(self):
+        # GH 17464
+        # Make sure that a MultiIndex with duplicate levels throws a ValueError
+        with pytest.raises(ValueError):
+            ind = pd.MultiIndex([['A'] * 10, range(10)], [[0] * 10, range(10)])
+
+        # And that using set_levels with duplicate levels fails
+        ind = MultiIndex.from_arrays([['A', 'A', 'B', 'B', 'B'],
+                                      [1, 2, 1, 2, 3]])
+        with pytest.raises(ValueError):
+            ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]],
+                           inplace=True)