Skip to content

BUG: MultiIndex not dropping nan level and invalid code value #26408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 1, 2019
Merged
4 changes: 2 additions & 2 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -387,8 +387,8 @@ MultiIndex
^^^^^^^^^^

- Bug in which incorrect exception raised by :class:`Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)
-
-
- Bug in having code value < -1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make something like

Bug in :class:`Multindex` construction from levelsandcodesthat would incorrectly allowsNaNlevels

- Bug in not dropping nan level (:issue:`19387`)

I/O
^^^
Expand Down
12 changes: 12 additions & 0 deletions pandas/core/indexes/multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -243,10 +243,19 @@ def __new__(cls, levels=None, codes=None, sortorder=None, names=None,

if verify_integrity:
result._verify_integrity()

codes = [cls._reassign_na_codes(*it) for it in zip(levels, codes)]
result._set_codes(codes, validate=False)

if _set_identity:
result._reset_identity()

return result

@classmethod
def _reassign_na_codes(cls, level, code):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this _validate_codes, pls add a doc-string.

This should be an instance method (and not a classmethod)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method level changed and moved into validation method

return [-1 if x == -1 or isna(level[x]) else x for x in code]

def _verify_integrity(self, codes=None, levels=None):
"""

Expand Down Expand Up @@ -281,6 +290,9 @@ def _verify_integrity(self, codes=None, levels=None):
" level (%d). NOTE: this index is in an"
" inconsistent state" % (i, level_codes.max(),
len(level)))
if len(level_codes) and level_codes.min() < -1:
raise ValueError("On level %d, code value (%d) < -1" %
(i, level_codes.min()))
if not level.is_unique:
raise ValueError("Level values must be unique: {values} on "
"level {level}".format(
Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/indexes/multi/test_constructor.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ def test_constructor_mismatched_codes_levels(idx):
length_error = (r"On level 0, code max \(3\) >= length of level \(1\)\."
" NOTE: this index is in an inconsistent state")
label_error = r"Unequal code lengths: \[4, 2\]"
code_value_error = (r"On level 0, code value \(-2\) < -1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the outer parentheses here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed


# important to check that it's looking at the right thing.
with pytest.raises(ValueError, match=length_error):
Expand All @@ -83,6 +84,23 @@ def test_constructor_mismatched_codes_levels(idx):
with pytest.raises(ValueError, match=label_error):
idx.copy().set_codes([[0, 0, 0, 0], [0, 0]])

# code value smaller than -1
with pytest.raises(ValueError, match=code_value_error):
MultiIndex(levels=[['a'], ['b']], codes=[[0, -2], [0, 0]])


def test_na_levels():
tm.assert_index_equal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment with the issue number

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue number added

MultiIndex(levels=[[np.nan, None, pd.NaT, 128, 2]],
codes=[[0, -1, 1, 2, 3, 4]]),
MultiIndex(levels=[[np.nan, None, pd.NaT, 128, 2]],
codes=[[-1, -1, -1, -1, 3, 4]]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use
result=
expected=
tm.assert_index_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test cases refactored

tm.assert_index_equal(
MultiIndex(levels=[[np.nan, 's', pd.NaT, 128, None]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also verify if .set_levels and/or .set_codes is called. Can you add tests for each.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new test cases added

codes=[[0, -1, 1, 2, 3, 4]]),
MultiIndex(levels=[[np.nan, 's', pd.NaT, 128, None]],
codes=[[-1, -1, 1, -1, 3, -1]]))


def test_labels_deprecated(idx):
# GH23752
Expand Down