Skip to content

BUG: .reset_index() should raise with an invalid level name (GH20925) #21016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
May 18, 2018
Merged

BUG: .reset_index() should raise with an invalid level name (GH20925) #21016

merged 17 commits into from
May 18, 2018

Conversation

KalyanGokhale
Copy link
Contributor

@KalyanGokhale KalyanGokhale commented May 12, 2018

#20925 Raises appropriate error for Series.reset_index(level_name, drop=True) when index is flat and an invalid level is supplied

@codecov
Copy link

codecov bot commented May 12, 2018

Codecov Report

Merging #21016 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21016      +/-   ##
==========================================
+ Coverage   91.83%   91.83%   +<.01%     
==========================================
  Files         153      153              
  Lines       49497    49498       +1     
==========================================
+ Hits        45456    45458       +2     
+ Misses       4041     4040       -1
Flag Coverage Δ
#multiple 90.23% <100%> (ø) ⬆️
#single 41.88% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 94.12% <100%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d623ffd...c9afed3. Read the comment docs.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented May 12, 2018

Could you add a test that the correct error and error message is raised? Use tm.assert_raises_regex

@pep8speaks
Copy link

pep8speaks commented May 12, 2018

Hello @KalyanGokhale! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 18, 2018 at 01:27 Hours UTC

@KalyanGokhale
Copy link
Contributor Author

KalyanGokhale commented May 12, 2018

@TomAugspurger have added the test - please tell if any further edits are needed.

Copy link
Member

@toobaz toobaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My test case was wrong... removing the set() will maybe fix the correct one?

with tm.assert_raises_regex(KeyError, 'not found'):
s.reset_index('wrong', drop=True)
# Data for Test Case 4
s = pd.Series(range(4), name='valid')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and I mean: my testcase) doesn't make much sense, as we are giving a name to s itself, not to its index. I should have written

s = pd.Series(range(4), index=pd.RangeIndex(4, name='valid'))

Notice that if you replace with the above, the following doesn't raise any more. But it might be a bug in MultiIndex.droplevel, in which case OK to postpone.

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will delete this particular test case now

if not isinstance(level, (tuple, list)):
level = [level]
level = set(level)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

level = set(level) is not required now. Did not remove it after combining two separate blocks post-testing. Will delete it.

if len(level) < len(self.index.levels):
new_index = self.index.droplevel(level)
if isinstance(self.index, MultiIndex):
if (len(level) < len(self.index.levels)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(self.index.levels) can be replaced with self.index.nlevels (and then I guess you'll be able to drop the MultiIndex test above). But is this test actually needed? (It's not a rhetorical question)

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.

Yes replacing len(self.index.levels) with self.index.nlevels helps and MultiIndex test can then be removed. However, have retained the MultiIndex test to avoid any unforeseen outcomes - retaining it also produces the same outcomes (at least in the test cases checked).

len(level) < self.index.nlevels test is needed, in case this is dropped (assuming MultiIndex test is also dropped) then following errors are observed for these cases:
Test Data
arrays = [np.array(['bar', 'bar', 'baz', 'baz']), np.array(['one', 'two', 'one', 'two'])]
s = pd.Series(range(4), name='foo', index=pd.MultiIndex.from_arrays(arrays, names=['a', 'b']))
Test Case
s.reset_index(['a', 'b'], drop=True)
Outcome
ValueError: Must pass non-zero number of levels/labels

Test Data
s = pd.Series(range(4), index=pd.RangeIndex(4, name='valid'))
Test Cases
s.reset_index(['valid', 'valid'], drop=True)
s.reset_index(['valid'], drop=True)
s.reset_index('valid', drop=True)
Outcome for each of 3 test cases above
AttributeError: 'RangeIndex' object has no attribute 'droplevel'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

len(level) < self.index.nlevels test is needed

Right, hadn't considered that zero-levels MultiIndexes can't exist.

The isinstance(self.index, MultiIndex) doesn't make much sense but it's OK to leave it, as it guarantees that if pd.Series(range(4)).reset_index([], drop=True) doesn't fail (pd.Index.droplevel() doesn't exist). Could you add a test case for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Raises appropriate error for Series.reset_index(level_name, drop=True) when index is flat and an invalid level is supplied
Raises appropriate error for Series.reset_index(level_name, drop=True) when index is flat and an invalid level is supplied
Raises appropriate error for Series.reset_index(level_name, drop=True) when index is flat and an invalid level is supplied
Raises appropriate error for Series.reset_index(level_name, drop=True) when index is flat and an invalid level is supplied
Raises appropriate error for Series.reset_index(level_name, drop=True) when index is flat and invalid level is supplied. Made edits as requested in the review.
@jreback jreback changed the title GH20925 BUG: .reset_index() should raise with a non-named level May 13, 2018
# https://github.com/pandas-dev/pandas/issues/20925
# Data for Test Case 1 and 2
s = pd.Series(range(4))
# Test Case 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put blank lines between cases. Don't use a 'Test Case 1' as a lable, either remove of put what it is testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Have introduced blank cases and, replaced labels with comments of what is being tested

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels May 13, 2018
Made edits to test file for labels and introduced blank spaces between test cases.
Made edits to test file for labels and introduced blank spaces between test cases.
Made edits to test file for labels and introduced blank spaces between test cases.
Made edits to test file for labels and introduced blank spaces between test cases.
def test_reset_index_drop_errmsg():
# https://github.com/pandas-dev/pandas/issues/20925

# Check KeyError raised for series where no 'level' name is defined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

series -> series index
no 'level' name is defined -> passed level name is missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

with tm.assert_raises_regex(KeyError, 'must be same as name'):
s.reset_index('wrong')

# Check KeyError raised for series where 'level' to be dropped is undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undefined -> missing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@toobaz
Copy link
Member

toobaz commented May 16, 2018

I mistakenly wrote in an obsolete discussion, copying here:

len(level) < self.index.nlevels test is needed

Right, hadn't considered that zero-levels MultiIndexes can't exist.

The isinstance(self.index, MultiIndex) doesn't make much sense but it's OK to leave it, as it guarantees that pd.Series(range(4)).reset_index([], drop=True) doesn't fail (pd.Index.droplevel() doesn't exist). Could you add a test case for this?

After that and a couple of minor fixes to comments, I think we're ready.

made suggested edits to tests
made suggested edits to tests
Copy link
Member

@toobaz toobaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes OK, just move to correct file

@@ -768,3 +768,24 @@ def test_head_tail(test_data):
assert_series_equal(test_data.series.head(0), test_data.series[0:0])
assert_series_equal(test_data.series.tail(), test_data.series[-5:])
assert_series_equal(test_data.series.tail(0), test_data.series[0:0])


def test_reset_index_drop_errmsg():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this is in the wrong file... reset_index is tested in pandas/tests/series/test_alter_axes.py, not test_indexing.py.

While you're at it, please replace # https://github.com/pandas-dev/pandas/issues/20925 with the standard # GH 20925.

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks didn't know that the right file was test_alter_axes.py

Have moved the function there (now renamed as test_reset_index_drop_errors based on @TomAugspurger 's comment) - also updated comment for # GH 20925

with tm.assert_raises_regex(KeyError, 'not found'):
s.reset_index('wrong', drop=True)

# Check that .reset_index([],drop=True) doesn't fail
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine but probably not in the right test: could you append it to test_reset_index_level (in pandas/tests/series/test_alter_axes.py)?

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Have moved it to function test_reset_index_level outside the for loop

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release note can go in 0.23.1 (maybe have to merge in master for it to show up).

@@ -768,3 +768,24 @@ def test_head_tail(test_data):
assert_series_equal(test_data.series.head(0), test_data.series[0:0])
assert_series_equal(test_data.series.tail(), test_data.series[-5:])
assert_series_equal(test_data.series.tail(0), test_data.series[0:0])


def test_reset_index_drop_errmsg():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errmsg -> errors or error_message

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. renamed def as test_reset_index_drop_errors

@TomAugspurger TomAugspurger added this to the 0.23.1 milestone May 17, 2018
Made requested changes, moved test to pandas/tests/series/test_alter_axes.py
Made requested changes, moved test to pandas/tests/series/test_alter_axes.py
@toobaz
Copy link
Member

toobaz commented May 17, 2018

Looks all good to me, ready to merge after the whatsnew note is added.

@KalyanGokhale
Copy link
Contributor Author

Updated v0.23.1.txt under Bug Fixes > Indexing as:
Bug in :meth:Series.reset_index where appropriate error was not raised with a non-named level (:issue:20925)

@@ -59,7 +59,7 @@ Conversion
Indexing
^^^^^^^^

-
- Bug in :meth:`Series.reset_index` where appropriate error was not raised with a non-named level (:issue:`20925`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the "a non-named level". Wouldn't it be "an invalid level name"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Done.

@KalyanGokhale KalyanGokhale changed the title BUG: .reset_index() should raise with a non-named level BUG: .reset_index() should raise with an invalid level name May 18, 2018
@KalyanGokhale KalyanGokhale changed the title BUG: .reset_index() should raise with an invalid level name BUG: .reset_index() should raise with an invalid level name (GH20925) May 18, 2018
@toobaz toobaz merged commit e033c06 into pandas-dev:master May 18, 2018
@toobaz
Copy link
Member

toobaz commented May 18, 2018

@KalyanGokhale merged, thanks!

@KalyanGokhale KalyanGokhale deleted the reset-index-errmsg branch May 19, 2018 05:53
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jun 8, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jun 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series.reset_index(level_name, drop=True) accepts invalid name when index is flat
6 participants