Skip to content

BUG: set_levels set illegal levels. #14236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 10, 2016

Conversation

bkandel
Copy link
Contributor

@bkandel bkandel commented Sep 16, 2016

MultiIndex.set_levels, when given illegal level values, raises an error.
When inplace=True, though, the illegal level values are still accepted. This
commit fixes that behavior by checking that the proposed level values are legal
before setting them.

@codecov-io
Copy link

codecov-io commented Sep 16, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14236 into master will increase coverage by <.01%

@@             master     #14236   diff @@
==========================================
  Files           140        140          
  Lines         50633      50634     +1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43170      43172     +2   
+ Misses         7463       7462     -1   
  Partials          0          0          

Powered by Codecov. Last update f9ebeae...39b7bde

@bkandel bkandel force-pushed the fix_label_integrity branch 2 times, most recently from 89ba77f to fb16c65 Compare September 19, 2016 10:54
@jreback jreback changed the title BUG: set_levels set illegal levels. BUG: set_levels set illegal levels. Sep 19, 2016
@@ -116,12 +116,13 @@ def __new__(cls, levels=None, labels=None, sortorder=None, names=None,

return result

def _verify_integrity(self):
def _verify_integrity(self, new_labels=None, new_levels=None):
"""Raises ValueError if length of levels and labels don't match or any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Parameters section

with assertRaisesRegexp(ValueError, "^On"):
self.index.set_levels(['c'], level=0, inplace=True)
assert_matching(self.index.levels, original_index.levels)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test set_labels (with invalid labels)

# NOTE: Currently does not check, among other things, that cached
# nlevels matches nor that sortorder matches actually sortorder.
labels, levels = self.labels, self.levels
labels = new_labels or self.labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these need a check for list-like, otherwise the len check will give an odd error message. use is_list_like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add a test which exercises this check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test for this, but didn't add a separate check within _verify_integrity because the check for list-like is already present in set_levels and set_labels: https://github.com/pydata/pandas/blob/master/pandas/indexes/multi.py#L221 and https://github.com/pydata/pandas/blob/master/pandas/indexes/multi.py#L322.

@bkandel
Copy link
Contributor Author

bkandel commented Sep 22, 2016

@jreback the Travis OSX build keeps on failing on conda install. Could you maybe delete the cache and restart the build? Or any other ideas? I'm pretty sure the failure is not caused by these changes.

@bkandel bkandel force-pushed the fix_label_integrity branch 2 times, most recently from 49be0b7 to 2cb6d8c Compare September 23, 2016 10:58

with assertRaisesRegexp(TypeError, "^Levels"):
self.index.set_levels('c', level=0, inplace=True)
assert_matching(self.index.levels, original_index.levels)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add tests for inplace=False as well (you can loop over the param I think)

@jreback jreback added this to the 0.19.0 milestone Sep 27, 2016
@jreback
Copy link
Contributor

jreback commented Sep 27, 2016

minor comments, ping when green.

@jreback
Copy link
Contributor

jreback commented Sep 27, 2016

lgtm. ping on green.

@@ -154,7 +154,7 @@ def assert_matching(actual, expected):
# as much as possible
self.assertEqual(len(actual), len(expected))
for act, exp in zip(actual, expected):
act = np.asarray(act)
act = np.asarray(act, dtype=np.object_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you need to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the expected is changed to np.object_ and if I don't change act also the test fails because the expected and actual are not changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show me an example? it is passing now, so unclear why you had to change it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed up commit e0c8897 that undoes this change and fails with the error:

E       AssertionError: numpy array are different
E       
E       Attribute "dtype" are different
E       [left]:  int8
E       [right]: object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int8 is right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But when I take out the coercion to np.object_, the test test_set_levels fails with

E       AssertionError: numpy array are different
E       
E       Attribute "dtype" are different
E       [left]:  object
E       [right]: |S4

This is in the test

        # level changing [w/o mutation]
        ind2 = self.index.set_levels(new_levels)
        assert_matching(ind2.levels, new_levels)
        assert_matching(self.index.levels, levels)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this issue.

@jreback
Copy link
Contributor

jreback commented Sep 27, 2016

you need to rebase as just merged #14303

@bkandel bkandel force-pushed the fix_label_integrity branch from d636fdf to 5e8af7c Compare September 27, 2016 23:58
# NOTE: Currently does not check, among other things, that cached
# nlevels matches nor that sortorder matches actually sortorder.
labels, levels = self.labels, self.levels
labels = new_labels or self.labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can new_levels or new_labels ever be a NumPy array (e.g. user passes idx.set_levels(np.array['a', 'b']), or will they have been coerced to a list by now? If they're an array this will raise a ValueError. You could use labels = new_labels if new_labels is not None else self.labels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, looks like it'll be a FrozenList by this point. Never mind then.

@jreback jreback modified the milestones: 0.19.0, 0.19.1 Sep 28, 2016
@bkandel
Copy link
Contributor Author

bkandel commented Sep 30, 2016

@jreback I believe I have addressed all of the comments.

- Bug in ``df.groupby`` where ``.median()`` returns arbitrary values if grouped dataframe contains empty bins (:issue:`13629`)
- Bug in ``Index.copy()`` where ``name`` parameter was ignored (:issue:`14302`)
- Bug in ``MultiIndex.set_levels`` where illegal level values were still set after raising an error (:issue:`13754`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.19.1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

Labels to check for validity. Defaults to current labels.
new_levels : optional list
Levels to check for validity. Defaults to current levels.
"""
# NOTE: Currently does not check, among other things, that cached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Raises section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -116,12 +116,22 @@ def __new__(cls, levels=None, labels=None, sortorder=None, names=None,

return result

def _verify_integrity(self):
def _verify_integrity(self, new_labels=None, new_levels=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call these labels, levels. the new is confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -149,13 +149,16 @@ def test_set_levels(self):
levels = self.index.levels
new_levels = [[lev + 'a' for lev in level] for level in levels]

def assert_matching(actual, expected):
def assert_matching(actual, expected, coerce_to_obj=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this check_dtype and simply pass onto assert_numpy_array_equal as the same parameter name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@bkandel bkandel force-pushed the fix_label_integrity branch from 1b0037a to b9b08fb Compare October 7, 2016 20:29
@bkandel
Copy link
Contributor Author

bkandel commented Oct 7, 2016

@jreback I think all comments are addressed now.

@jreback jreback added the Error Reporting Incorrect or improved errors from pandas label Oct 8, 2016
@jreback
Copy link
Contributor

jreback commented Oct 8, 2016

thanks @bkandel lgtm.

@jorisvandenbossche can be the first back-port test!

Ben Kandel added 9 commits October 9, 2016 13:40
`MultiIndex.set_levels`, when passed in illegal level values, raises an error.
When `inplace=True`, though, the illegal level values are still accepted. This
commit fixes that behavior by checking that the proposed level values are legal
before setting them.

added tests and docs.

lint

added tests.

force build

force build

force build

force build for osx
@bkandel bkandel force-pushed the fix_label_integrity branch from b9b08fb to 39b7bde Compare October 9, 2016 17:40
@jorisvandenbossche jorisvandenbossche merged commit d98e982 into pandas-dev:master Oct 10, 2016
@jorisvandenbossche
Copy link
Member

@bkandel Thanks a lot!

yarikoptic added a commit to neurodebian/pandas that referenced this pull request Oct 13, 2016
* commit 'v0.19.0-14-ga40e185': (37 commits)
  BUG: Bug in localizing an ambiguous timezone when a boolean is passed
  Convert readthedocs links for their .org -> .io migration for hosted projects (pandas-dev#14406)
  DOC: formatting in basics.rst
  BLD/CI: cython cache pxd files (pandas-dev#14363)
  BUG: set_levels set illegal levels. (pandas-dev#14236)
  DOC: add whitespace to v0.19.1 bug fix section
  change impl details slightly for pandas-dev#14292
  BUG: Fix concat key name
  DOC: add 0.19.1 whatsnew file (pandas-dev#14366)
  DOC: to_csv warns regarding quoting behaviour for floats pandas-dev#14195 (pandas-dev#14228)
  DOC: fix formatting issue with msgpack table
  TST: pandas-dev#14345 fixes TestDatetimeIndexOps test_nat AssertionErrors on 32-bit
  docs: Remove old warning from dsintro.rst (pandas-dev#14365)
  DOC: minor v0.19.0 whatsnew corrections
  RLS: v0.19.0
  DOC: update release notes
  DOC: Latest fixes for whatsnew file
  to_latex encoding follows the documentation (py2 ascii, py3 utf8) (pandas-dev#14329)
  DOC: fix some sphinx build issues (pandas-dev#14332)
  TST: fix period tests for numpy 1.9.3 (GH14183) (pandas-dev#14331)
  ...
tworec pushed a commit to RTBHOUSE/pandas that referenced this pull request Oct 21, 2016
`MultiIndex.set_levels`, when passed in illegal level values, raises an error.
When `inplace=True`, though, the illegal level values are still accepted. This
commit fixes that behavior by checking that the proposed level values are legal
before setting them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Error Reporting Incorrect or improved errors from pandas MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: failing MultiIndex.set_levels with inplace=True still modifies original index
5 participants