Skip to content

BUG: Fix for passing multiple ints as levels in DataFrame.stack() (#7660) #7770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 21, 2014

Conversation

onesandzeroes
Copy link
Contributor

closes #7660
The specific case that came up in #7660 (and originally in #7653) seems easy enough to fix, so I've covered that in the PR. I feel like this raises a few other potential problems though, e.g.:

  • If the list of level numbers is not in order, is there any sensible way to deal with them? I've sorted the level list in my fix, as that seems like the most straightforward way of making sure that when you do each stack, it's only the level numbers higher than the current that are affected. This might produce undesired results though, so maybe we should just raise a ValueError if the level numbers aren't sorted?
  • I'm not sure how to extend this to deal with negative level numbers.

@jreback
Copy link
Contributor

jreback commented Jul 17, 2014

their is a method

index._get_level_number() which handles translation and validation (not ordering though)

also shouldn't this be core/reshape/Stack ?

@onesandzeroes
Copy link
Contributor Author

The initial bug report was using columns with no level names- if you have names I can see how index._get_level_number() helps you keep track, but without names I'm not sure how you can use it? The whole issue comes up because as you do the stacks one by one, the level numbers for the remaining levels change.

The existing behaviour handles the multiple levels case within DataFrame.stack() and just sends one level at a time to core.reshape.stack, I guess I could try to move the whole thing, including existing behaviour, over to core.reshape.stack if that's the most logical place for it?

@jreback
Copy link
Contributor

jreback commented Jul 17, 2014

get_level_number validates - that's the main use (eg an out of range level number) also handles ambiguity - eg if a level is named 1

u can make a function in core/reshape to handle this kind of thing - maybe multi_stack (and move pretty much the entire stack routine their)

@onesandzeroes
Copy link
Contributor Author

@jreback: I went back over this and I think what I've come up with now is better overall. Moved the bulk of the logic to the reshape module. At the moment I think I've got a pretty good handle on the cases where

  • All the passed levels match up to the column levels, whether they're ints or strings
  • All the passed levels are ints, but don't match up to column levels: in this case it seems
    pretty safe to assume they represent level numbers, rather than names

Not sure what to to do with more ambiguous cases though, since I guess someone could
pass a mixture of ints and strings. Maybe just throw an exception and say that the list
of levels must either be all names or all level numbers?

@jreback jreback added this to the 0.15.0 milestone Jul 17, 2014
@jreback jreback changed the title BUG: Fix for passing multiple ints as levels in DataFrame.stack() (#7660) BUG: Fix for passing multiple ints as levels in DataFrame.stack() (#7660) Jul 17, 2014
for lev in level:
result = stack(result, lev, dropna=dropna)
# Otherwise, level numbers may change as each successive level is stacked
elif all(isinstance(lev, int) for lev in level):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these also must be valid levels when you start (e.g. [3,1] is not ok for a len(levels=2))

@jreback
Copy link
Contributor

jreback commented Jul 17, 2014

looks reasonable. just add some out-of-bounds checking on the levels (and tests for those). pls add a note in v0.15.0 as well (API section is ok)

@onesandzeroes
Copy link
Contributor Author

@jreback OK, this ended up requiring a few more changes. index._get_level_number() was already doing most of the validation of the levels, so by adding a check for out-of-range negative level numbers, it became possible to use that to do all the conversion and out-of-bounds checks in the stack functions. I've added test cases, but let me know if this is too many changes for a single PR. Can also add more to the release notes.

Otherwise, if things are looking good, I can squash back to a single commit.

@@ -32,6 +32,10 @@ API changes

.. _whatsnew_0150.cat:

- Passing multiple levels to `DataFrame.stack()` will now work when multiple level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this above the _whatnew_0150.cat, you are in a different section here

@jreback
Copy link
Contributor

jreback commented Jul 18, 2014

minor comments, then pls squish

@onesandzeroes
Copy link
Contributor Author

@jreback Fixed those two things then squashed, should be good now.

@@ -30,6 +30,10 @@ users upgrade to this version.
API changes
~~~~~~~~~~~

- Passing multiple levels to `DataFrame.stack()` will now work when multiple level
numbers are passed (:issue:`7660`), and will raise a ``ValueError`` when the
levels aren't all level names or all level numbers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a short example to the doc section (reshape.rst), and a reference from here.

@onesandzeroes
Copy link
Contributor Author

@jreback: Added some spacing around sections and some examples to the docs

You may also stack or unstack more than one level at a time by passing a list
of levels, in which case the end result is as if each level in the list were
processed individually.

.. ipython:: python

import itertools
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can take out this import

Fix for multiple level numbers

If passed levels match level names, use them directly

Add test cases for multiple stacking

Add check for out of range negative level num in _get_level_number()

Use _get_level_number() to do validation and conversion

Explain why level list isn't iterated over directly

Raise exception on fall-through case

Add test cases for raising exceptions

Use _get_level_number() to convert and validate negative levels

Add changes to release notes

Fix Python 2.6 build issue: use assertRaisesRegexp from testing module

Add tests for out of bounds level numbers

Move API change note to correct section

Add blank line after if/elifs

Add blank line between sections

Too many blank lines between functions

Add multiple stacking examples

Reference to new examples in What's new docs

More blank lines between sections

Remove unused itertools import
@onesandzeroes
Copy link
Contributor Author

@jreback Sorry, went through a few different options while trying to come up with succinct example data, didn't realize that import was still in there.

jreback added a commit that referenced this pull request Jul 21, 2014
BUG: Fix for passing multiple ints as levels in DataFrame.stack() (#7660)
@jreback jreback merged commit 59ba163 into pandas-dev:master Jul 21, 2014
@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

@onesandzeroes thanks for this!

pls check the docs after built and confirm that they look good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: stacking multiple levels
2 participants