Skip to content

REF: unstack #33474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 12, 2020
Merged

REF: unstack #33474

merged 6 commits into from
Apr 12, 2020

Conversation

jbrockmendel
Copy link
Member

cc @jreback this moves a small amount of the unstack logic up the call stack, which is worthwhile, but the big wins available are if you can help me figure out the questions in inline comments.

@gfyoung gfyoung added Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 12, 2020
@@ -205,6 +205,9 @@ def get_new_values(self, values, fill_value=None):

# we can simply reshape if we don't have a mask
if mask_all and len(values):
# TODO: Under what circumstances can we rely on sorted_values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if the we are sorted already then you can simplify, but I am not sure we can make that guarantee.

@jreback jreback added this to the 1.1 milestone Apr 12, 2020
@jreback jreback merged commit 12f9a10 into pandas-dev:master Apr 12, 2020
@jreback
Copy link
Contributor

jreback commented Apr 12, 2020

looks fine. you can try experimenting if you try to unstack on a non-sorted index and raising and see what breaks.

new_values = new_values.T[mask]
new_placement = new_placement[mask]

blocks = [make_block(new_values, placement=new_placement)]
blocks = [self.make_block_same_class(new_values, placement=new_placement)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel I think it's just this line that caused the regression in #37115 and that we don't need to revert the complete PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of block(s) do you end up with instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrupted? an IntBlock with dtype float64

>>> pd.__version__
'1.2.0.dev0+1432.g5fdf642368'
>>>
>>> df1 = pd.DataFrame(
...     {
...         "a": ["A", "A", "B"],
...         "b": ["ca", "cb", "cb"],
...         "v": [10] * 3,
...     }
... )
>>> df1 = df1.set_index(["a", "b"])
>>> df1["is_"] = 1
>>> df1
       v  is_
a b
A ca  10    1
  cb  10    1
B cb  10    1
>>>
>>> df1._data
BlockManager
Items: Index(['v', 'is_'], dtype='object')
Axis 1: MultiIndex([('A', 'ca'),
            ('A', 'cb'),
            ('B', 'cb')],
           names=['a', 'b'])
IntBlock: slice(0, 1, 1), 1 x 3, dtype: int64
IntBlock: slice(1, 2, 1), 1 x 3, dtype: int64
>>>
>>> df2 = df1.unstack("b")
>>> df2
      v        is_
b    ca    cb   ca   cb
a
A  10.0  10.0  1.0  1.0
B   NaN  10.0  NaN  1.0
>>>
>>> df2._data
BlockManager
Items: MultiIndex([(  'v', 'ca'),
            (  'v', 'cb'),
            ('is_', 'ca'),
            ('is_', 'cb')],
           names=[None, 'b'])
Axis 1: Index(['A', 'B'], dtype='object', name='a')
IntBlock: slice(0, 2, 1), 2 x 2, dtype: float64
IntBlock: slice(2, 4, 1), 2 x 2, dtype: float64
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run the test suite with just this change reverted and all tests pass. will put up a PR shortly.

@jbrockmendel jbrockmendel deleted the ref-unstack-2 branch December 2, 2020 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants