REF: unstack #33474

jbrockmendel · 2020-04-11T03:00:36Z

cc @jreback. This moves a small amount of the unstack logic up the call stack, which is worthwhile on its own, but the bigger wins depend on answering the questions in the inline comments, if you can help me figure those out.


jreback · 2020-04-12T20:33:22Z

pandas/core/reshape/reshape.py

@@ -205,6 +205,9 @@ def get_new_values(self, values, fill_value=None):

        # we can simply reshape if we don't have a mask
        if mask_all and len(values):
+            # TODO: Under what circumstances can we rely on sorted_values


I think if we are already sorted then you can simplify, but I am not sure we can make that guarantee.

jreback · 2020-04-12T20:34:04Z

Looks fine. As an experiment, you could try raising when unstack is called on a non-sorted index and see what breaks.
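For context on the sortedness question, here is a minimal sketch using only the public API. It assumes a small Series with a deliberately non-sorted MultiIndex (the data and variable names are illustrative, not taken from the PR) and checks that unstacking it gives the same frame as sorting first. It says nothing about when the internal `sorted_values` can be relied on; it only shows the observable behavior that any simplification would need to preserve.

```python
import pandas as pd

# Illustrative data: a MultiIndex whose rows are deliberately out of order.
idx = pd.MultiIndex.from_tuples(
    [("B", "cb"), ("A", "cb"), ("A", "ca")],
    names=["a", "b"],
)
s = pd.Series([3, 2, 1], index=idx, name="v")

# The index is not sorted, so unstack cannot assume sorted input here.
index_is_sorted = s.index.is_monotonic_increasing

# Unstack directly, and again after sorting the index explicitly.
unsorted_result = s.unstack("b")
sorted_result = s.sort_index().unstack("b")

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(unsorted_result, sorted_result)
```

If the two results ever differed, raising on non-sorted input (as suggested above) would be a behavior change rather than a pure refactor.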

simonjayhawkins · 2020-12-02T17:24:27Z

pandas/core/internals/blocks.py

        new_values = new_values.T[mask]
        new_placement = new_placement[mask]

-        blocks = [make_block(new_values, placement=new_placement)]
+        blocks = [self.make_block_same_class(new_values, placement=new_placement)]


@jbrockmendel I think this line alone caused the regression in #37115, so we don't need to revert the whole PR.

jbrockmendel · 2020-12-02

What kind of block(s) do you end up with instead?

simonjayhawkins · 2020-12-02

Corrupted? An IntBlock with dtype float64:

    >>> pd.__version__
    '1.2.0.dev0+1432.g5fdf642368'
    >>>
    >>> df1 = pd.DataFrame(
    ...     {
    ...         "a": ["A", "A", "B"],
    ...         "b": ["ca", "cb", "cb"],
    ...         "v": [10] * 3,
    ...     }
    ... )
    >>> df1 = df1.set_index(["a", "b"])
    >>> df1["is_"] = 1
    >>> df1
           v  is_
    a b
    A ca  10    1
      cb  10    1
    B cb  10    1
    >>>
    >>> df1._data
    BlockManager
    Items: Index(['v', 'is_'], dtype='object')
    Axis 1: MultiIndex([('A', 'ca'),
                ('A', 'cb'),
                ('B', 'cb')],
               names=['a', 'b'])
    IntBlock: slice(0, 1, 1), 1 x 3, dtype: int64
    IntBlock: slice(1, 2, 1), 1 x 3, dtype: int64
    >>>
    >>> df2 = df1.unstack("b")
    >>> df2
          v        is_
    b    ca    cb   ca   cb
    a
    A  10.0  10.0  1.0  1.0
    B   NaN  10.0  NaN  1.0
    >>>
    >>> df2._data
    BlockManager
    Items: MultiIndex([(  'v', 'ca'),
                (  'v', 'cb'),
                ('is_', 'ca'),
                ('is_', 'cb')],
               names=[None, 'b'])
    Axis 1: Index(['A', 'B'], dtype='object', name='a')
    IntBlock: slice(0, 2, 1), 2 x 2, dtype: float64
    IntBlock: slice(2, 4, 1), 2 x 2, dtype: float64
    >>>

simonjayhawkins · 2020-12-02

I ran the test suite with just this change reverted and all tests pass. Will put up a PR shortly.
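The session above inspects private internals (`_data`), which differ between pandas versions. As a rough public-API counterpart, the sketch below rebuilds the same frame and checks the behavior a fix should restore: integer columns are upcast to float64 when unstack introduces missing values, and `fillna` then works on the result. This is an illustrative check under those assumptions, not the regression test that was added for #37115.

```python
import pandas as pd

# Same frame as in the session above.
df1 = pd.DataFrame(
    {
        "a": ["A", "A", "B"],
        "b": ["ca", "cb", "cb"],
        "v": [10] * 3,
    }
).set_index(["a", "b"])
df1["is_"] = 1

# ("B", "ca") does not exist, so unstacking introduces NaN and the
# integer data has to be upcast to float.
df2 = df1.unstack("b")
print(df2.dtypes)

# With consistent block/dtype bookkeeping, fillna replaces the NaNs.
filled = df2.fillna(0)
print(filled)

# Passing fill_value up front avoids the NaNs, and the upcast, entirely.
kept_int = df1.unstack("b", fill_value=0)
print(kept_int.dtypes)
```

The `fill_value` variant is included only for contrast: it sidesteps the upcast, so it does not exercise the code path changed in this PR.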

jbrockmendel added 6 commits April 9, 2020 09:55

REF: move shared calls up the stack

7c33dc2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

bc19cad

…f-unstack-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

45c7a8f

…f-unstack-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

b2feffe

…f-unstack-2

Merge branch 'master' of https://github.com/pandas-dev/pandas into re…

80e0204

…f-unstack-2

Comments

8f0e5d4

gfyoung added the Refactor (Internal refactoring of code) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Apr 12, 2020

jreback reviewed Apr 12, 2020


jreback added this to the 1.1 milestone Apr 12, 2020

jreback merged commit 12f9a10 into pandas-dev:master Apr 12, 2020

simonjayhawkins mentioned this pull request Nov 25, 2020

REGR: unstack on 'int' dtype prevent fillna to work #37115

Closed

3 tasks

simonjayhawkins reviewed Dec 2, 2020


jbrockmendel deleted the ref-unstack-2 branch December 2, 2020 17:46
