-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: unstack #33474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF: unstack #33474
Conversation
@@ -205,6 +205,9 @@ def get_new_values(self, values, fill_value=None): | |||
|
|||
# we can simply reshape if we don't have a mask | |||
if mask_all and len(values): | |||
# TODO: Under what circumstances can we rely on sorted_values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if the we are sorted already then you can simplify, but I am not sure we can make that guarantee.
looks fine. you can try experimenting if you try to unstack on a non-sorted index and raising and see what breaks. |
new_values = new_values.T[mask] | ||
new_placement = new_placement[mask] | ||
|
||
blocks = [make_block(new_values, placement=new_placement)] | ||
blocks = [self.make_block_same_class(new_values, placement=new_placement)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel I think it's just this line that caused the regression in #37115 and that we don't need to revert the complete PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of block(s) do you end up with instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrupted? an IntBlock with dtype float64
>>> pd.__version__
'1.2.0.dev0+1432.g5fdf642368'
>>>
>>> df1 = pd.DataFrame(
... {
... "a": ["A", "A", "B"],
... "b": ["ca", "cb", "cb"],
... "v": [10] * 3,
... }
... )
>>> df1 = df1.set_index(["a", "b"])
>>> df1["is_"] = 1
>>> df1
v is_
a b
A ca 10 1
cb 10 1
B cb 10 1
>>>
>>> df1._data
BlockManager
Items: Index(['v', 'is_'], dtype='object')
Axis 1: MultiIndex([('A', 'ca'),
('A', 'cb'),
('B', 'cb')],
names=['a', 'b'])
IntBlock: slice(0, 1, 1), 1 x 3, dtype: int64
IntBlock: slice(1, 2, 1), 1 x 3, dtype: int64
>>>
>>> df2 = df1.unstack("b")
>>> df2
v is_
b ca cb ca cb
a
A 10.0 10.0 1.0 1.0
B NaN 10.0 NaN 1.0
>>>
>>> df2._data
BlockManager
Items: MultiIndex([( 'v', 'ca'),
( 'v', 'cb'),
('is_', 'ca'),
('is_', 'cb')],
names=[None, 'b'])
Axis 1: Index(['A', 'B'], dtype='object', name='a')
IntBlock: slice(0, 2, 1), 2 x 2, dtype: float64
IntBlock: slice(2, 4, 1), 2 x 2, dtype: float64
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run the test suite with just this change reverted and all tests pass. will put up a PR shortly.
cc @jreback this moves a small amount of the unstack logic up the call stack, which is worthwhile, but the big wins available are if you can help me figure out the questions in inline comments.