-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: transpose not respecting CoW #51430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Updated for new block manager method |
@phofl there are some relevant failures here with the usage of |
Is it possible to get views when splitting the block and converting the dtype? We got the errors when converting object to a proper dtype which splitted the block |
That sounds impossible, unless one of the split blocks kept its object dtype? |
Ok then my train of though was correct. No there is no reason to split and keep object |
@@ -252,6 +252,9 @@ def add_references(self, mgr: BaseBlockManager) -> None: | |||
Adds the references from one manager to another. We assume that both | |||
managers have the same block structure. | |||
""" | |||
if len(self.blocks) != len(mgr.blocks): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't understand how the case in transpose can ever run into this? Because we should only get here in case of self._can_fast_transpose
, and then by definition there should only be 1 block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df = DataFrame({"a": [pd.Timedelta("1 days"), pd.NaT, pd.Timedelta("2 days")]}, dtype="object")
df.T
the constructor infers this to date time after transposing the array. We get 1 Timedelta block back and another NaT block We could consider this a bug though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I seem to recall that we have had discussions about this corner case before (IMO we should indeed change that)
I tried locally with object dtype with ints to test, but so this corner case is only for time like, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we are calling maybe_infer_datetimelike under the hood which causes this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we deprecate this first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to recall that we have had discussions about this corner case before (IMO we should indeed change that)
And note that this is not only about transpose, it's about constructors in general. So the question is also if we want to change this:
In [10]: df = DataFrame({"a": np.array([pd.Timedelta("1 days"), pd.NaT, pd.Timedelta("2 days")], dtype=object)})
In [11]: df
Out[11]:
a
0 1 days
1 NaT
2 2 days
In [12]: df.dtypes
Out[12]:
a timedelta64[ns]
dtype: object
(but that's for another issue)
Although of course we could fix this locally in transpose to specifically fix transpose to preserve the dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the constructor would have to be deprecated. Transpose is more of a bug fix imo because that we are using the inference rules of the constructor under the hood is more an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One related issue: #39117
It's also not clear we necessarily want to deprecate this for the constructors. But for transpose() I would certainly consider it a bug / something to change. And for transpose() I am not sure a deprecation is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah constructors makes a bit of sense.
I'll try what's failing locally when changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have tests covering exactly this behavior :(
FAILED pandas/tests/frame/methods/test_transpose.py::TestTranspose::test_transpose_tzaware_2col_mixed_tz - AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="A") are different
FAILED pandas/tests/frame/methods/test_transpose.py::TestTranspose::test_transpose_object_to_tzaware_mixed_tz - assert False
So lets merge this without tackling this case
Merging, I'll tackle the transpose case separately |
#51564) Backport PR #51430: BUG: transpose not respecting CoW Co-authored-by: Patrick Hoefler <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This is the same solution as in iterrows, I work on a solution on the BlockManager level now