REGR: Performance of DataFrame reduction ops with axis=1 #52689

Closed
16 changes: 12 additions & 4 deletions pandas/core/frame.py
@@ -106,7 +106,10 @@
     needs_i8_conversion,
     pandas_dtype,
 )
-from pandas.core.dtypes.dtypes import ExtensionDtype
+from pandas.core.dtypes.dtypes import (
+    BaseMaskedDtype,
+    ExtensionDtype,
+)
 from pandas.core.dtypes.missing import (
     isna,
     notna,
@@ -3602,9 +3605,14 @@ def transpose(self, *args, copy: bool = False) -> DataFrame:
             # We have EAs with the same dtype. We can preserve that dtype in transpose.
             dtype = dtypes[0]
             arr_type = dtype.construct_array_type()
-            values = self.values
-
-            new_values = [arr_type._from_sequence(row, dtype=dtype) for row in values]
+            if isinstance(dtype, BaseMaskedDtype):
+                data, mask = self._mgr.as_array_masked()
+                new_values = [arr_type(data[i], mask[i]) for i in range(self.shape[0])]
+            else:
+                values = self.values
+                new_values = [
+                    arr_type._from_sequence(row, dtype=dtype) for row in values
+                ]
             result = type(self)._from_arrays(
                 new_values, index=self.columns, columns=self.index
             )
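The idea behind the masked branch can be tried with public API only. The sketch below is an illustration, not the code in this PR: it assumes an all-`Int64` frame, builds the 2D `data` and `mask` arrays by hand instead of calling the private `as_array_masked`, and uses `pd.arrays.IntegerArray` as the concrete array type. The names `rows_fast`, `rows_generic`, and `transposed` are hypothetical.

```python
import numpy as np
import pandas as pd

# A small frame whose columns all share one masked dtype (Int64).
df = pd.DataFrame(
    {
        "a": pd.array([1, None, 3], dtype="Int64"),
        "b": pd.array([4, 5, None], dtype="Int64"),
    }
)

# Hand-built stand-in for the (data, mask) pair: shape (n_rows, n_columns).
# Masked slots get a placeholder 0 in `data`; `mask` marks them as missing.
data = np.column_stack(
    [df[col].to_numpy(dtype="int64", na_value=0) for col in df.columns]
)
mask = np.column_stack([df[col].isna().to_numpy() for col in df.columns])

# Masked path: wrap each row's data and mask directly.
rows_fast = [pd.arrays.IntegerArray(data[i], mask[i]) for i in range(df.shape[0])]

# Generic path: rebuild each row from a sequence of scalars.
rows_generic = [pd.array(list(df.iloc[i]), dtype="Int64") for i in range(df.shape[0])]

# Each row of the original frame becomes a column of the transposed frame.
transposed = pd.DataFrame(dict(zip(df.index, rows_fast)), index=df.columns)
```

Both paths produce equal arrays here; the masked path simply skips the per-element conversion that building from a sequence of scalars requires.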
34 changes: 34 additions & 0 deletions pandas/core/internals/managers.py
@@ -1749,6 +1749,40 @@ def as_array(

         return arr.transpose()

+    def as_array_masked(self) -> tuple[np.ndarray, np.ndarray]:
+        """
+        Convert the blockmanager data into an numpy array.
+
+        Returns
+        -------
+        arr : ndarray
+        """
+        # # TODO(CoW) handle case where resulting array is a view
Member Author

@jorisvandenbossche - I don't understand this TODO, can you give more details?

Member

Hmm, I suppose that with the current implementation, the resulting arrays are never views but always copies (since result_data and result_mask are newly created arrays, and the values are then copied in)?
In that case I think you can just remove the comment.

Member

Ah, that was just copied from the as_array implementation above, so it can certainly be removed. Sorry for the confusion!
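The copy-versus-view point in this thread can be checked with plain NumPy. This is a minimal sketch under the assumption that the method fills freshly allocated arrays the way the diff shows; `src`, `out`, and `out_t` are hypothetical names standing in for a block's values and the result arrays.

```python
import numpy as np

# `src` stands in for the values held by a block.
src = np.arange(6).reshape(2, 3)

# Allocate a new array and copy the values in, as the method does.
out = np.empty(src.shape, dtype=src.dtype)
out[:] = src

# The transpose is a view of `out`, but it still shares nothing with `src`.
out_t = out.transpose()
no_overlap = not np.shares_memory(out_t, src)

# Writing through the result leaves the source untouched.
out_t[0, 0] = 99
source_unchanged = bool(src[0, 0] == 0)
```

Since the result never aliases the block data, there is no view case for copy-on-write to track here.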

+        # if len(self.blocks) == 0:
+        # arr = np.empty(self.shape, dtype=float)
+        # return arr.transpose()
+
+        # TODO we already established we only have a single dtype, but this
+        # could be generalized to be a mix of all masked dtypes
+        dtype = self.blocks[0].dtype.numpy_dtype
+
+        result_data = np.empty(self.shape, dtype=dtype)
+        result_mask = np.empty(self.shape, dtype="bool")
+
+        # itemmask = np.zeros(self.shape[0])
+
+        for blk in self.blocks:
+            rl = blk.mgr_locs
+            arr = blk.values
+            result_data[rl.indexer] = arr._data
+            result_mask[rl.indexer] = arr._mask
+            # itemmask[rl.indexer] = 1
+
+        # if not itemmask.all():
+        # raise AssertionError("Some items were not contained in blocks")
+
+        return result_data.transpose(), result_mask.transpose()
+
     def _interleave(
         self,
         dtype: np.dtype | None = None,
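The interleaving loop in `as_array_masked` can be sketched outside pandas with NumPy alone. The version below is an illustration under stated assumptions: each block is modelled as a hypothetical `(locs, data, mask)` tuple rather than a real `Block`, `interleave_masked` is an invented name, and the coverage check that the diff leaves commented out is enabled here to show what it would guard against.

```python
import numpy as np


def interleave_masked(blocks, shape):
    """Combine per-column (locs, data, mask) tuples into two 2D arrays.

    `shape` is (n_columns, n_rows), mirroring the manager's internal layout.
    """
    # Assumption: every block shares one NumPy dtype.
    dtype = blocks[0][1].dtype
    result_data = np.empty(shape, dtype=dtype)
    result_mask = np.empty(shape, dtype=bool)
    seen = np.zeros(shape[0], dtype=bool)

    for locs, data, mask in blocks:
        # Copy each block's values into its slot in the freshly allocated arrays.
        result_data[locs] = data
        result_mask[locs] = mask
        seen[locs] = True

    if not seen.all():
        raise AssertionError("Some items were not contained in blocks")

    # Transpose so that rows of the result correspond to rows of the frame.
    return result_data.transpose(), result_mask.transpose()


blocks = [
    (0, np.array([1, 2, 3]), np.array([False, True, False])),
    (1, np.array([4, 5, 6]), np.array([False, False, True])),
]
data, mask = interleave_masked(blocks, shape=(2, 3))
```

Row `i` of `data` and `mask` then holds exactly what the `transpose` change passes to `arr_type(data[i], mask[i])`.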