BUG: fix DataFrame.apply returning wrong result when dealing with dtype (#28773) #30304

Closed
wants to merge 14 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -1273,6 +1273,7 @@ Other
- Bug in :meth:`DataFrame.append` that raised ``IndexError`` when appending with empty list (:issue:`28769`)
- Fix :class:`AbstractHolidayCalendar` to return correct results for
years after 2030 (now goes up to 2200) (:issue:`27790`)
- Bug in :meth:`DataFrame.apply` returning wrong result in some cases when dtype was involved in passed function (:issue:`28773`)
- Fixed :class:`~arrays.IntegerArray` returning ``inf`` rather than ``NaN`` for operations dividing by ``0`` (:issue:`27398`)
- Fixed ``pow`` operations for :class:`~arrays.IntegerArray` when the other value is ``0`` or ``1`` (:issue:`29997`)
- Bug in :meth:`Series.count` raises if use_inf_as_na is enabled (:issue:`29478`)
23 changes: 20 additions & 3 deletions pandas/core/apply.py
@@ -14,7 +14,10 @@
is_list_like,
is_sequence,
)
from pandas.core.dtypes.generic import ABCSeries
from pandas.core.dtypes.generic import (
ABCSeries,
ABCMultiIndex,
)

from pandas.core.construction import create_series_with_explicit_dtype

@@ -271,16 +274,22 @@ def apply_standard(self):

# we cannot reduce using non-numpy dtypes,
# as demonstrated in gh-12244
if (
can_reduce = (
self.result_type in ["reduce", None]
and not self.dtypes.apply(is_extension_array_dtype).any()
# Disallow complex_internals since libreduction shortcut
# cannot handle MultiIndex
and not isinstance(self.agg_axis, ABCMultiIndex)
# Disallow dtypes where setting _index_data will break
# ExtensionArray values, see GH#31182
and not self.dtypes.apply(lambda x: x.kind in ["m", "M"]).any()
# Disallow complex_internals since libreduction shortcut raises a TypeError
and not self.agg_axis._has_complex_internals
):
)

column_by_column = (self.axis != 0 and self.axis != "index") or self.obj._is_homogeneous_type

if can_reduce and column_by_column:
values = self.values
Member

@Reksbril the underlying problem here is that this call to `self.values` casts to object dtype, which you want to avoid. If you add `and self.obj._is_homogeneous_type` to the check on lines 277-283, that should solve the whole thing.
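
To illustrate the cast described in this comment, here is a minimal sketch that uses only public pandas API. The names `mixed`, `uniform`, and `is_homogeneous` are hypothetical and do not appear in the PR; `is_homogeneous` is only a rough public stand-in for the private `_is_homogeneous_type` check, not its actual implementation.

```python
import numpy as np
import pandas as pd

# The frame from the test case: one integer column and one string column.
mixed = pd.DataFrame({"col_1": [1, 2, 3], "col_2": ["hi", "there", "friend"]})

# A frame whose columns all share a single dtype.
uniform = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Materializing a mixed frame as one 2-D array forces a common dtype,
# so the integer column is upcast to object along with the strings.
mixed_values = mixed.values
uniform_values = uniform.values


def is_homogeneous(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True when every column has the same dtype."""
    return frame.dtypes.nunique() == 1


print(mixed_values.dtype)      # common dtype chosen for the mixed frame
print(uniform_values.dtype)    # integer dtype is preserved
print(is_homogeneous(mixed), is_homogeneous(uniform))
```

Any function applied to a column sliced out of `mixed_values` would see object dtype instead of the column's own dtype, which matches the symptom reported in GH 28773.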

index = self.obj._get_axis(self.axis)
labels = self.agg_axis
@@ -309,9 +318,17 @@ def apply_standard(self):
else:
return self.obj._constructor_sliced(result, index=labels)


# compute the result using the series generator
results, res_index = self.apply_series_generator()

if can_reduce and not column_by_column:
results = list(results.values())
results = np.array(results)
return self.obj._constructor_sliced(
results, index=res_index
)

# wrap results
return self.wrap_results(results, res_index)

7 changes: 7 additions & 0 deletions pandas/tests/frame/test_apply.py
@@ -704,6 +704,13 @@ def test_apply_dup_names_multi_agg(self):

tm.assert_frame_equal(result, expected)

def test_apply_get_dtype(self):
# GH 28773
df = DataFrame({"col_1": [1, 2, 3], "col_2": ["hi", "there", "friend"]})
result = df.apply(lambda x: x.dtype)
expected = Series(data=["int64", "object"], index=["col_1", "col_2"])
tm.assert_series_equal(result, expected)

def test_apply_nested_result_axis_1(self):
# GH 13820
def apply_list(row):