BUG: fix DataFrame.apply returning wrong result when dealing with dtype (#28773) #30304

Closed
wants to merge 14 commits into from
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.0.0.rst
@@ -876,11 +876,11 @@ Other
- Bug in :meth:`DataFrame.append` that raised ``IndexError`` when appending with empty list (:issue:`28769`)
- Fix :class:`AbstractHolidayCalendar` to return correct results for
  years after 2030 (now goes up to 2200) (:issue:`27790`)
- Bug in :meth:`DataFrame.apply` returning wrong result in some cases when dtype was involved in passed function (:issue:`28773`)
- Fixed :class:`IntegerArray` returning ``inf`` rather than ``NaN`` for operations dividing by 0 (:issue:`27398`)
- Fixed ``pow`` operations for :class:`IntegerArray` when the other value is ``0`` or ``1`` (:issue:`29997`)
- Bug in :meth:`Series.count` raises if use_inf_as_na is enabled (:issue:`29478`)


.. _whatsnew_1000.contributors:

Contributors
67 changes: 57 additions & 10 deletions pandas/core/frame.py
@@ -6564,16 +6564,63 @@ def apply(self, func, axis=0, raw=False, result_type=None, args=(), **kwds):
         """
         from pandas.core.apply import frame_apply

-        op = frame_apply(
-            self,
-            func=func,
-            axis=axis,
-            raw=raw,
-            result_type=result_type,
-            args=args,
-            kwds=kwds,
-        )
-        return op.get_result()
+        # Old apply function, which will be used for each part of DataFrame
+        def partial_apply(dataframe):
+            op = frame_apply(
+                dataframe,
+                func=func,
+                axis=axis,
+                raw=raw,
+                result_type=result_type,
+                args=args,
+                kwds=kwds,
+            )
+            return op.get_result()
+
+        def get_dtype(dataframe, column):
+            return dataframe.dtypes.values[column]
+
+        if axis == 0 or axis == "index":
+            if self.shape[1] == 0:
+                return partial_apply(self)
+
+            frame = self.iloc[:, [0]]
+            result = partial_apply(frame)
+            if isinstance(result, Series):
+                results = result.values
+            else:
+                results = result
+
+            i = 1
+            while i < self.shape[1]:
+                type = get_dtype(self, i)
+                j = i + 1
+
+                # While the dtype of column is the same as previous ones,
+                # they are handled together
+                while j < self.shape[1] and pandas.core.dtypes.common.is_dtype_equal(
+                    type, get_dtype(self, j)
+                ):
+                    j += 1
+                frame = self.iloc[:, i:j]
+                i = j
+                result = partial_apply(frame)
+
+                if isinstance(result, Series):
+                    results = np.append(results, result.values)
+                else:
+                    for k in range(result.shape[0], results.shape[0]):
+                        result.loc[k, :] = np.nan
+                    for k in range(results.shape[0], result.shape[0]):
+                        results.loc[k, :] = np.nan
+                    results = pandas.concat([results, result], axis=1)
+
+            if isinstance(result, Series):
+                return Series(results, index=self.columns)
+            else:
+                return results
+        else:
+            return partial_apply(self)

     def applymap(self, func):
         """
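
Reader's note: the hunk above applies the function to runs of adjacent columns that share a dtype, one run at a time. The standalone sketch below shows only that grouping step, using the public pandas API. The helper name `consecutive_dtype_blocks` is hypothetical and not part of this PR or of pandas, and comparing dtypes by their string form is a simplification of the `is_dtype_equal` check used in the diff.

```python
import pandas as pd


def consecutive_dtype_blocks(df):
    """Yield (start, stop) column positions for runs of adjacent columns
    whose dtypes have the same string representation.

    Hypothetical helper for illustration only; the PR compares dtypes
    with is_dtype_equal rather than by string.
    """
    names = [str(dtype) for dtype in df.dtypes]
    start = 0
    while start < len(names):
        stop = start + 1
        # Extend the run while the next column has the same dtype name
        while stop < len(names) and names[stop] == names[start]:
            stop += 1
        yield start, stop
        start = stop


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": ["x", "y"], "d": [1.5, 2.5]})
blocks = list(consecutive_dtype_blocks(df))

# Each block can then be sliced positionally, as the diff does with iloc
parts = [df.iloc[:, start:stop] for start, stop in blocks]
```

Unlike the diff, this sketch does not treat the first column as a separate block, so two leading columns of the same dtype land in one slice.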
6 changes: 6 additions & 0 deletions pandas/tests/frame/test_apply.py
@@ -691,6 +691,12 @@ def test_apply_dup_names_multi_agg(self):

         tm.assert_frame_equal(result, expected)

+    def test_apply_get_dtype(self):
+        # GH 28773
+        df = DataFrame({"col_1": [1, 2, 3], "col_2": ["hi", "there", "friend"]})
+        expected = Series(data=["int64", "object"], index=["col_1", "col_2"])
+        tm.assert_series_equal(df.apply(lambda x: x.dtype), expected)
+

 class TestInferOutputShape:
     # the user has supplied an opaque UDF where
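
Reader's note: a minimal sketch of the behaviour the new test targets, runnable outside the pandas test suite. It builds the same frame as the test and puts the dtypes stored on the frame next to what `apply` reports for each column. Whether the two agree depends on the installed pandas version, so the sketch makes no claim about the `apply` values. The variable names `per_column` and `via_apply` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": ["hi", "there", "friend"]})

# Dtypes as stored on the frame, one entry per column
per_column = df.dtypes

# Dtypes as seen by a function passed to apply (the case reported in GH 28773)
via_apply = df.apply(lambda x: x.dtype)

# A NumPy dtype compares equal to its string name, which is why the test
# can state its expected values as plain strings
int_dtype_matches_name = np.dtype("int64") == "int64"
```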