[ENH] Allow pad, backfill and cumcount in groupby.transform #34453

Merged
12 changes: 9 additions & 3 deletions doc/source/reference/frame.rst
@@ -47,8 +47,6 @@ Conversion
    DataFrame.convert_dtypes
    DataFrame.infer_objects
    DataFrame.copy
-   DataFrame.isna
-   DataFrame.notna
    DataFrame.bool

 Indexing, iteration
@@ -211,10 +209,18 @@ Missing data handling
 .. autosummary::
    :toctree: api/

+   DataFrame.backfill
+   DataFrame.bfill
    DataFrame.dropna
+   DataFrame.ffill
    DataFrame.fillna
-   DataFrame.replace
    DataFrame.interpolate
+   DataFrame.isna
+   DataFrame.isnull
+   DataFrame.notna
+   DataFrame.notnull
+   DataFrame.pad
+   DataFrame.replace

 Reshaping, sorting, transposing
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 changes: 5 additions & 0 deletions doc/source/reference/groupby.rst
@@ -50,6 +50,7 @@ Computations / descriptive stats
    GroupBy.all
    GroupBy.any
    GroupBy.bfill
+   GroupBy.backfill
    GroupBy.count
    GroupBy.cumcount
    GroupBy.cummax
@@ -67,6 +68,7 @@ Computations / descriptive stats
    GroupBy.ngroup
    GroupBy.nth
    GroupBy.ohlc
+   GroupBy.pad
    GroupBy.prod
    GroupBy.rank
    GroupBy.pct_change
@@ -88,10 +90,12 @@ application to columns of a specific data type.

    DataFrameGroupBy.all
    DataFrameGroupBy.any
+   DataFrameGroupBy.backfill
    DataFrameGroupBy.bfill
    DataFrameGroupBy.corr
    DataFrameGroupBy.count
    DataFrameGroupBy.cov
+   DataFrameGroupBy.cumcount
    DataFrameGroupBy.cummax
    DataFrameGroupBy.cummin
    DataFrameGroupBy.cumprod
@@ -106,6 +110,7 @@ application to columns of a specific data type.
    DataFrameGroupBy.idxmin
    DataFrameGroupBy.mad
    DataFrameGroupBy.nunique
+   DataFrameGroupBy.pad
    DataFrameGroupBy.pct_change
    DataFrameGroupBy.plot
    DataFrameGroupBy.quantile
11 changes: 9 additions & 2 deletions doc/source/reference/series.rst
@@ -214,11 +214,18 @@ Missing data handling
 .. autosummary::
    :toctree: api/

-   Series.isna
-   Series.notna
+   Series.backfill

Review comment (Contributor) on Series.backfill: @jorisvandenbossche is there any way we can indicate aliases here? (Or should we just not list them?)

+   Series.bfill
    Series.dropna
+   Series.ffill
    Series.fillna
    Series.interpolate
+   Series.isna
+   Series.isnull
+   Series.notna
+   Series.notnull
+   Series.pad
+   Series.replace

 Reshaping, sorting
 ------------------
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -288,6 +288,7 @@ Other enhancements
 - :meth:`HDFStore.put` now accepts `track_times` parameter. Parameter is passed to ``create_table`` method of ``PyTables`` (:issue:`32682`).
 - Make :class:`pandas.core.window.Rolling` and :class:`pandas.core.window.Expanding` iterable(:issue:`11704`)
 - Make ``option_context`` a :class:`contextlib.ContextDecorator`, which allows it to be used as a decorator over an entire function (:issue:`34253`).
+- :meth:`groupby.transform` now allows ``func`` to be ``pad``, ``backfill`` and ``cumcount`` (:issue:`31269`).

 .. ---------------------------------------------------------------------------

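As a rough illustration of the whatsnew entry above, the sketch below passes kernel names as strings to `groupby(...).transform`. It uses the data from this PR's test, and it spells the fills as `"ffill"` and `"bfill"` rather than the `"pad"` and `"backfill"` aliases, on the assumption that the aliases may not be available in newer pandas releases. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Same shape of data as the test touched by this PR.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
        "B": [1, 2, np.nan, 3, 3, np.nan, 4],
    }
)

grouped = df.groupby("A")["B"]

# Each call dispatches to the groupby method of the same name and
# returns a result aligned with the original rows.
filled_forward = grouped.transform("ffill")      # fill NaN from earlier rows in the group
filled_backward = grouped.transform("bfill")     # fill NaN from later rows in the group
position_in_group = grouped.transform("cumcount")  # 0-based row number within the group

print(pd.DataFrame({
    "A": df["A"],
    "ffill": filled_forward,
    "bfill": filled_backward,
    "cumcount": position_in_group,
}))
```

Note that a fill never crosses a group boundary: the trailing NaN in the `bar` group has no later value in its own group, so a backward fill leaves it missing.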
4 changes: 4 additions & 0 deletions pandas/core/generic.py
@@ -6193,6 +6193,8 @@ def ffill(
             method="ffill", axis=axis, inplace=inplace, limit=limit, downcast=downcast
         )

+    pad = ffill
+
     def bfill(
         self: FrameOrSeries,
         axis=None,
@@ -6212,6 +6214,8 @@ def bfill(
             method="bfill", axis=axis, inplace=inplace, limit=limit, downcast=downcast
         )

+    backfill = bfill
+
     @doc(klass=_shared_doc_kwargs["klass"])
     def replace(
         self,
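The `pad = ffill` and `backfill = bfill` lines above bind a second name to an existing method inside the class body. A minimal sketch of that aliasing pattern in plain Python follows; `Filler` and `TaggedFiller` are hypothetical stand-ins and not pandas classes.

```python
class Filler:
    """Hypothetical container used only to illustrate class-body aliasing."""

    def __init__(self, values):
        self.values = list(values)

    def ffill(self):
        """Replace each None with the most recent non-None value."""
        out, last = [], None
        for value in self.values:
            if value is not None:
                last = value
            out.append(last)
        return Filler(out)

    # Alias defined in the class body, mirroring `pad = ffill` in the diff.
    pad = ffill


class TaggedFiller(Filler):
    def ffill(self):
        # Overrides ffill only; the inherited `pad` is still Filler.ffill.
        return Filler([("tagged", v) for v in super().ffill().values])


filled = Filler([1, None, 3, None]).pad()
print(filled.values)                      # [1, 1, 3, 3]
print(Filler.pad is Filler.ffill)         # True: one function, two names
print(TaggedFiller.pad is Filler.ffill)   # True: the alias does not follow overrides
```

One consequence of this design is that the alias is fixed when the class body runs, so a subclass that overrides `ffill` does not change what `pad` does unless it rebinds the alias as well.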
4 changes: 4 additions & 0 deletions pandas/core/groupby/generic.py
@@ -483,6 +483,8 @@ def transform(self, func, *args, engine="cython", engine_kwargs=None, **kwargs):
         elif func in base.cythonized_kernels:
             # cythonized transform or canned "agg+broadcast"
             return getattr(self, func)(*args, **kwargs)
+        elif func in base.transformation_kernels:
+            return getattr(self, func)(*args, **kwargs)

         # If func is a reduction, we need to broadcast the
         # result to the whole group. Compute func result
@@ -1464,6 +1466,8 @@ def transform(self, func, *args, engine="cython", engine_kwargs=None, **kwargs):
         elif func in base.cythonized_kernels:
             # cythonized transformation or canned "reduction+broadcast"
             return getattr(self, func)(*args, **kwargs)
+        elif func in base.transformation_kernels:
+            return getattr(self, func)(*args, **kwargs)

         # GH 30918
         # Use _transform_fast only when we know func is an aggregation
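The new `elif` branches route any string found in `base.transformation_kernels` to the method of the same name via `getattr`. The following is a simplified, dependency-free sketch of that dispatch idea. `MiniGroupBy` and `TRANSFORMATION_KERNELS` are hypothetical names, and the real pandas code handles many more cases (engines, reductions, broadcasting).

```python
from collections import defaultdict

# Hypothetical allowlist; pandas keeps its own set in its groupby base module.
TRANSFORMATION_KERNELS = frozenset({"ffill", "pad", "cumcount"})


class MiniGroupBy:
    """Toy grouped view over parallel lists of keys and values."""

    def __init__(self, keys, values):
        self.keys = list(keys)
        self.values = list(values)

    def _positions(self):
        # Map each key to the row positions that belong to it.
        groups = defaultdict(list)
        for i, key in enumerate(self.keys):
            groups[key].append(i)
        return groups

    def ffill(self):
        out = list(self.values)
        for positions in self._positions().values():
            last = None
            for i in positions:
                if out[i] is None:
                    out[i] = last
                else:
                    last = out[i]
        return out

    pad = ffill  # alias, so "pad" resolves through getattr as well

    def cumcount(self):
        out = [0] * len(self.keys)
        for positions in self._positions().values():
            for n, i in enumerate(positions):
                out[i] = n
        return out

    def transform(self, func, *args, **kwargs):
        if isinstance(func, str):
            if func in TRANSFORMATION_KERNELS:
                # Same shape as the added branch: look the method up by name.
                return getattr(self, func)(*args, **kwargs)
            raise ValueError(f"unknown transformation kernel: {func!r}")
        # Callable: apply it to each group and write results back in row order.
        out = [None] * len(self.values)
        for positions in self._positions().values():
            result = func([self.values[i] for i in positions], *args, **kwargs)
            for i, value in zip(positions, result):
                out[i] = value
        return out


gb = MiniGroupBy(
    ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
    [1, 2, None, 3, 3, None, 4],
)
print(gb.transform("pad"))       # [1, 2, 2, 3, 3, 3, 4]
print(gb.transform("cumcount"))  # [0, 1, 2, 3, 0, 1, 0]
```

Checking the name against an explicit set before calling `getattr` keeps arbitrary attribute names from being invoked through `transform`.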
20 changes: 15 additions & 5 deletions pandas/tests/groupby/transform/test_transform.py
@@ -323,15 +323,22 @@ def test_transform_transformation_func(transformation_func):
         {
             "A": ["foo", "foo", "foo", "foo", "bar", "bar", "baz"],
             "B": [1, 2, np.nan, 3, 3, np.nan, 4],
-        }
+        },
+        index=pd.date_range("2020-01-01", "2020-01-07"),
     )

-    if transformation_func in ["pad", "backfill", "tshift", "cumcount"]:
-        # These transformation functions are not yet covered in this test
-        pytest.xfail("See GH 31269")
+    if transformation_func == "cumcount":
+        test_op = lambda x: x.transform("cumcount")
+        mock_op = lambda x: Series(range(len(x)), x.index)
     elif transformation_func == "fillna":
         test_op = lambda x: x.transform("fillna", value=0)
         mock_op = lambda x: x.fillna(value=0)
+    elif transformation_func == "tshift":
+        msg = (
+            "Current behavior of groupby.tshift is inconsistent with other "
+            "transformations. See GH34452 for more details"
+        )
+        pytest.xfail(msg)
     else:
         test_op = lambda x: x.transform(transformation_func)
         mock_op = lambda x: getattr(x, transformation_func)()
@@ -340,7 +347,10 @@ def test_transform_transformation_func(transformation_func):
     groups = [df[["B"]].iloc[:4], df[["B"]].iloc[4:6], df[["B"]].iloc[6:]]
     expected = concat([mock_op(g) for g in groups])

-    tm.assert_frame_equal(result, expected)
+    if transformation_func == "cumcount":
+        tm.assert_series_equal(result, expected)
+    else:
+        tm.assert_frame_equal(result, expected)


 def test_transform_select_columns(df):
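The test above checks each grouped transform against an oracle built by slicing the frame into its groups by hand, applying the plain operation to each slice, and concatenating. Below is a dependency-free sketch of that checking strategy using lists in place of DataFrames. The helper names (`grouped_transform`, `check_against_slices`) are hypothetical, and the hand-written slices are valid only because the rows of each group are contiguous, as in the test data.

```python
def ffill(chunk):
    """Forward-fill None values within one chunk."""
    out, last = [], None
    for value in chunk:
        if value is not None:
            last = value
        out.append(last)
    return out


def cumcount(chunk):
    """Number the rows of one chunk from 0, like the test's mock_op."""
    return list(range(len(chunk)))


def grouped_transform(keys, values, op):
    """Apply op within each key and return results in original row order."""
    positions = {}
    for i, key in enumerate(keys):
        positions.setdefault(key, []).append(i)
    out = [None] * len(values)
    for idx in positions.values():
        for i, value in zip(idx, op([values[i] for i in idx])):
            out[i] = value
    return out


keys = ["foo", "foo", "foo", "foo", "bar", "bar", "baz"]
values = [1, 2, None, 3, 3, None, 4]

# Hand-written slices, analogous to iloc[:4], iloc[4:6], iloc[6:] in the test.
chunks = [values[:4], values[4:6], values[6:]]


def check_against_slices(op):
    # Oracle: apply op to each slice and concatenate the pieces.
    expected = [value for chunk in chunks for value in op(chunk)]
    result = grouped_transform(keys, values, op)
    assert result == expected, (result, expected)
    return result


print(check_against_slices(ffill))     # [1, 2, 2, 3, 3, 3, 4]
print(check_against_slices(cumcount))  # [0, 1, 2, 3, 0, 1, 0]
```

The `cumcount` case needs its own oracle because it depends only on group sizes and not on the values, which parallels the test's switch to `assert_series_equal` for that kernel.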