Commit 817ec46

rhshadrach authored and im-vinicius committed
DEPR: Replacing builtin and NumPy funcs in agg/apply/transform (pandas-dev#53974)
* DEPR: Replacing builtin and NumPy funcs in agg/apply/transform
* mypy fixup
1 parent cace5b8 commit 817ec46


44 files changed: +510 -272 lines

doc/source/getting_started/comparison/comparison_with_r.rst

+3-3
@@ -246,7 +246,7 @@ In pandas we may use :meth:`~pandas.pivot_table` method to handle this:
 }
 )
 
-baseball.pivot_table(values="batting avg", columns="team", aggfunc=np.max)
+baseball.pivot_table(values="batting avg", columns="team", aggfunc="max")
 
 For more details and examples see :ref:`the reshaping documentation
 <reshaping.pivot>`.
@@ -359,7 +359,7 @@ In pandas the equivalent expression, using the
 )
 
 grouped = df.groupby(["month", "week"])
-grouped["x"].agg([np.mean, np.std])
+grouped["x"].agg(["mean", "std"])
 
 
 For more details and examples see :ref:`the groupby documentation
@@ -482,7 +482,7 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:
 values="value",
 index=["variable", "week"],
 columns=["month"],
-aggfunc=np.mean,
+aggfunc="mean",
 )
 
 Similarly for ``dcast`` which uses a data.frame called ``df`` in R to
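
Below is a minimal sketch of the string-alias style used in this file. The data is made up: the column names `month`, `week` and `x` follow the R comparison, but the values are invented. The aliases `"mean"` and `"std"` dispatch to the GroupBy methods of the same name, so `"std"` uses pandas' default `ddof=1` (the sample standard deviation).

```python
import statistics

import pandas as pd

# Invented values; only the column names follow comparison_with_r.rst.
df = pd.DataFrame(
    {
        "month": [1, 1, 1, 2, 2, 2],
        "week": [1, 1, 1, 1, 1, 1],
        "x": [1.0, 2.0, 4.0, 3.0, 5.0, 7.0],
    }
)

grouped = df.groupby(["month", "week"])
# "mean" and "std" name GroupBy methods; "std" defaults to ddof=1.
result = grouped["x"].agg(["mean", "std"])

# Cross-check group (1, 1) against the standard library's sample statistics.
expected_mean = statistics.mean([1.0, 2.0, 4.0])
expected_std = statistics.stdev([1.0, 2.0, 4.0])
print(result)
```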

doc/source/getting_started/comparison/comparison_with_sql.rst

+2-2
@@ -198,7 +198,7 @@ to your grouped DataFrame, indicating which functions to apply to specific colum
 
 .. ipython:: python
 
-tips.groupby("day").agg({"tip": np.mean, "day": np.size})
+tips.groupby("day").agg({"tip": "mean", "day": "size"})
 
 Grouping by more than one column is done by passing a list of columns to the
 :meth:`~pandas.DataFrame.groupby` method.
@@ -222,7 +222,7 @@ Grouping by more than one column is done by passing a list of columns to the
 
 .. ipython:: python
 
-tips.groupby(["smoker", "day"]).agg({"tip": [np.size, np.mean]})
+tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})
 
 .. _compare_with_sql.join:
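
The SQL comparison now passes aliases per column. The sketch below uses a five-row stand-in for `tips` with invented values. One deliberate change from the docs: `total_bill` is counted instead of the grouping key `day`.

```python
import pandas as pd

# Small stand-in for the "tips" dataset; values are invented.
tips = pd.DataFrame(
    {
        "day": ["Sat", "Sat", "Sun", "Sun", "Sun"],
        "smoker": ["No", "Yes", "No", "No", "Yes"],
        "tip": [1.0, 3.0, 2.0, 4.0, 6.0],
        "total_bill": [10.0, 20.0, 15.0, 25.0, 30.0],
    }
)

# One aggregation per column, each named by its string alias.
per_column = tips.groupby("day").agg({"tip": "mean", "total_bill": "size"})

# Several aggregations for one column produce a column MultiIndex.
multi = tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})
print(per_column)
print(multi)
```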

doc/source/user_guide/basics.rst

+3-3
@@ -881,8 +881,8 @@ statistics methods, takes an optional ``axis`` argument:
 
 .. ipython:: python
 
-df.apply(np.mean)
-df.apply(np.mean, axis=1)
+df.apply(lambda x: np.mean(x))
+df.apply(lambda x: np.mean(x), axis=1)
 df.apply(lambda x: x.max() - x.min())
 df.apply(np.cumsum)
 df.apply(np.exp)
@@ -986,7 +986,7 @@ output:
 
 .. ipython:: python
 
-tsdf.agg(np.sum)
+tsdf.agg(lambda x: np.sum(x))
 
 tsdf.agg("sum")
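
The `basics.rst` change wraps `np.mean` and `np.sum` in lambdas. That way the examples keep demonstrating a user-defined function rather than an alias swap. The sketch below uses invented data and compares the lambda UDF with the `"sum"` alias.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Wrapping np.mean in a lambda makes it an ordinary user-defined function,
# called once per column (axis=0) or once per row (axis=1).
col_means = df.apply(lambda x: np.mean(x))
row_means = df.apply(lambda x: np.mean(x), axis=1)

# The UDF and the string alias give the same column sums here.
col_sums_udf = df.agg(lambda x: np.sum(x))
col_sums_alias = df.agg("sum")
print(col_means, row_means, col_sums_udf, col_sums_alias, sep="\n")
```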

doc/source/user_guide/cookbook.rst

+3-3
@@ -530,7 +530,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
 
 code_groups = df.groupby("code")
 
-agg_n_sort_order = code_groups[["data"]].transform(sum).sort_values(by="data")
+agg_n_sort_order = code_groups[["data"]].transform("sum").sort_values(by="data")
 
 sorted_df = df.loc[agg_n_sort_order.index]
 
@@ -549,7 +549,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
 return x.iloc[1] * 1.234
 return pd.NaT
 
-mhc = {"Mean": np.mean, "Max": np.max, "Custom": MyCust}
+mhc = {"Mean": "mean", "Max": "max", "Custom": MyCust}
 ts.resample("5min").apply(mhc)
 ts
 
@@ -685,7 +685,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
 values=["Sales"],
 index=["Province"],
 columns=["City"],
-aggfunc=np.sum,
+aggfunc="sum",
 margins=True,
 )
 table.stack("City")
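
This is a small version of the cookbook's sort-by-group-total recipe, with invented integer data in place of the original values. `transform("sum")` broadcasts each group's total back to its rows, and the rows are then ordered by that total.

```python
import pandas as pd

# Invented data in the shape of the cookbook example ("code", "data").
df = pd.DataFrame(
    {"code": ["foo", "bar", "baz", "foo", "bar"], "data": [1, -2, 3, 4, -5]}
)
code_groups = df.groupby("code")

# transform("sum") returns one value per original row: its group's total.
group_totals = code_groups[["data"]].transform("sum")
agg_n_sort_order = group_totals.sort_values(by="data")
sorted_df = df.loc[agg_n_sort_order.index]
print(sorted_df)
```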

doc/source/user_guide/reshaping.rst

+5-5
@@ -402,12 +402,12 @@ We can produce pivot tables from this data very easily:
 .. ipython:: python
 
 pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
-pd.pivot_table(df, values="D", index=["B"], columns=["A", "C"], aggfunc=np.sum)
+pd.pivot_table(df, values="D", index=["B"], columns=["A", "C"], aggfunc="sum")
 pd.pivot_table(
 df, values=["D", "E"],
 index=["B"],
 columns=["A", "C"],
-aggfunc=np.sum,
+aggfunc="sum",
 )
 
 The result object is a :class:`DataFrame` having potentially hierarchical indexes on the
@@ -451,7 +451,7 @@ rows and columns:
 columns="C",
 values=["D", "E"],
 margins=True,
-aggfunc=np.std
+aggfunc="std"
 )
 table
 
@@ -552,7 +552,7 @@ each group defined by the first two :class:`Series`:
 
 .. ipython:: python
 
-pd.crosstab(df["A"], df["B"], values=df["C"], aggfunc=np.sum)
+pd.crosstab(df["A"], df["B"], values=df["C"], aggfunc="sum")
 
 Adding margins
 ~~~~~~~~~~~~~~
@@ -562,7 +562,7 @@ Finally, one can also add margins or normalize this output.
 .. ipython:: python
 
 pd.crosstab(
-df["A"], df["B"], values=df["C"], aggfunc=np.sum, normalize=True, margins=True
+df["A"], df["B"], values=df["C"], aggfunc="sum", normalize=True, margins=True
 )
 
 .. _reshaping.tile:
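
This runnable sketch covers the two entry points touched in `reshaping.rst`, using invented data. `pivot_table` and `crosstab` both accept the `"sum"` alias, and here they fill the same cells with the same totals.

```python
import pandas as pd

# Invented data; only the column letters follow reshaping.rst.
df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "two"],
        "B": ["x", "y", "x", "y"],
        "D": [1, 2, 3, 4],
    }
)

# Same aggregation through two entry points, both spelled with "sum".
table = pd.pivot_table(df, values="D", index=["A"], columns=["B"], aggfunc="sum")
ct = pd.crosstab(df["A"], df["B"], values=df["D"], aggfunc="sum")
print(table)
print(ct)
```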

doc/source/user_guide/timeseries.rst

+3-3
@@ -1801,22 +1801,22 @@ You can pass a list or dict of functions to do aggregation with, outputting a ``
 
 .. ipython:: python
 
-r["A"].agg([np.sum, np.mean, np.std])
+r["A"].agg(["sum", "mean", "std"])
 
 On a resampled ``DataFrame``, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:
 
 .. ipython:: python
 
-r.agg([np.sum, np.mean])
+r.agg(["sum", "mean"])
 
 By passing a dict to ``aggregate`` you can apply a different aggregation to the
 columns of a ``DataFrame``:
 
 .. ipython:: python
 :okexcept:
 
-r.agg({"A": np.sum, "B": lambda x: np.std(x, ddof=1)})
+r.agg({"A": "sum", "B": lambda x: np.std(x, ddof=1)})
 
 The function names can also be strings. In order for a string to be valid it
 must be implemented on the resampled object:
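
This sketch shows the list and dict forms on a small minute-frequency frame. The values and the `"3min"` bin width are assumptions, not the docs' data. The dict form mixes a string alias with a lambda, which stays a plain user-defined function. Here the lambda calls `np.std` on the underlying array with `ddof=1`, the same sample standard deviation that `"std"` computes.

```python
import numpy as np
import pandas as pd

# Six invented one-minute observations, resampled into two 3-minute bins.
idx = pd.date_range("2024-01-01", periods=6, freq="min")
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]},
    index=idx,
)
r = df.resample("3min")

# List of aliases on one column: one output column per alias.
list_result = r["A"].agg(["sum", "mean", "std"])

# Dict form: an alias for A, a user-defined function (sample std) for B.
dict_result = r.agg({"A": "sum", "B": lambda x: np.std(x.to_numpy(), ddof=1)})
print(list_result)
print(dict_result)
```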

doc/source/user_guide/window.rst

+1-1
@@ -140,7 +140,7 @@ of multiple aggregations applied to a window.
 .. ipython:: python
 
 df = pd.DataFrame({"A": range(5), "B": range(10, 15)})
-df.expanding().agg([np.sum, np.mean, np.std])
+df.expanding().agg(["sum", "mean", "std"])
 
 
 .. _window.generic:
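
Running the `window.rst` example as-is shows what the list of aliases produces: a column `MultiIndex` whose second level holds the alias strings. The first `std` value is `NaN`, since a sample standard deviation needs at least two observations.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(10, 15)})
# Expanding window: each row aggregates everything up to and including it.
result = df.expanding().agg(["sum", "mean", "std"])
print(result)
```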

doc/source/whatsnew/v0.14.0.rst

+1-1
@@ -846,7 +846,7 @@ Enhancements
 df.pivot_table(values='Quantity',
 index=pd.Grouper(freq='M', key='Date'),
 columns=pd.Grouper(freq='M', key='PayDay'),
-aggfunc=np.sum)
+aggfunc="sum")
 
 - Arrays of strings can be wrapped to a specified width (``str.wrap``) (:issue:`6999`)
 - Add :meth:`~Series.nsmallest` and :meth:`Series.nlargest` methods to Series, See :ref:`the docs <basics.nsorted>` (:issue:`3960`)

doc/source/whatsnew/v0.20.0.rst

+4-4
@@ -984,7 +984,7 @@ Previous behavior:
 75% 3.750000
 max 4.000000
 
-In [3]: df.groupby('A').agg([np.mean, np.std, np.min, np.max])
+In [3]: df.groupby('A').agg(["mean", "std", "min", "max"])
 Out[3]:
 B
 mean std amin amax
@@ -1000,7 +1000,7 @@ New behavior:
 
 df.groupby('A').describe()
 
-df.groupby('A').agg([np.mean, np.std, np.min, np.max])
+df.groupby('A').agg(["mean", "std", "min", "max"])
 
 .. _whatsnew_0200.api_breaking.rolling_pairwise:
 
@@ -1163,7 +1163,7 @@ Previous behavior:
 
 .. code-block:: ipython
 
-In [2]: df.pivot_table('col1', index=['col3', 'col2'], aggfunc=np.sum)
+In [2]: df.pivot_table('col1', index=['col3', 'col2'], aggfunc="sum")
 Out[2]:
 col3 col2
 1 C 3
@@ -1175,7 +1175,7 @@ New behavior:
 
 .. ipython:: python
 
-df.pivot_table('col1', index=['col3', 'col2'], aggfunc=np.sum)
+df.pivot_table('col1', index=['col3', 'col2'], aggfunc="sum")
 
 .. _whatsnew_0200.api:
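
With string aliases, the second column level is labelled by the alias itself (`min`, `max`). The "Previous behavior" output above still shows `amin`/`amax`, which came from the NumPy functions' names at the time. Here is a minimal check on invented data.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1.0, 3.0, 5.0, 9.0]})
result = df.groupby("A").agg(["mean", "std", "min", "max"])

# Second column level: one label per alias, in the order given.
labels = result.columns.get_level_values(1).tolist()
print(result)
```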

doc/source/whatsnew/v0.25.0.rst

+2-2
@@ -48,7 +48,7 @@ output columns when applying multiple aggregation functions to specific columns
 animals.groupby("kind").agg(
 min_height=pd.NamedAgg(column='height', aggfunc='min'),
 max_height=pd.NamedAgg(column='height', aggfunc='max'),
-average_weight=pd.NamedAgg(column='weight', aggfunc=np.mean),
+average_weight=pd.NamedAgg(column='weight', aggfunc="mean"),
 )
 
 Pass the desired columns names as the ``**kwargs`` to ``.agg``. The values of ``**kwargs``
@@ -61,7 +61,7 @@ what the arguments to the function are, but plain tuples are accepted as well.
 animals.groupby("kind").agg(
 min_height=('height', 'min'),
 max_height=('height', 'max'),
-average_weight=('weight', np.mean),
+average_weight=('weight', 'mean'),
 )
 
 Named aggregation is the recommended replacement for the deprecated "dict-of-dicts"
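
This is named aggregation with string aliases, run on a small `animals` frame. Its values are assumed to match the one in the v0.25.0 notes. The `pd.NamedAgg` spelling and the plain-tuple spelling give the same result.

```python
import pandas as pd

# Assumed to match the animals frame used in the v0.25.0 whatsnew.
animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

result = animals.groupby("kind").agg(
    min_height=pd.NamedAgg(column="height", aggfunc="min"),
    max_height=pd.NamedAgg(column="height", aggfunc="max"),
    average_weight=pd.NamedAgg(column="weight", aggfunc="mean"),
)

# Plain (column, aggfunc) tuples are accepted in place of NamedAgg.
result_tuples = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
    average_weight=("weight", "mean"),
)
print(result)
```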

doc/source/whatsnew/v2.1.0.rst

+1-1
@@ -315,11 +315,11 @@ Deprecations
 - Deprecated option "mode.use_inf_as_na", convert inf entries to ``NaN`` before instead (:issue:`51684`)
 - Deprecated parameter ``obj`` in :meth:`GroupBy.get_group` (:issue:`53545`)
 - Deprecated positional indexing on :class:`Series` with :meth:`Series.__getitem__` and :meth:`Series.__setitem__`, in a future version ``ser[item]`` will *always* interpret ``item`` as a label, not a position (:issue:`50617`)
+- Deprecated replacing builtin and NumPy functions in ``.agg``, ``.apply``, and ``.transform``; use the corresponding string alias (e.g. ``"sum"`` for ``sum`` or ``np.sum``) instead (:issue:`53425`)
 - Deprecated strings ``T``, ``t``, ``L`` and ``l`` denoting units in :func:`to_timedelta` (:issue:`52536`)
 - Deprecated the "method" and "limit" keywords on :meth:`Series.fillna`, :meth:`DataFrame.fillna`, :meth:`SeriesGroupBy.fillna`, :meth:`DataFrameGroupBy.fillna`, and :meth:`Resampler.fillna`, use ``obj.bfill()`` or ``obj.ffill()`` instead (:issue:`53394`)
 - Deprecated the ``method`` and ``limit`` keywords in :meth:`DataFrame.replace` and :meth:`Series.replace` (:issue:`33302`)
 - Deprecated values "pad", "ffill", "bfill", "backfill" for :meth:`Series.interpolate` and :meth:`DataFrame.interpolate`, use ``obj.ffill()`` or ``obj.bfill()`` instead (:issue:`53581`)
--
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.performance:
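
The new whatsnew entry asks users to pass the string alias instead of a builtin or NumPy callable. This small sketch turns `FutureWarning` into an error to show that the alias spelling never reaches the deprecated path, for both `.agg` and `.transform`. The sketch does not exercise passing `sum` or `np.sum`, because that outcome depends on the pandas version. During the deprecation period it warns; per the message added in `pandas/core/apply.py`, a future version will instead use the callable directly.

```python
import warnings

import pandas as pd

ser = pd.Series([1, 2, 3, 4])

with warnings.catch_warnings():
    # Escalate FutureWarning so any use of the deprecated path would raise.
    warnings.simplefilter("error", FutureWarning)
    total = ser.agg("sum")  # names Series.sum explicitly
    by_parity = ser.groupby(ser % 2).transform("sum")  # per-group totals

print(total, by_parity.tolist())
```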

pandas/core/apply.py

+22
@@ -170,6 +170,7 @@ def agg(self) -> DataFrame | Series | None:
         if callable(func):
             f = com.get_cython_func(func)
             if f and not args and not kwargs:
+                warn_alias_replacement(obj, func, f)
                 return getattr(obj, f)()
 
         # caller can react
@@ -280,6 +281,7 @@ def transform_str_or_callable(self, func) -> DataFrame | Series:
         if not args and not kwargs:
             f = com.get_cython_func(func)
             if f:
+                warn_alias_replacement(obj, func, f)
                 return getattr(obj, f)()
 
         # Two possible ways to use a UDF - apply or call directly
@@ -1695,3 +1697,23 @@ def validate_func_kwargs(
         no_arg_message = "Must provide 'func' or named aggregation **kwargs."
         raise TypeError(no_arg_message)
     return columns, func
+
+
+def warn_alias_replacement(
+    obj: AggObjType,
+    func: Callable,
+    alias: str,
+) -> None:
+    if alias.startswith("np."):
+        full_alias = alias
+    else:
+        full_alias = f"{type(obj).__name__}.{alias}"
+        alias = f"'{alias}'"
+    warnings.warn(
+        f"The provided callable {func} is currently using "
+        f"{full_alias}. In a future version of pandas, "
+        f"the provided callable will be used directly. To keep current "
+        f"behavior pass {alias} instead.",
+        category=FutureWarning,
+        stacklevel=find_stack_level(),
+    )
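
This standalone sketch copies the message logic of `warn_alias_replacement` so it runs without pandas. The `Series` class is a hypothetical stand-in whose only job is to make `type(obj).__name__` equal `"Series"`. `stacklevel=2` replaces pandas' internal `find_stack_level()`. The sketch shows why the function branches on `"np."`: NumPy aliases are printed verbatim, while method aliases are qualified with the object's type and quoted.

```python
import builtins
import warnings


class Series:
    """Hypothetical stand-in so that type(obj).__name__ == "Series"."""


def warn_alias_replacement_sketch(obj, func, alias):
    # Same branching and message text as the diff, minus find_stack_level().
    if alias.startswith("np."):
        full_alias = alias  # e.g. "np.sum" is already fully qualified
    else:
        full_alias = f"{type(obj).__name__}.{alias}"  # e.g. "Series.sum"
        alias = f"'{alias}'"  # quoted so the advice reads: pass 'sum'
    warnings.warn(
        f"The provided callable {func} is currently using "
        f"{full_alias}. In a future version of pandas, "
        f"the provided callable will be used directly. To keep current "
        f"behavior pass {alias} instead.",
        category=FutureWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_alias_replacement_sketch(Series(), builtins.sum, "sum")
    warn_alias_replacement_sketch(Series(), builtins.sum, "np.sum")

messages = [str(w.message) for w in caught]
print(*messages, sep="\n")
```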

pandas/core/common.py

+7
@@ -565,6 +565,13 @@ def require_length_match(data, index: Index) -> None:
     builtins.min: np.minimum.reduce,
 }
 
+# GH#53425: Only for deprecation
+_builtin_table_alias = {
+    builtins.sum: "np.sum",
+    builtins.max: "np.maximum.reduce",
+    builtins.min: "np.minimum.reduce",
+}
+
 _cython_table = {
     builtins.sum: "sum",
     builtins.max: "max",
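
The new `_builtin_table_alias` maps each builtin to the name of the NumPy reduction it stands for. Every value starts with `"np."`, the prefix `warn_alias_replacement` in `pandas/core/apply.py` checks before deciding whether to print an alias verbatim or as `<Type>.<name>`. The sketch below copies only the entries visible in this hunk. `alias_for_warning` is a hypothetical helper, not a pandas function, and the precedence it gives the builtin table is an assumption.

```python
import builtins

# Entries copied from the hunk above; the real _cython_table continues
# beyond the two lines shown, so this excerpt is deliberately incomplete.
_builtin_table_alias = {
    builtins.sum: "np.sum",
    builtins.max: "np.maximum.reduce",
    builtins.min: "np.minimum.reduce",
}
_cython_table_excerpt = {
    builtins.sum: "sum",
    builtins.max: "max",
}


def alias_for_warning(func):
    """Hypothetical helper (not part of pandas).

    Returns the "np."-prefixed name for builtins in the alias table,
    otherwise the method name from the cython-table excerpt, else None.
    """
    if func in _builtin_table_alias:
        return _builtin_table_alias[func]
    return _cython_table_excerpt.get(func)


# Every builtin alias carries the prefix that warn_alias_replacement checks.
all_np_prefixed = all(a.startswith("np.") for a in _builtin_table_alias.values())
print(alias_for_warning(sum), alias_for_warning(len), all_np_prefixed)
```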

pandas/core/frame.py

+7-7
@@ -8851,7 +8851,7 @@ def pivot(
 it can contain any of the other types (except list). If an array is
 passed, it must be the same length as the data and will be used in
 the same manner as column values.
-aggfunc : function, list of functions, dict, default numpy.mean
+aggfunc : function, list of functions, dict, default "mean"
 If a list of functions is passed, the resulting pivot table will have
 hierarchical columns whose top level are the function names
 (inferred from the function objects themselves).
@@ -8926,7 +8926,7 @@ def pivot(
 This first example aggregates values by taking the sum.
 
 >>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
-... columns=['C'], aggfunc=np.sum)
+... columns=['C'], aggfunc="sum")
 >>> table
 C large small
 A B
@@ -8938,7 +8938,7 @@ def pivot(
 We can also fill missing values using the `fill_value` parameter.
 
 >>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
-... columns=['C'], aggfunc=np.sum, fill_value=0)
+... columns=['C'], aggfunc="sum", fill_value=0)
 >>> table
 C large small
 A B
@@ -8950,7 +8950,7 @@ def pivot(
 The next example aggregates by taking the mean across multiple columns.
 
 >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
-... aggfunc={'D': np.mean, 'E': np.mean})
+... aggfunc={'D': "mean", 'E': "mean"})
 >>> table
 D E
 A C
@@ -8963,8 +8963,8 @@ def pivot(
 value column.
 
 >>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
-... aggfunc={'D': np.mean,
-... 'E': [min, max, np.mean]})
+... aggfunc={'D': "mean",
+... 'E': ["min", "max", "mean"]})
 >>> table
 D E
 mean max mean min
@@ -9565,7 +9565,7 @@ def _gotitem(
 Aggregate different functions over the columns and rename the index of the resulting
 DataFrame.
 
->>> df.agg(x=('A', max), y=('B', 'min'), z=('C', np.mean))
+>>> df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
 A B C
 x 7.0 NaN NaN
 y NaN 2.0 NaN
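
The updated docstring examples can be exercised on small frames. The pivot data is invented; the second frame was chosen so that its `x` and `y` rows reproduce the docstring output shown above. When a dict `aggfunc` mixes a scalar alias with a list, every value column gets a second column level. The named-aggregation form of `DataFrame.agg` fills any cell that no aggregation produced with `NaN`.

```python
import numpy as np
import pandas as pd

# Invented data for the pivot_table example (not the docstring's frame).
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar"],
        "C": ["small", "large", "small", "large"],
        "D": [1, 2, 3, 4],
        "E": [2, 4, 6, 8],
    }
)
# A scalar alias for D and a list of aliases for E.
table = pd.pivot_table(
    df,
    values=["D", "E"],
    index=["A", "C"],
    aggfunc={"D": "mean", "E": ["min", "max", "mean"]},
)

# Frame chosen so that the x and y rows match the DataFrame.agg docstring.
df_agg = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [np.nan, np.nan, np.nan]],
    columns=["A", "B", "C"],
)
named = df_agg.agg(x=("A", "max"), y=("B", "min"), z=("C", "mean"))
print(table)
print(named)
```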
