web/pandas/pdeps/0013-standardize-apply.md (+12 -8)
@@ -10,7 +10,7 @@
The `apply`, `transform` and `agg` methods have very complex behavior because they in some cases operate on elements in series, in some cases on series, and sometimes try one first and, if that fails, fall back to trying the other. There is no logical system to how these behaviors are arranged, and it can therefore be difficult for users to understand these methods.
-I propose to change how `apply`, `transform` and `agg` as follows:
+It is proposed that `apply`, `transform` and `agg` in the future will work as follows:
1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise
2. `Series.apply` & `DataFrame.apply` will be deprecated.
@@ -22,13 +22,15 @@ The above changes means that the future behavior, when users want to apply arbit
2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively.
3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively.
-The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by one of the above methods. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.
+After the proposed change, uses of `Series.apply` & `DataFrame.apply` will in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed to accept that users will have to find alternative ways to apply the functions, typically by applying the functions manually and possibly concatenating the results.
## Motivation
The current behavior of `apply`, `agg` & `transform` is very complex and therefore difficult to understand for non-expert users. The main difficulty is that the methods sometimes apply callables on elements of series/dataframes, sometimes on Series or columns/rows of DataFrames, and sometimes try element-wise operation and, if that fails, fall back to series-wise operations.
-Below is an overview of the current behavior in table form for `agg`, `transform` & `apply` ( The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior).
+Below is an overview of the current behavior in table form when giving callables to `agg`, `transform` & `apply`. As an example of how to read the tables: when a non-ufunc callable is given to `Series.agg`, `Series.agg` will first try to apply the callable to each element in the series and, if that fails, will fall back to calling the callable on the whole series.
+
+(The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior).
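The fallback described in the `Series.agg` example above can be modelled with a small sketch in plain Python. It is a toy model only, not the pandas implementation: `toy_series_agg` is a hypothetical helper, a list stands in for a `Series`, and the set of exceptions that triggers the fallback is an assumption.

```python
from typing import Any, Callable, List


def toy_series_agg(values: List[Any], func: Callable[[Any], Any]) -> Any:
    """Toy model of 'try element-wise, then fall back to series-wise'.

    Hypothetical helper for illustration; not pandas code.
    """
    try:
        # First attempt: call func on each element separately.
        return [func(v) for v in values]
    except (TypeError, ValueError, AttributeError):
        # Fallback (assumed exception set): call func once on the whole collection.
        return func(values)


# len() raises TypeError on an int, so the fallback runs and the data is reduced.
reduced = toy_series_agg([1, 2, 3], len)

# len() works on each string, so no fallback happens and nothing is reduced.
not_reduced = toy_series_agg(["ab", "cde"], len)
```

In this model the same callable reduces one input to a single value and leaves the other input un-reduced, depending only on whether the element-wise attempt happens to succeed. This is the kind of data-dependent behavior the tables describe.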
### agg
@@ -100,7 +102,7 @@ The reason for the great performance difference is that `df.transform(func)` ope
In addition to the above effects of the current implementation of `agg`/`transform` & `apply`, see [#52140](https://github.com/pandas-dev/pandas/issues/52140) for more examples of the unexpected effects of how `apply` is implemented.
-It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternative methodt to`apply(func)`:
+It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternatives to using `apply(func)`:
@@ -115,7 +117,9 @@ Because of their flexibility, `Series.apply` & `DataFrame.apply` are considered
With the above in mind, it is proposed that in the future:
-1. the `agg` & `transform` methods of `Series`, `DataFrame` will always operate series-wise and never element-wise
+It is proposed that `apply`, `transform` and `agg` in the future will work as follows:
+
+1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise
2. `Series.apply` & `DataFrame.apply` will be deprecated.
3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`)
@@ -125,7 +129,9 @@ The above changes means that the future behavior, when users want to apply arbit
2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively.
3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively.
-The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by `map`, `agg` or `transform`, so will be deprecated. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.
+After the proposed change, uses of `Series.apply` & `DataFrame.apply` will in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed to accept that users will have to find alternative ways to apply the functions, typically by applying the functions manually and possibly concatenating the results.
+
+It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed to be changed in this PDEP, because `groupby.agg` & `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`. Likewise, the behavior when given ufuncs will remain unchanged, because it is already as intended in all cases.
## Deprecation process
@@ -134,8 +140,6 @@ To change the current behavior, it will have to be deprecated. This will be done
1. Deprecate `Series.apply` & `DataFrame.apply`.
2. Add a `series_ops_only` parameter of type `bool | lib.NoDefault` to the `agg` & `transform` methods of `Series` & `DataFrame`. When `series_ops_only` is set to False, `agg` & `transform` will behave as they do currently. When set to True, `agg` & `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` & `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning that the current behavior will be removed in the future.
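The three states of `series_ops_only` described in step 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the proposed pandas code: `toy_agg` is a hypothetical function, a list stands in for a `Series`, the `NoDefault` enum is a local stand-in for `lib.NoDefault`, and the warning text is invented.

```python
import warnings
from enum import Enum
from typing import Any, Callable, List, Union


class NoDefault(Enum):
    """Local stand-in for the `lib.NoDefault` sentinel (assumption)."""

    no_default = "NO_DEFAULT"


no_default = NoDefault.no_default


def toy_agg(
    values: List[Any],
    func: Callable[[Any], Any],
    series_ops_only: Union[bool, NoDefault] = no_default,
) -> Any:
    """Hypothetical model of the proposed tri-state parameter."""
    if series_ops_only is no_default:
        # Not set by the caller: keep the legacy behavior, but warn.
        warnings.warn(
            "element-wise fallback will be removed in the future; "
            "pass series_ops_only=True to opt in to the new behavior",
            FutureWarning,
            stacklevel=2,
        )
        series_ops_only = False
    if series_ops_only:
        # New behavior: always call func once on the whole collection.
        return func(values)
    # Legacy behavior: try element-wise first, then fall back.
    try:
        return [func(v) for v in values]
    except (TypeError, ValueError, AttributeError):
        return func(values)


legacy = toy_agg(["ab", "cde"], len, series_ops_only=False)  # element-wise
future = toy_agg(["ab", "cde"], len, series_ops_only=True)  # series-wise
```

Using a sentinel rather than `None` or `False` as the default lets the function distinguish "the caller did not choose" from an explicit `False`, so only callers who have not yet opted in either way see the warning.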
-(It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`)
-
In Pandas v3.0:
1. `Series.apply` & `DataFrame.apply` will be removed from the code base (question: or added to `_hidden_attrs`?).
1. The `agg` & `transform` methods will always operate on series/columns/rows data, and the `series_ops_only` parameter will have no effect. The parameter will be deprecated and removed in v4.0 (it must be kept in v3.x in order to facilitate the switch from v2.x to v3.0).