Commit ba3b17c

fix wordings
1 parent 1bc54de commit ba3b17c

1 file changed: +12 -8 lines changed

web/pandas/pdeps/0013-standardize-apply.md

@@ -10,7 +10,7 @@

 The `apply`, `transform` and `agg` methods have very complex behavior because they in some cases operate on elements in series, in some cases on series and sometimes try one first, and if that fails, falls back to try the other. There is not a logical system how these behaviors are arranged and it can therefore be difficult for users to understand these methods.

-I propose to change how `apply`, `transform` and `agg` as follows:
+It is proposed that `apply`, `transform` and `agg` in the future will work as follows:

 1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise
 2. `Series.apply` & `DataFrame.apply` will be deprecated.
@@ -22,13 +22,15 @@ The above changes means that the future behavior, when users want to apply arbit
 2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively.
 3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively.

-The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by one of the above methods. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.
+The use of `Series.apply` & `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.

 ## Motivation

 The current behavior of `apply`, `agg` & `transform` is very complex and therefore difficult to understand for non-expert users. The difficulty is especially that the methods sometimes apply callables on elements of series/dataframes, sometimes on Series or columns/rows of Dataframes and sometimes try element-wise operation and if that fails, falls back to series-wise operations.

-Below is an overview of the current behavior in table form for `agg`, `transform` & `apply` ( The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior).
+Below is an overview of the current behavior in table form when giving callables to `agg`, `transform` & `apply`. As an example on how to read the tables, when a non-ufunc callable is given to `Series.agg`, `Series.agg` will first try to apply the callable to each element in the series, and if that fails, will fall back to call the series using the callable.
+
+(The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior).

 ### agg
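
The "try element-wise first, fall back to series-wise" dispatch that the added paragraph describes for `Series.agg` can be illustrated with a small toy model. This is only a sketch in plain Python, not pandas code: the helpers `agg_with_fallback` and `agg_series_only` are hypothetical, a list stands in for a `Series`, and the set of exceptions that triggers the fallback is an assumption.

```python
from typing import Any, Callable, Sequence


def agg_with_fallback(values: Sequence[Any], func: Callable[[Any], Any]) -> Any:
    """Toy model of the current dispatch (hypothetical helper, not pandas code)."""
    try:
        # First attempt: call func on every element individually.
        return [func(v) for v in values]
    except (ValueError, AttributeError, TypeError):  # assumed exception set
        # Fallback: call func once on the whole container.
        return func(values)


def agg_series_only(values: Sequence[Any], func: Callable[[Any], Any]) -> Any:
    """Toy model of the proposed behavior: always call func on the whole container."""
    return func(values)


data = [1, 2, 3]

# sum() raises TypeError on a single int, so the fallback aggregates the list.
fallback_sum = agg_with_fallback(data, sum)

# abs() works on a single int, so the element-wise path wins and nothing is aggregated.
elementwise_abs = agg_with_fallback(data, abs)

# A callable that is valid at both levels silently takes the element-wise path,
# while the series-only variant hands it the whole container.
ambiguous = agg_with_fallback(data, lambda x: x * 2)
series_wise = agg_series_only(data, lambda x: x * 2)
```

The last two calls show why the fallback is hard to reason about: the same callable yields different results depending on which path happens to succeed first.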

@@ -100,7 +102,7 @@ The reason for the great performance difference is that `df.transform(func)` ope

 In addition to the above effects of the current implementation of `agg`/`transform` & `apply`, see [#52140](https://github.com/pandas-dev/pandas/issues/52140) for more examples of the unexpected effects of how `apply` is implemented.

-It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternative methodt to `apply(func)`:
+It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternatives using `apply(func)`:

 | func | Series | DataFrame |
 |:--------------------|:-----------|:------------|
@@ -115,7 +117,9 @@ Because of their flexibility, `Series.apply` & `DataFrame.apply` are considered

 With the above in mind, it is proposed that in the future:

-1. the `agg` & `transform` methods of `Series`, `DataFrame` will always operate series-wise and never element-wise
+It is proposed that `apply`, `transform` and `agg` in the future will work as follows:
+
+1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise
 2. `Series.apply` & `DataFrame.apply` will be deprecated.
 3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`)

@@ -125,7 +129,9 @@ The above changes means that the future behavior, when users want to apply arbit
 2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively.
 3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively.

-The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by `map`, `agg` or `transform`, so will be deprecated. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.
+The use of `Series.apply` & `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.
+
+It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`. Likewise, the behavior when given ufuncs will remain unchanged, because the behavior is already as intended in all cases.

 ## Deprecation process

@@ -134,8 +140,6 @@ To change the current behavior, it will have to be deprecated. This will be done
 1. Deprecate `Series.apply` & `DataFrame.apply`.
 2. Add a `series_ops_only` with type `bool | lib.NoDefault` parameter to `agg` & `transform` methods of `Series` & `DataFrame`. When `series_ops_only` is set to False, `agg` & `transform` will behave as they do currently. When set to True, `agg` & `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` & `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning that the current behavior will be removed in the future.

-(It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`)
-
 In Pandas v3.0:
 1. `Series.apply` & `DataFrame.apply` will be removed from the code base (question: or added to `_hidden_attrs`?).
 1. The `agg` & `transform` will always operate on series/columns/rows data and the `series_ops_only` parameter will have no effect and be deprecated and removed in v4.0 (it must be kept in v3.x in order to facilitate the switch from v2.x to v3.0).
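
The three-state `series_ops_only` parameter described in the deprecation steps above follows a common sentinel-default pattern, which might look roughly like the sketch below. Everything here is illustrative and assumed rather than taken from pandas: `NoDefault` is a hypothetical stand-in for `lib.NoDefault`, `agg` is a free function operating on a plain list instead of a `Series` method, and the warning text and exception set are invented for the example.

```python
import warnings
from enum import Enum
from typing import Any, Callable, Sequence, Union


class NoDefault(Enum):
    """Hypothetical stand-in for the `lib.NoDefault` sentinel type."""

    no_default = "NO_DEFAULT"


no_default = NoDefault.no_default


def agg(
    values: Sequence[Any],
    func: Callable[[Any], Any],
    series_ops_only: Union[bool, NoDefault] = no_default,
) -> Any:
    """Toy `agg` showing the proposed three-state parameter (not pandas code)."""
    if series_ops_only is no_default:
        # Not specified: keep the current behavior, but warn that it will change.
        warnings.warn(
            "element-wise fallback in agg is deprecated; "
            "pass series_ops_only=True to opt in to the future behavior",
            FutureWarning,
            stacklevel=2,
        )
        series_ops_only = False
    if series_ops_only:
        # Future behavior: always operate on the whole container.
        return func(values)
    # Current behavior: try element-wise first, then fall back (assumed exception set).
    try:
        return [func(v) for v in values]
    except (ValueError, AttributeError, TypeError):
        return func(values)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = agg([1, 2, 3], lambda x: x * 2)  # default: current behavior plus warning
    explicit_old = agg([1, 2, 3], lambda x: x * 2, series_ops_only=False)  # no warning
    opted_in = agg([1, 2, 3], lambda x: x * 2, series_ops_only=True)  # no warning
```

Using a sentinel instead of `None` or `False` as the default lets the function distinguish "the user did not choose" from "the user explicitly chose the old behavior", so only the first case emits the `FutureWarning`.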
