web/pandas/pdeps/0013-standardize-apply.md (+19 -13)
@@ -10,21 +10,23 @@
 Currently, giving an input to `Series.apply` is treated differently depending on the type of the input:

-* if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
-* if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`
-* if the input is a list-like or dict-like, `series.apply(func)` is equivalent to `series.agg(func)` (which is subtly different than `series.apply`)
+1. if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
+2. if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`
+3. if the input is a list-like or dict-like, `series.apply(func_list)` is equivalent to `series.agg(func_list)` (which is subtly different than `series.apply`)

 In contrast, `DataFrame.apply` has a consistent behavior:

-* if the input is a callable, `df.apply(func)` always calls each columns in the DataFrame, so is similar to `func(col) for _, col in
+1. if the input is a callable, `df.apply(func)` always calls `func` on each column of the DataFrame, so is similar to `func(col) for _, col in
 df.items()` + wrapping functionality
-* if the input is a list-like or dict-like, `df.apply` call each item in the list/dict and wraps the result as needed. So for example if the input is a list, `df.apply(func_list)` is equivalent to `[df.apply(func) for func in func_list]` + wrapping functionality
+2. if the input is a list-like or dict-like, `df.apply` calls each item in the list/dict and wraps the result as needed. So for example if the input is a list, `df.apply(func_list)` is equivalent to `[df.apply(func) for func in func_list]` + wrapping functionality
+
+(It can be noted that `Series.apply` and `DataFrame.apply` already treat string input equivalently, so this proposal will not change how `Series.apply` treats string input. For background information, it can also be noted that `df.apply(..., axis=1)` will iterate over each row of the dataframe, which is the expected behavior.)

 This PDEP proposes that:

-- The current complex current behavior of `Series.apply` will be deprecated in Pandas 2.2.
+- The current behavior of `Series.apply` described above will be deprecated in Pandas 2.2.
 - Single callables given to the `.apply` methods of `Series` will in Pandas 3.0 always be called on the whole `Series`, so `series.apply(func)` will become similar to `func(series)`,
-- Lists or dicts of callables given to the `Series.apply` will in Pandas 3.0 always call `Series.apply` on each element of the list/dict
+- Lists or dicts given to `Series.apply` will in Pandas 3.0 always be applied using `Series.apply` on each element of the list/dict, instead of being equivalent to calling `Series.agg` on it, i.e. `series.apply(func_list)` will be equivalent to `[series.apply(func) for func in func_list]` + wrapping functionality.

 In short, this PDEP proposes changing `Series.apply` to be more similar to how `DataFrame.apply` works on single dataframe columns, i.e. operate on the whole series. If a user wants to map a callable to each element of a Series, they should be directed to use `Series.map` instead of using `Series.apply`.
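The behaviors contrasted in this hunk can be checked with a small sketch. It assumes `pandas` and `numpy` are installed; the sample data and the names `add_one`, `via_ufunc`, `elementwise`, `col_totals` and `centered` are illustrative and not part of the PDEP. The list/dict case is left out because its exact output shape may vary between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# Case 1: a numpy ufunc is called on the whole Series.
via_ufunc = s.apply(np.sqrt)
assert via_ufunc.equals(np.sqrt(s))


# Case 2: any other callable is called once per element, like Series.map.
def add_one(x):
    return x + 1


elementwise = s.apply(add_one)
assert elementwise.equals(s.map(add_one))
assert elementwise.equals(pd.Series([add_one(v) for v in s], index=s.index))

# DataFrame.apply hands each column (a Series) to the callable.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
col_totals = df.apply(lambda col: col.sum())
assert col_totals.to_dict() == {name: col.sum() for name, col in df.items()}

# The proposed Series.apply semantics correspond to func(series), which can
# already be spelled today with Series.pipe.
centered = s.pipe(lambda ser: ser - ser.mean())
```

Under the proposal, a callable such as `add_one` would receive the whole Series rather than one value at a time, so code that relies on element-wise calls would move to `Series.map`.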
@@ -303,14 +305,15 @@ The result from the above change will be that `Series.apply` will operate simila

 ## Deprecation process

-To change the behavior to the current behavior will have to be deprecated. This can be done by adding a `by_row` parameter to `Series.apply`, which means, when `by_rows=False`, that `Series.apply` will not operate elementwise but Series-wise.
-
-So we will have in pandas v2.2:
+To change the behavior, the current behavior will have to be deprecated. This can be done by adding a `by_row` parameter to `Series.apply`, so that when `by_row=True`, `Series.apply` will be backward compatible, and when `by_row=False`, `Series.apply` will operate Series-wise. If the parameter is not set, a warning will be emitted and the parameter will be set to `True`, i.e. be backward compatible. So we will have in pandas v2.2:
-warn("Do not use the by_row parameter, it will be removed in the future")
+warn("Do not use the by_row parameter, it will be removed in the future",
+DeprecationWarning,
+stacklevel=find_stack_level()
+)
 ...
```
-I.e. the `by_row` parameter will be needed in the signature in v3.0 in order be backward compatible with v2.x, but will have no effect.
+I.e. the `by_row` parameter will be in the signature in v3.0 in order to be backward compatible with v2.x, but will have no effect and will emit a warning if set in method calls.

 In Pandas v4.0, the `by_row` parameter will be removed.
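The staged deprecation described in this hunk can be sketched with the standard library alone. This is a toy model on plain lists, not the pandas implementation: `apply_v2`, `apply_v3` and the `_NO_DEFAULT` sentinel are hypothetical names, the v2.2 warning text is invented for illustration, and `stacklevel=2` stands in for the `find_stack_level()` helper shown in the diff.

```python
import warnings

_NO_DEFAULT = object()  # hypothetical sentinel meaning "argument not passed"


def apply_v2(values, func, by_row=_NO_DEFAULT):
    """Toy v2.2 stage: warn if by_row is unset, then fall back to True."""
    if by_row is _NO_DEFAULT:
        warnings.warn(
            "The default of by_row will change; pass by_row explicitly.",
            DeprecationWarning,
            stacklevel=2,
        )
        by_row = True
    if by_row:
        return [func(v) for v in values]  # element-wise, backward compatible
    return func(values)  # whole-object call, the future behavior


def apply_v3(values, func, by_row=_NO_DEFAULT):
    """Toy v3.0 stage: by_row is still accepted but has no effect."""
    if by_row is not _NO_DEFAULT:
        warnings.warn(
            "Do not use the by_row parameter, it will be removed in the future",
            DeprecationWarning,
            stacklevel=2,
        )
    return func(values)


data = [1, 2, 3]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = apply_v2(data, lambda v: v * 2)      # warns, element-wise
    opted_in = apply_v2(data, sum, by_row=False)  # silent, whole-object
    future = apply_v3(data, sum)                  # silent
    ignored = apply_v3(data, sum, by_row=True)    # warns, by_row ignored

warning_count = len(caught)
```

In this model, callers who pass `by_row=False` during the v2.2 stage see no warning then, and only need to drop the argument once the v3.0 stage starts warning about it.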