From 1787286ecd437c4211303cd8b3e1aa767cc2f028 Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Fri, 25 Aug 2023 07:14:35 +0100
Subject: [PATCH 01/13] PDEP-13: Make the Series.apply method operate Series-wise

---
 web/pandas/pdeps/0013-standardize-apply.md | 333 +++++++++++++++++++++
 1 file changed, 333 insertions(+)
 create mode 100644 web/pandas/pdeps/0013-standardize-apply.md

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
new file mode 100644
index 0000000000000..6c52a1e299aa3
--- /dev/null
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -0,0 +1,333 @@
+# PDEP-13: Make the Series.apply method operate Series-wise
+
+- Created: 24 August 2023
+- Status: Under discussion
+- Discussion: [#52140](https://github.com/pandas-dev/pandas/issues/52509)
+- Author: [Terji Petersen](https://github.com/topper-123)
+- Revision: 1
+
+## Abstract
+
+Currently, giving a input to `Series.apply` is treated differently depending on the type of the input:
+
+* if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
+* if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`
+* if the input is a list-like or dict-like, `series.apply(func)` is equivalent to `series.agg(func)` (which is subtly different than `series.apply`)
+
+In contrast, `DataFrame.apply` has a consistent behavior:
+
+* if the input is a callable, `df.apply(func)` always calls each columns in the DataFrame, so is similar to `func(col) for _, col in
+df.items()` + wrapping functionality
+* if the input is a list-like or dict-like, `df.apply` call each item in the list/dict and wraps the result as needed. So for example if the input is a list, `df.apply(func_list)` is equivalent to `[df.apply(func) for func in func_list]` + wrapping functionality
+
+This PDEP proposes that:
+
+- The current complex current behavior of `Series.apply` will be deprecated in Pandas 2.2.
+- Single callables given to the `.apply` methods of `Series` will in Pandas 3.0 always be called on the whole `Series`, so `series.apply(func)` will become similar to `func(series)`,
+- Lists or dicts of callables given to the `Series.apply` will in Pandas 3.0 always call `Series.apply` on each element of the list/dict
+
+In short, this PDEP proposes changing `Series.apply` to be more similar to how `DataFrame.apply` works on single dataframe columns, i.e. operate on the whole series. If a user wants to map a callable to each element of a Series, they should be directed to use `Series.map` instead of using `Series.apply`.
+
+## Motivation
+
+`Series.apply` is currently a very complex method, whose behaviour will differ depending on the nature of its input.
+
+`Series.apply` & `Series.map` currently often behave very similar, but differently enough for it to be confusing when it's a good idea to use one over the other and especially when `Series.apply` is a bad idea to use.
+
+Also, calling `Series.apply` currently gives a different result than the per-column result from calling `DataFrame.apply`, which can be confusing for users who expect `Series.apply` to be the `Series` version of `DataFrame.apply`, similar to how `Series.agg` is the `Series` version of `DataFrame.agg`. For example, currently some functions may work fine with `DataFrame.apply`, but may fail, be very slow when given to `Series.apply` or give a different result than the per-column result from `DataFrame.apply`.
+
+### Similarities and differences between `Series.apply` and `Series.map`
+
+The similarity between the methods is especially that they both fall back to use `Series._map_values` and there use `algorithms.map_array` or `ExtensionArray.map` as relevant.
+
+The differences are many, but each one is relative minor:
+
+1. `Series.map` has a `na_action` parameter, which `Series.apply` doesn't
+2. `Series.apply` can take advantage of numpy ufuncs, which `Series.map` can't
+3. `Series.apply` can take `args` and `**kwargs`, which `Series.map` can't
+4. `Series.apply` is more general and can take a string, e.g. `"sum"`, or lists or dicts of inputs which `Series.map` can't.
+5. when given a numpy ufunc, the ufunc will be called on the whole Series, when given to `Series.apply` and on each element of the series, if given to `Series.map`.
+
+In addition, `Series.apply` has some functionality, which `Series.map` does not, but which has already been deprecated:
+
+6. `Series.apply` has a `convert_dtype` parameter, which has been deprecated (deprecated in pandas 2.1, see [GH52257](https://github.com/pandas-dev/pandas/pull/52257))
+7. `Series.apply` will return a Dataframe, if its result is a list of Series (deprecated in pandas 2.1, see [GH52123]()https://github.com/pandas-dev/pandas/pull/52123)).
+
+### Similarities and differences between `Series.apply` and `DataFrame.apply`
+
+`Series.apply` and `DataFrame.apply` are similar when given numpy ufuncs as inputs, but when given non-ufuncs as inputs, `Series.apply` and `DataFrame.apply` will behave differently, because `series.apply(func)` will be similar to `series.map(func)` while `Dataframe.apply(func)` will call the input on each column series and combine the result.
+
+If given a list-like or dict-like, `Series.apply` will behave similar to `Series.agg`, while `DataFrame.apply` will call each element in the list-like/dict-like on each column and combine the results.
+
+Also `DataFrame.apply` has some parameters (`raw` and `result_type`) which are relevant for a 2D DataFrame, but may not be relevant for `Series.apply`, because `Series` is a 1D structure.
+
+## Examples of problems with the current way `Series.apply` works
+
+The above similarities and many minor differences makes for confusing and too complex rules for when its a good idea to use `Series.apply` over `Series.map` to do operations, and vica versa, and for when a callable will work well with `Series.apply` versus `DataFrame.apply`. Some examples will show some examples below.
+
+First some setup:
+
+```python
+>>> import numpy as np
+>>> import pandas as pd
+>>>
+>>> small_ser = pd.Series([1, 2, 3])
+>>> large_ser = pd.Series(range(100_000))
+```
+
+### 1: string vs numpy funcs in `Series.apply`
+
+```python
+>>> small_ser.apply("sum")
+6
+>>> small_ser.apply(np.sum)
+0    1
+1    2
+2    3
+dtype: int64
+```
+
+It will surprise users that these two give different results. Also, anyone using the second pattern is probably making a mistake.
+
+Note that giving `np.sum` to `DataFrame.apply` aggregates properly:
+
+```python
+>>> pd.DataFrame(small_ser).apply(np.sum)
+0    6
+dtype: int64
+```
+
+This PDEP proposes that callables will be applies to the whole `Series`, so we in the future will have:
+
+```python
+>>> small_ser.apply(np.sum)
+6
+```
+
+### 2 Callables vs. list/dict of callables
+
+Giving functions and lists/dicts of functions will give different results:
+
+```python
+>>> small_ser.apply(np.sum)
+0    1
+1    2
+2    3
+dtype: int64
+>>> small_ser.apply([np.sum])
+sum    6
+dtype: int64
+```
+
+Also with non-numpy callables:
+
+```python
+>>> small_ser.apply(lambda x: x.sum())
+AttributeError: 'int' object has no attribute 'sum'
+>>> small_ser.apply([lambda x: x.sum()])
+<lambda>    6
+dtype: int64
+```
+
+In both cases above the difference is that `Series.apply` operates element-wise, if given a callable, but series-wise if given a list/dict of callables.
+
+This PDEP proposes that callables will be applies to the whole `Series`, so we in the future will have:
+
+```python
+>>> small_ser.apply(lambda x: x.sum())
+6
+>>> small_ser.apply([lambda x: x.sum()])
+<lambda>    6
+dtype: int64
+```
+
+### 3. Functions in `Series.apply`
+
+The `Series.apply` doc string have examples with using lambdas, but using lambdas in `Series.apply` is often a bad practices because of bad performance:
+
+```python
+>>> %timeit large_ser.apply(lambda x: x + 1)
+24.1 ms ± 88.8 µs per loop
+```
+
+Currently, `Series` does not have a method that makes a callable operate on a series' data. Instead users need to use `Series.pipe` for that operation in order for the operation to be efficient:
+
+```python
+>>> %timeit large_ser.pipe(lambda x: x + 1)
+44 µs ± 363 ns per loop
+```
+
+(The reason for the above performance differences is that apply gets called on each single element, while `pipe` calls `x.__add__(1)`, which operates on the whole array).
+
+Note also that `.pipe` operates on the `Series` while `apply`currently operates on each element in the data, so there is some differences that may have some consequence in some cases.
+
+This PDEP proposes that callables will be applies to the whole `Series`, so we in the future `Series.apply` will be as fast as `Series.pipe`.
+
+### 4. ufuncs in `Series.apply` vs. noral functions
+
+Performance-wise, ufuncs are fine in `Series.apply`, but non-ufunc functions are not:
+
+```python
+>>> %timeit large_ser.apply(np.sqrt)
+71.6 µs ± 1.17 µs per loop
+>>> %timeit large_ser.apply(lambda x:np.sqrt(x))
+63.6 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
+```
+
+It is difficult to understand why ufuncs are fast in `apply`, while other callables are slow in `apply` (answer: it's because ufuncs operate on the whole Series, while other callables operate elementwise).
+
+This PDEP proposes that callables will be applies to the whole `Series`, so we in the future non-ufunc functions in `Series.apply` will be as fast as ufuncs.
+
+### 5. callables in `Series.apply` is slow, callables in `DataFrame.apply` is fast
+
+Above it was shown that using (non-ufunc) callables in `Series.apply` is bad performance-wise. OTOH using them in `DataFrame.apply` is fine:
+
+```python
+>>> %timeit large_ser.apply(lambda x: x + 1)
+24.3 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
+>>> %timeit pd.DataFrame(large_ser).apply(lambda x: x + 1)
+160 µs +- 1.17 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
+```
+
+Having callables being fast to use in the `DataFrame.apply` method, but slow in `Series.apply` is confusing for users.
+
+This PDEP proposes that callables will be applies to the whole `Series`, so we in the future `Series.apply` will be as fast as `DataFrame.apply` already is.
+
+### 6. callables in `Series.apply` may fail, while callables in `DataFrame.apply` do not and vica versa
+
+```python
+>>> ser.apply(lambda x: x.sum())
+AttributeError: 'int' object has no attribute 'sum'
+>>> pd.DataFrame(ser).apply(lambda x: x.sum())
+0    6
+dtype: int64
+```
+
+Having callables fail when used in `Series.apply`, but work in `DataFrame.Apply` or vica versa is confusing for users.
+
+This PDEP proposes that callables will be applies to the whole `Series`, so callables given to `Series.apply` will work the same as when given to `DataFrame.apply`, so in the future we will have that:
+
+```python
+>>> ser.apply(lambda x: x.sum())
+6
+>>> pd.DataFrame(ser).apply(lambda x: x.sum())
+0    6
+dtype: int64
+```
+
+### 7. `Series.apply` vs. `Series.agg`
+
+The doc string for `Series.agg` says about the method's `func` parameter: "If a function, must ... work when passed ... to Series.apply". But compare these:
+
+```python
+>>> small_ser.agg(np.sum)
+6
+>>> small_ser.apply(np.sum)
+0    1
+1    2
+2    3
+dtype: int64
+```
+
+Users would expect these two to give the same result.
+
+This PDEP proposes that callables will be applies to the whole `Series`, so in the future we will have:
+
+```python
+>>> small_ser.agg(np.sum)
+6
+>>> small_ser.apply(np.sum)
+6
+```
+
+### 8. dictlikes vs. listlikes in `Series.apply`
+
+Giving a *list* of transforming arguments to `Series.apply` returns a `DataFrame`:
+
+```python
+>>> small_ser.apply(["sqrt", np.abs])
+       sqrt  absolute
+0  1.000000         1
+1  1.414214         2
+2  1.732051         3
+```
+
+But giving a *dict* of transforming arguments returns a `Series` with a `MultiIndex`:
+
+```python
+>>> small_ser.apply({"sqrt" :"sqrt", "abs" : np.abs})
+sqrt  0    1.000000
+      1    1.414214
+      2    1.732051
+abs   0    1.000000
+      1    2.000000
+      2    3.000000
+dtype: float64
+```
+
+These two should give same-shaped output for consistency. Using `Series.transform` instead of `Series.apply`, it returns a `DataFrame` in both cases and I think the dictlike example above should return a `DataFrame` similar to the listlike example.
+
+Minor additional info: listlikes and dictlikes of aggregation arguments do behave the same, so this is only a problem with dictlikes of transforming arguments when using `apply`.
+
+This PDEP proposes that the result from giving list-likes and dict-likes to `Series.apply` will have the same shape as when given list-likes currently:
+
+```python
+>>> small_ser.apply(["sqrt", np.abs])
+       sqrt  absolute
+0  1.000000         1
+1  1.414214         2
+2  1.732051         3
+>>> small_ser.apply({"sqrt" :"sqrt", "abs" : np.abs})
+       sqrt  absolute
+0  1.000000         1
+1  1.414214         2
+2  1.732051         3
+```
+
+## Proposal
+
+With the above in mind, it is proposed that:
+
+1. When given a callable, `Series.apply` always operate on the series. I.e. let `series.apply(func)` be similar to `func(series)` + the needed additional functionality.
+2. When given a list-like or dict-like, `Series.apply` will apply each element of the list-like/dict-like to the series. I.e. `series.apply(func_list)` wil be similar to `[series.apply(func) for func in func_list]` + the needed additional functionality
+3. The changes made to `Series.apply`will propagate to `Series.agg` and `Series.transform` as needed.
+
+The difference between `Series.apply()` & `Series.map()` will then be that:
+
+* `Series.apply()` makes the passed-in callable operate on the series, similarly to how `(DataFrame|SeriesGroupby|DataFrameGroupBy).apply` operate on series. This is very fast and can do almost anything,
+* `Series.map()` makes the passed-in callable operate on each series data elements individually. This is very flexible, but can be very slow, so should only be used if `Series.apply` can't do it.
+
+so, this API change will help make Pandas `Series.(apply|map)` API clearer without losing functionality and let their functionality be explainable in a simple manner, which would be a win for Pandas.
+
+The result from the above change will be that `Series.apply` will operate similar to how `DataFrame.apply` works already per column, similar to how `Series.map` operates similar to how `DataFrame.map` works per column. This will give better coherence between same-named methods on `DataFrames` and `Series`.
+
+## Deprecation process
+
+To change the behavior to the current behavior will have to be deprecated. This can be done by adding a `by_row` parameter to `Series.apply`, which means, when `by_rows=False`, that `Series.apply` will not operate elementwise but Series-wise.
+
+So we will have in pandas v2.2:
+
+```python
+>>> def apply(self, ..., by_row: bool | NoDefault=no_default, ...):
+    if by_row is no_default:
+        warn("The by_row parameter will be set to False in the future")
+        by_row = True
+    ...
+```
+
+In pandas v3.0 the signature will change to:
+
+```python
+>>> def apply(self, ..., by_row: NoDefault=no_default, ...):
+    if by_row is not no_default:
+        warn("Do not use the by_row parameter, it will be removed in the future")
+    ...
+```
+
+I.e. the `by_row` parameter will be needed in the signature in v3.0 in order be backward compatible with v2.x, but will have no effect.
+
+In Pandas v4.0, the `by_row` parameter will be removed.
+
+## PDEP-13 History
+
+- 24 august 2023: Initial version

From 6ae4031a4852efd397acc22e374d4aacbfcdad5e Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Fri, 25 Aug 2023 07:58:39 +0100
Subject: [PATCH 02/13] fix codespell issues

---
 web/pandas/pdeps/0013-standardize-apply.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
index 6c52a1e299aa3..3b2dfdc6acb41 100644
--- a/web/pandas/pdeps/0013-standardize-apply.md
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -163,7 +163,7 @@ Note also that `.pipe` operates on the `Series` while `apply`currently operates
 
 This PDEP proposes that callables will be applies to the whole `Series`, so we in the future `Series.apply` will be as fast as `Series.pipe`.
 
-### 4. ufuncs in `Series.apply` vs. noral functions
+### 4. ufuncs in `Series.apply` vs. normal functions
 
 Performance-wise, ufuncs are fine in `Series.apply`, but non-ufunc functions are not:
 
@@ -289,7 +289,7 @@ This PDEP proposes that the result from giving list-likes and dict-likes to `Ser
 With the above in mind, it is proposed that:
 
 1. When given a callable, `Series.apply` always operate on the series. I.e. let `series.apply(func)` be similar to `func(series)` + the needed additional functionality.
-2. When given a list-like or dict-like, `Series.apply` will apply each element of the list-like/dict-like to the series. I.e. `series.apply(func_list)` wil be similar to `[series.apply(func) for func in func_list]` + the needed additional functionality
+2. When given a list-like or dict-like, `Series.apply` will apply each element of the list-like/dict-like to the series. I.e. `series.apply(func_list)` will be similar to `[series.apply(func) for func in func_list]` + the needed additional functionality
 3. The changes made to `Series.apply`will propagate to `Series.agg` and `Series.transform` as needed.
 
 The difference between `Series.apply()` & `Series.map()` will then be that:

From 6cdd5e9408197c08349d57f4baafbc5a34810cf9 Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Sun, 27 Aug 2023 07:14:42 +0100
Subject: [PATCH 03/13] Update web/pandas/pdeps/0013-standardize-apply.md

Co-authored-by: Irv Lustig
---
 web/pandas/pdeps/0013-standardize-apply.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
index 3b2dfdc6acb41..403c4adefe6ff 100644
--- a/web/pandas/pdeps/0013-standardize-apply.md
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -8,7 +8,7 @@
 
 ## Abstract
 
-Currently, giving a input to `Series.apply` is treated differently depending on the type of the input:
+Currently, giving an input to `Series.apply` is treated differently depending on the type of the input:
 
 * if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
 * if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`

From 87c1faadf71ca77229b7f5ad73f98f837941679e Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Sun, 27 Aug 2023 08:22:38 +0100
Subject: [PATCH 04/13] Update web/pandas/pdeps/0013-standardize-apply.md

Co-authored-by: Irv Lustig
---
 web/pandas/pdeps/0013-standardize-apply.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
index 403c4adefe6ff..2edff3742948b 100644
--- a/web/pandas/pdeps/0013-standardize-apply.md
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -205,7 +205,7 @@ dtype: int64
 
 Having callables fail when used in `Series.apply`, but work in `DataFrame.Apply` or vica versa is confusing for users.
 
-This PDEP proposes that callables will be applies to the whole `Series`, so callables given to `Series.apply` will work the same as when given to `DataFrame.apply`, so in the future we will have that:
+This PDEP proposes that callables will be applied to the whole `Series`, so callables given to `Series.apply` will work the same as when given to `DataFrame.apply`, so in the future we will have that:
 
 ```python
 >>> ser.apply(lambda x: x.sum())

From 22414df447cb184c289b6fb13d4526ff588de3b5 Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Sun, 27 Aug 2023 08:35:54 +0100
Subject: [PATCH 05/13] Update web/pandas/pdeps/0013-standardize-apply.md

Co-authored-by: Irv Lustig
---
 web/pandas/pdeps/0013-standardize-apply.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
index 2edff3742948b..6ff9a09412c31 100644
--- a/web/pandas/pdeps/0013-standardize-apply.md
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -231,7 +231,7 @@ dtype: int64
 
 Users would expect these two to give the same result.
 
-This PDEP proposes that callables will be applies to the whole `Series`, so in the future we will have:
+This PDEP proposes that callables will be applied to the whole `Series`, so in the future we will have:
 
 ```python
 >>> small_ser.agg(np.sum)

From c5736ab3bc8f9420ee23b79578c26b3223e46db1 Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Sun, 27 Aug 2023 09:13:07 +0100
Subject: [PATCH 06/13] update PDEP

---
 web/pandas/pdeps/0013-standardize-apply.md | 32 +++++++++++++---------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
index 6ff9a09412c31..79529cb6edc53 100644
--- a/web/pandas/pdeps/0013-standardize-apply.md
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -10,21 +10,23 @@
 
 Currently, giving an input to `Series.apply` is treated differently depending on the type of the input:
 
-* if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
-* if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`
-* if the input is a list-like or dict-like, `series.apply(func)` is equivalent to `series.agg(func)` (which is subtly different than `series.apply`)
+1. if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
+2. if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`
+3. if the input is a list-like or dict-like, `series.apply(func_list)` is equivalent to `series.agg(func_list)` (which is subtly different than `series.apply`)
 
 In contrast, `DataFrame.apply` has a consistent behavior:
 
-* if the input is a callable, `df.apply(func)` always calls each columns in the DataFrame, so is similar to `func(col) for _, col in
+1. if the input is a callable, `df.apply(func)` always calls each columns in the DataFrame, so is similar to `func(col) for _, col in
 df.items()` + wrapping functionality
-* if the input is a list-like or dict-like, `df.apply` call each item in the list/dict and wraps the result as needed. So for example if the input is a list, `df.apply(func_list)` is equivalent to `[df.apply(func) for func in func_list]` + wrapping functionality
+2. if the input is a list-like or dict-like, `df.apply` call each item in the list/dict and wraps the result as needed. So for example if the input is a list, `df.apply(func_list)` is equivalent to `[df.apply(func) for func in func_list]` + wrapping functionality
+
+(it can be noted that `Series.apply` and `DataFrame.apply` already treat string input equivalently, so this proposal will not change how `Series.apply` treats string input. For background information, it can also be noted that `df.apply(..., axis=1)` will iterate over each row of frame dataframe, which is the expected behavior).
 
 This PDEP proposes that:
 
-- The current complex current behavior of `Series.apply` will be deprecated in Pandas 2.2.
+- The current behavior of `Series.apply` described above will be deprecated in Pandas 2.2.
 - Single callables given to the `.apply` methods of `Series` will in Pandas 3.0 always be called on the whole `Series`, so `series.apply(func)` will become similar to `func(series)`,
-- Lists or dicts of callables given to the `Series.apply` will in Pandas 3.0 always call `Series.apply` on each element of the list/dict
+- Lists or dicts given to the `Series.apply` will in Pandas 3.0 always applied using `Series.apply` on each element of the list/dict, instead of being equivalent to calling `Series.agg` on it, i.e. `series.apply(func_list)` will be equivalent to `[series.apply(func) for func in func_list]` + wrapping functionality.
 
 In short, this PDEP proposes changing `Series.apply` to be more similar to how `DataFrame.apply` works on single dataframe columns, i.e. operate on the whole series. If a user wants to map a callable to each element of a Series, they should be directed to use `Series.map` instead of using `Series.apply`.
 
@@ -303,14 +305,15 @@ The result from the above change will be that `Series.apply` will operate simila
 
 ## Deprecation process
 
-To change the behavior to the current behavior will have to be deprecated. This can be done by adding a `by_row` parameter to `Series.apply`, which means, when `by_rows=False`, that `Series.apply` will not operate elementwise but Series-wise.
-
-So we will have in pandas v2.2:
+To change the behavior to the current behavior will have to be deprecated. This can be done by adding a `by_row` parameter to `Series.apply`, so when `by_rows=True`, `Series.apply` will be backward compatible, and when `by_rows=False`, `Series.apply` will operate Series-wise. If the parameter is not set a warning will we emitted and the parameter will be set to `True`, i.e. be backward compatible. So we will have in pandas v2.2:
 
 ```python
 >>> def apply(self, ..., by_row: bool | NoDefault=no_default, ...):
     if by_row is no_default:
-        warn("The by_row parameter will be set to False in the future")
+        warn("The by_row parameter will be set to False in the future",
+            DeprecationWarning,
+            stacklevel=find_stack_level()
+        )
         by_row = True
     ...
 ```
@@ -320,11 +323,14 @@ In pandas v3.0 the signature will change to:
 ```python
 >>> def apply(self, ..., by_row: NoDefault=no_default, ...):
     if by_row is not no_default:
-        warn("Do not use the by_row parameter, it will be removed in the future")
+        warn("Do not use the by_row parameter, it will be removed in the future",
+            DeprecationWarning,
+            stacklevel=find_stack_level()
+        )
     ...
 ```
 
-I.e. the `by_row` parameter will be needed in the signature in v3.0 in order be backward compatible with v2.x, but will have no effect.
+I.e. the `by_row` parameter will be in the signature in v3.0 in order be backward compatible with v2.x, but will have no effect and will emit a warning if set in method calls.
 
 In Pandas v4.0, the `by_row` parameter will be removed.
 
From 809a27f14e1db561d72897b0133d7c218dc0d9d1 Mon Sep 17 00:00:00 2001
From: Terji Petersen
Date: Sun, 17 Sep 2023 07:54:25 +0100
Subject: [PATCH 07/13] v2

---
 web/pandas/pdeps/0013-standardize-apply.md | 369 +++++----------------
 1 file changed, 88 insertions(+), 281 deletions(-)

diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md
index 79529cb6edc53..231d38ccc1edb 100644
--- a/web/pandas/pdeps/0013-standardize-apply.md
+++ b/web/pandas/pdeps/0013-standardize-apply.md
@@ -1,339 +1,146 @@
-# PDEP-13: Make the Series.apply method operate Series-wise
+# PDEP-13: Deprecate the apply method on Series & DataFrame in favor of series-based operations using the agg and transform methods
 
 - Created: 24 August 2023
 - Status: Under discussion
 - Discussion: [#52140](https://github.com/pandas-dev/pandas/issues/52509)
 - Author: [Terji Petersen](https://github.com/topper-123)
-- Revision: 1
+- Revision: 2
 
 ## Abstract
 
-Currently, giving an input to `Series.apply` is treated differently depending on the type of the input:
+The `apply`, `transform` and `agg` methods have very complex behavior because they in some cases operate on elements in series, in some cases on series and sometimes try one first, and it that fails, falls back to try the other. There is not a logical system how these behaviors are arranged and it can therefore be difficult for users to understand these methods.
 
-1. if the input is a numpy `ufunc`, `series.apply(func)` is equivalent to `func(series)`, i.e. similar to `series.pipe(func)`.
-2. if the input is a callable, but not a numpy `ufunc`, `series.apply(func)` is similar to `Series([func(val) for val in series], index=series.index)`, i.e. similar to `series.map(func)`
-3. if the input is a list-like or dict-like, `series.apply(func_list)` is equivalent to `series.agg(func_list)` (which is subtly different than `series.apply`)
+I propose to change how `apply`, `transform` and `agg` as follows:
 
-In contrast, `DataFrame.apply` has a consistent behavior:
+1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise
+2. `Series.apply` & `DataFrame.apply` will be deprecated.
+3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`)
 
-1. if the input is a callable, `df.apply(func)` always calls each columns in the DataFrame, so is similar to `func(col) for _, col in
-df.items()` + wrapping functionality
-2. if the input is a list-like or dict-like, `df.apply` call each item in the list/dict and wraps the result as needed. So for example if the input is a list, `df.apply(func_list)` is equivalent to `[df.apply(func) for func in func_list]` + wrapping functionality
+The above changes means that the future behavior, when users want to apply arbitrary callables in pandas, can be described as follows:
 
-(it can be noted that `Series.apply` and `DataFrame.apply` already treat string input equivalently, so this proposal will not change how `Series.apply` treats string input. For background information, it can also be noted that `df.apply(..., axis=1)` will iterate over each row of frame dataframe, which is the expected behavior).
+1. When users want to operate on single elements in a `Series` or `DataFrame`, they should use `Series.map` and `DataFrame.map` respectively.
+2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively.
+3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively.
 
-This PDEP proposes that:
-
-- The current behavior of `Series.apply` described above will be deprecated in Pandas 2.2.
-- Single callables given to the `.apply` methods of `Series` will in Pandas 3.0 always be called on the whole `Series`, so `series.apply(func)` will become similar to `func(series)`,
-- Lists or dicts given to the `Series.apply` will in Pandas 3.0 always applied using `Series.apply` on each element of the list/dict, instead of being equivalent to calling `Series.agg` on it, i.e. `series.apply(func_list)` will be equivalent to `[series.apply(func) for func in func_list]` + wrapping functionality.
-
-In short, this PDEP proposes changing `Series.apply` to be more similar to how `DataFrame.apply` works on single dataframe columns, i.e. operate on the whole series. If a user wants to map a callable to each element of a Series, they should be directed to use `Series.map` instead of using `Series.apply`.
+The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by one of the above methods. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results.
 
 ## Motivation
 
-`Series.apply` is currently a very complex method, whose behaviour will differ depending on the nature of its input.
-
-`Series.apply` & `Series.map` currently often behave very similar, but differently enough for it to be confusing when it's a good idea to use one over the other and especially when `Series.apply` is a bad idea to use.
- -Also, calling `Series.apply` currently gives a different result than the per-column result from calling `DataFrame.apply`, which can be confusing for users who expect `Series.apply` to be the `Series` version of `DataFrame.apply`, similar to how `Series.agg` is the `Series` version of `DataFrame.agg`. For example, currently some functions may work fine with `DataFrame.apply`, but may fail, be very slow when given to `Series.apply` or give a different result than the per-column result from `DataFrame.apply`. +The current behavior of `apply`, `agg` & `transform` is very complex and therefore difficult to understand for non-expert users. The difficulty is especially that the methods sometimes apply callables on elements of series/dataframes, sometimes on Series or columns/rows of Dataframes and sometimes try element-wise operation and if that fails, falls back to series-wise operations. -### Similarities and differences between `Series.apply` and `Series.map` +Below is an overview of the current behavior in table form for `agg`, `transform` & `apply` ( The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior). -The similarity between the methods is especially that they both fall back to use `Series._map_values` and there use `algorithms.map_array` or `ExtensionArray.map` as relevant. +### agg -The differences are many, but each one is relative minor: +| | Series | DataFrame | groupby | +|:-----------------------------------|:---------------------------------|:---------------------------------|:----------| +| ufunc or list/dict of ufuncs | series | series | series | +| other callables (non ufunc) | Try elements, fallback to series | series | series | +| list/dict of callables (non-ufunc) | Try elements, fallback to series | Try elements, fallback to series | series | -1. `Series.map` has a `na_action` parameter, which `Series.apply` doesn't -2. 
`Series.apply` can take advantage of numpy ufuncs, which `Series.map` can't -3. `Series.apply` can take `args` and `**kwargs`, which `Series.map` can't -4. `Series.apply` is more general and can take a string, e.g. `"sum"`, or lists or dicts of inputs which `Series.map` can't. -5. when given a numpy ufunc, the ufunc will be called on the whole Series, when given to `Series.apply` and on each element of the series, if given to `Series.map`. +### transform -In addition, `Series.apply` has some functionality, which `Series.map` does not, but which has already been deprecated: +| | Series | DataFrame | groupby | +|:-----------------------------------|:---------------------------------|:---------------------------------|:----------| +| ufunc or list/dict of ufuncs | series | series | series | +| other callables (non ufunc) | Try elements, fallback to series | series | series | +| list/dict of callables (non-ufunc) | Try elements, fallback to series | Try elements, fallback to series | series | -6. `Series.apply` has a `convert_dtype` parameter, which has been deprecated (deprecated in pandas 2.1, see [GH52257](https://github.com/pandas-dev/pandas/pull/52257)) -7. `Series.apply` will return a Dataframe, if its result is a list of Series (deprecated in pandas 2.1, see [GH52123]()https://github.com/pandas-dev/pandas/pull/52123)). 
+### apply -### Similarities and differences between `Series.apply` and `DataFrame.apply` +| | Series | DataFrame | groupby | +|:-----------------------------------|:---------|:------------|:----------| +| ufunc or list/dict of ufuncs | series | series | series | +| other callables (non ufunc) | elements | series | series | +| list/dict of callables (non-ufunc) | Try elements, fallback to series | series | series | -`Series.apply` and `DataFrame.apply` are similar when given numpy ufuncs as inputs, but when given non-ufuncs as inputs, `Series.apply` and `DataFrame.apply` will behave differently, because `series.apply(func)` will be similar to `series.map(func)` while `Dataframe.apply(func)` will call the input on each column series and combine the result. +The 3 tables show that: -If given a list-like or dict-like, `Series.apply` will behave similar to `Series.agg`, while `DataFrame.apply` will call each element in the list-like/dict-like on each column and combine the results. +1. when given numpy ufuncs, callables given to `agg`/`transform`/`apply` operate on series data +2. when used on groupby objects, callables given to `agg`/`transform`/`apply` operate on series data +3. otherwise, in some cases they will try element-wise operation and fall back to series-wise operation if that fails, in some cases they will operate on series data, and in some cases on element data. -Also `DataFrame.apply` has some parameters (`raw` and `result_type`) which are relevant for a 2D DataFrame, but may not be relevant for `Series.apply`, because `Series` is a 1D structure. +The above differences result in some non-obvious differences in how the same callable given to `agg`/`transform`/`apply` will behave.
-## Examples of problems with the current way `Series.apply` works - -The above similarities and many minor differences makes for confusing and too complex rules for when its a good idea to use `Series.apply` over `Series.map` to do operations, and vica versa, and for when a callable will work well with `Series.apply` versus `DataFrame.apply`. Some examples will show some examples below. - -First some setup: +For example, calling `agg` using the same callable will give different results depending on context: ```python ->>> import numpy as np >>> import pandas as pd +>>> df = pd.DataFrame({"A": range(3)}) >>> ->>> small_ser = pd.Series([1, 2, 3]) ->>> large_ser = pd.Series(range(100_000)) -``` - -### 1: string vs numpy funcs in `Series.apply` - -```python ->>> small_ser.apply("sum") -6 ->>> small_ser.apply(np.sum) -0 1 -1 2 -2 3 -dtype: int64 -``` - -It will surprise users that these two give different results. Also, anyone using the second pattern is probably making a mistake. - -Note that giving `np.sum` to `DataFrame.apply` aggregates properly: - -```python ->>> pd.DataFrame(small_ser).apply(np.sum) -0 6 -dtype: int64 -``` - -This PDEP proposes that callables will be applies to the whole `Series`, so we in the future will have: - -```python ->>> small_ser.apply(np.sum) -6 -``` - -### 2 Callables vs. list/dict of callables - -Giving functions and lists/dicts of functions will give different results: - -```python ->>> small_ser.apply(np.sum) -0 1 -1 2 -2 3 -dtype: int64 ->>> small_ser.apply([np.sum]) -sum 6 +>>> df.agg(lambda x: np.sum(x)) # ok +A 3 dtype: int64 +>>> df.agg([lambda x: np.sum(x)]) # not ok + A + +0 0 +1 1 +2 2 +>>> df.A.agg(lambda x: np.sum(x)) # not ok +0 0 +1 1 +2 2 +Name: A, dtype: int64 ``` -Also with non-numpy callables: +It can also have great effect on performance, even when the result is correct. 
For example: ```python ->>> small_ser.apply(lambda x: x.sum()) -AttributeError: 'int' object has no attribute 'sum' ->>> small_ser.apply([lambda x: x.sum()]) - 6 -dtype: int64 +>>> df = pd.DataFrame({"A": range(1_000_000)}) +>>> %timeit df.transform(lambda x: x + 1) # fast +1.43 ms ± 3.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) + >>> %timeit df.transform([lambda x: x + 1]) # slow +163 ms ± 754 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) + >>> %timeit df.A.transform(lambda x: x + 1) # slow +162 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` -In both cases above the difference is that `Series.apply` operates element-wise, if given a callable, but series-wise if given a list/dict of callables. - -This PDEP proposes that callables will be applies to the whole `Series`, so we in the future will have: - -```python ->>> small_ser.apply(lambda x: x.sum()) -6 ->>> small_ser.apply([lambda x: x.sum()]) - 6 -dtype: int64 -``` - -### 3. Functions in `Series.apply` - -The `Series.apply` doc string have examples with using lambdas, but using lambdas in `Series.apply` is often a bad practices because of bad performance: - -```python ->>> %timeit large_ser.apply(lambda x: x + 1) -24.1 ms ± 88.8 µs per loop -``` - -Currently, `Series` does not have a method that makes a callable operate on a series' data. Instead users need to use `Series.pipe` for that operation in order for the operation to be efficient: - -```python ->>> %timeit large_ser.pipe(lambda x: x + 1) -44 µs ± 363 ns per loop -``` - -(The reason for the above performance differences is that apply gets called on each single element, while `pipe` calls `x.__add__(1)`, which operates on the whole array). - -Note also that `.pipe` operates on the `Series` while `apply`currently operates on each element in the data, so there is some differences that may have some consequence in some cases.
- -This PDEP proposes that callables will be applies to the whole `Series`, so we in the future `Series.apply` will be as fast as `Series.pipe`. - -### 4. ufuncs in `Series.apply` vs. normal functions - -Performance-wise, ufuncs are fine in `Series.apply`, but non-ufunc functions are not: - -```python ->>> %timeit large_ser.apply(np.sqrt) -71.6 µs ± 1.17 µs per loop ->>> %timeit large_ser.apply(lambda x:np.sqrt(x)) -63.6 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) -``` +The reason for the great performance difference is that `df.transform(func)` operates on series data, which is fast, while `df.transform(func_list)` will attempt elementwise operation first, and if that works (which it does here), will be much slower than series operations. -It is difficult to understand why ufuncs are fast in `apply`, while other callables are slow in `apply` (answer: it's because ufuncs operate on the whole Series, while other callables operate elementwise). +In addition to the above effects of the current implementation of `agg`/`transform` & `apply`, see [#52140](https://github.com/pandas-dev/pandas/issues/52140) for more examples of the unexpected effects of how `apply` is implemented. -This PDEP proposes that callables will be applies to the whole `Series`, so we in the future non-ufunc functions in `Series.apply` will be as fast as ufuncs. +It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternative methodt to `apply(func)`: -### 5.
callables in `Series.apply` is slow, callables in `DataFrame.apply` is fast +| func | Series | DataFrame | +|:--------------------|:-----------|:------------| +| lambda x: str(x) | .map | .map | +| lambda x: x + 1 | .transform | .transform | +| [lambda x: x.sum()] | .agg | .agg | +`` -Above it was shown that using (non-ufunc) callables in `Series.apply` is bad performance-wise. OTOH using them in `DataFrame.apply` is fine: - -```python ->>> %timeit large_ser.apply(lambda x: x + 1) -24.3 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ->>> %timeit pd.DataFrame(large_ser).apply(lambda x: x + 1) -160 µs +- 1.17 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) -``` - -Having callables being fast to use in the `DataFrame.apply` method, but slow in `Series.apply` is confusing for users. - -This PDEP proposes that callables will be applies to the whole `Series`, so we in the future `Series.apply` will be as fast as `DataFrame.apply` already is. - -### 6. callables in `Series.apply` may fail, while callables in `DataFrame.apply` do not and vica versa - -```python ->>> ser.apply(lambda x: x.sum()) -AttributeError: 'int' object has no attribute 'sum' ->>> pd.DataFrame(ser).apply(lambda x: x.sum()) -0 6 -dtype: int64 -``` - -Having callables fail when used in `Series.apply`, but work in `DataFrame.Apply` or vica versa is confusing for users. - -This PDEP proposes that callables will be applied to the whole `Series`, so callables given to `Series.apply` will work the same as when given to `DataFrame.apply`, so in the future we will have that: - -```python ->>> ser.apply(lambda x: x.sum()) -6 ->>> pd.DataFrame(ser).apply(lambda x: x.sum()) -0 6 -dtype: int64 -``` - -### 7. `Series.apply` vs. `Series.agg` - -The doc string for `Series.agg` says about the method's `func` parameter: "If a function, must ... work when passed ... to Series.apply". 
But compare these: - -```python ->>> small_ser.agg(np.sum) -6 ->>> small_ser.apply(np.sum) -0 1 -1 2 -2 3 -dtype: int64 -``` - -Users would expect these two to give the same result. - -This PDEP proposes that callables will be applied to the whole `Series`, so in the future we will have: - -```python ->>> small_ser.agg(np.sum) -6 ->>> small_ser.apply(np.sum) -6 -``` - -### 8. dictlikes vs. listlikes in `Series.apply` - -Giving a *list* of transforming arguments to `Series.apply` returns a `DataFrame`: - -```python ->>> small_ser.apply(["sqrt", np.abs]) - sqrt absolute -0 1.000000 1 -1 1.414214 2 -2 1.732051 3 -``` - -But giving a *dict* of transforming arguments returns a `Series` with a `MultiIndex`: - -```python ->>> small_ser.apply({"sqrt" :"sqrt", "abs" : np.abs}) -sqrt 0 1.000000 - 1 1.414214 - 2 1.732051 -abs 0 1.000000 - 1 2.000000 - 2 3.000000 -dtype: float64 -``` - -These two should give same-shaped output for consistency. Using `Series.transform` instead of `Series.apply`, it returns a `DataFrame` in both cases and I think the dictlike example above should return a `DataFrame` similar to the listlike example. - -Minor additional info: listlikes and dictlikes of aggregation arguments do behave the same, so this is only a problem with dictlikes of transforming arguments when using `apply`. - -This PDEP proposes that the result from giving list-likes and dict-likes to `Series.apply` will have the same shape as when given list-likes currently: - -```python ->>> small_ser.apply(["sqrt", np.abs]) - sqrt absolute -0 1.000000 1 -1 1.414214 2 -2 1.732051 3 ->>> small_ser.apply({"sqrt" :"sqrt", "abs" : np.abs}) - sqrt absolute -0 1.000000 1 -1 1.414214 2 -2 1.732051 3 -``` +Because of their flexibility, `Series.apply` & `DataFrame.apply` are considered unnecessarily complex, and it would be better to direct users to use `.map`, `.agg` or `.transform`, as appropriate in the given situation. ## Proposal -With the above in mind, it is proposed that: - -1. 
When given a callable, `Series.apply` always operate on the series. I.e. let `series.apply(func)` be similar to `func(series)` + the needed additional functionality. -2. When given a list-like or dict-like, `Series.apply` will apply each element of the list-like/dict-like to the series. I.e. `series.apply(func_list)` will be similar to `[series.apply(func) for func in func_list]` + the needed additional functionality -3. The changes made to `Series.apply`will propagate to `Series.agg` and `Series.transform` as needed. +With the above in mind, it is proposed that in the future: -The difference between `Series.apply()` & `Series.map()` will then be that: +1. the `agg` & `transform` methods of `Series`, `DataFrame` will always operate series-wise and never element-wise +2. `Series.apply` & `DataFrame.apply` will be deprecated. +3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`) -* `Series.apply()` makes the passed-in callable operate on the series, similarly to how `(DataFrame|SeriesGroupby|DataFrameGroupBy).apply` operate on series. This is very fast and can do almost anything, -* `Series.map()` makes the passed-in callable operate on each series data elements individually. This is very flexible, but can be very slow, so should only be used if `Series.apply` can't do it. +The above changes means that the future behavior, when users want to apply arbitrary callables in pandas, can be described as follows: -so, this API change will help make Pandas `Series.(apply|map)` API clearer without losing functionality and let their functionality be explainable in a simple manner, which would be a win for Pandas. +1. When users want to operate on single elements in a `Series` or `DataFrame`, they should use `Series.map` and `DataFrame.map` respectively. +2. 
When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively. +3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively. -The result from the above change will be that `Series.apply` will operate similar to how `DataFrame.apply` works already per column, similar to how `Series.map` operates similar to how `DataFrame.map` works per column. This will give better coherence between same-named methods on `DataFrames` and `Series`. +The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by `map`, `agg` or `transform`, so will be deprecated. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. ## Deprecation process -To change the behavior to the current behavior will have to be deprecated. This can be done by adding a `by_row` parameter to `Series.apply`, so when `by_rows=True`, `Series.apply` will be backward compatible, and when `by_rows=False`, `Series.apply` will operate Series-wise. If the parameter is not set a warning will we emitted and the parameter will be set to `True`, i.e. be backward compatible. So we will have in pandas v2.2: +To change the current behavior, it will have to be deprecated. This will be done by in v2.2: -```python ->>> def apply(self, ..., by_row: bool | NoDefault=no_default, ...): - if by_row is no_default: - warn("The by_row parameter will be set to False in the future", - DeprecationWarning, - stacklevel=find_stack_level() - ) - by_row = True - ... 
-``` - -In pandas v3.0 the signature will change to: - -```python ->>> def apply(self, ..., by_row: NoDefault=no_default, ...): - if by_row is not no_default: - warn("Do not use the by_row parameter, it will be removed in the future", - DeprecationWarning, - stacklevel=find_stack_level() - ) - ... -``` +1. Deprecate `Series.apply` & `DataFrame.apply`. +2. Add a `series_ops_only` parameter with type `bool | lib.NoDefault` to the `agg` & `transform` methods of `Series` & `DataFrame`. When `series_ops_only` is set to False, `agg` & `transform` will behave as they do currently. When set to True, `agg` & `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` & `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning that the current behavior will be removed in the future. -I.e. the `by_row` parameter will be in the signature in v3.0 in order be backward compatible with v2.x, but will have no effect and will emit a warning if set in method calls. +(It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`) -In Pandas v4.0, the `by_row` parameter will be removed. +In Pandas v3.0: +1. `Series.apply` & `DataFrame.apply` will be removed from the code base (question: or added to `_hidden_attrs`?). +1. The `agg` & `transform` will always operate on series/columns/rows data and the `series_ops_only` parameter will have no effect and be deprecated and removed in v4.0 (it must be kept in v3.x in order to facilitate the switch from v2.x to v3.0). ## PDEP-13 History -- 24 august 2023: Initial version +- 24 august 2023: Initial version (proposed to change `Series.apply` & `DataFrame.apply` to always operate on series/columns/rows) +- 17.
september 2023: version 2 (renamed and proposing to deprecate `Series.apply` & `DataFrame.apply` and make `agg`/`transform` always operate on series/columns/rows) From 8130e2664b8f425f6e958b40619b0f2b30a83350 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Sun, 17 Sep 2023 11:33:50 +0100 Subject: [PATCH 08/13] fix title --- web/pandas/pdeps/0013-standardize-apply.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md index 231d38ccc1edb..878684db0ab2c 100644 --- a/web/pandas/pdeps/0013-standardize-apply.md +++ b/web/pandas/pdeps/0013-standardize-apply.md @@ -1,4 +1,4 @@ -# PDEP-13: Deprecate the apply method on Series & DataFrame in favor of series-based operations using the agg and transform methods +# PDEP-13: Deprecate the apply method on Series & DataFrame and make the agg and transform methods operate on series data - Created: 24 August 2023 - Status: Under discussion From 70d08defc801b156673809efa7543072ea46b9ab Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Sun, 17 Sep 2023 12:37:05 +0100 Subject: [PATCH 09/13] fix wordings --- web/pandas/pdeps/0013-standardize-apply.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md index 878684db0ab2c..c944870116ad7 100644 --- a/web/pandas/pdeps/0013-standardize-apply.md +++ b/web/pandas/pdeps/0013-standardize-apply.md @@ -10,7 +10,7 @@ The `apply`, `transform` and `agg` methods have very complex behavior because they in some cases operate on elements in series, in some cases on series and sometimes try one first, and it that fails, falls back to try the other. There is not a logical system how these behaviors are arranged and it can therefore be difficult for users to understand these methods. 
-I propose to change how `apply`, `transform` and `agg` as follows: +It is proposed that `apply`, `transform` and `agg` in the future will work as follows: 1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise 2. `Series.apply` & `DataFrame.apply` will be deprecated. @@ -22,13 +22,15 @@ The above changes means that the future behavior, when users want to apply arbit 2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively. 3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively. -The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by one of the above methods. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. +The use of `Series.apply` & `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. ## Motivation The current behavior of `apply`, `agg` & `transform` is very complex and therefore difficult to understand for non-expert users. 
The difficulty is especially that the methods sometimes apply callables on elements of series/dataframes, sometimes on Series or columns/rows of Dataframes and sometimes try element-wise operation and if that fails, falls back to series-wise operations. -Below is an overview of the current behavior in table form for `agg`, `transform` & `apply` ( The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior). +Below is an overview of the current behavior in table form when giving callables to `agg`, `transform` & `apply`. As an example on how to read the tables, when a non-ufunc callable is given to `Series.agg`, `Series.agg` will first try to apply the callable to each element in the series, and if that fails, will fall back to call the series using the callable. + +(The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior). ### agg @@ -100,7 +102,7 @@ The reason for the great performance difference is that `df.transform(func)` ope In addition to the above effects of the current implementation of `agg`/`transform` & `apply`, see [#52140](https://github.com/pandas-dev/pandas/issues/52140) for more examples of the unexpected effects of how `apply` is implemented. -It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternative methodt to `apply(func)`: +It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. 
For some examples, see the table below for alternatives using `apply(func)`: | func | Series | DataFrame | |:--------------------|:-----------|:------------| @@ -115,7 +117,9 @@ Because of their flexibility, `Series.apply` & `DataFrame.apply` are considered With the above in mind, it is proposed that in the future: -1. the `agg` & `transform` methods of `Series`, `DataFrame` will always operate series-wise and never element-wise +It is proposed that `apply`, `transform` and `agg` in the future will work as follows: + +1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise 2. `Series.apply` & `DataFrame.apply` will be deprecated. 3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`) @@ -125,7 +129,9 @@ The above changes means that the future behavior, when users want to apply arbit 2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively. 3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively. -The use of `Series.apply` & `DataFrame.apply` will after that change in almost all cases be replaced by `map`, `agg` or `transform`, so will be deprecated. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. +The use of `Series.apply` & `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. 
In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. + +It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed to be changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`. Likewise, the behavior when given ufuncs will remain unchanged, because the behavior is already as intended in all cases. ## Deprecation process @@ -134,8 +140,6 @@ To change the current behavior, it will have to be deprecated. This will be done 1. Deprecate `Series.apply` & `DataFrame.apply`. 2. Add a `series_ops_only` parameter with type `bool | lib.NoDefault` to the `agg` & `transform` methods of `Series` & `DataFrame`. When `series_ops_only` is set to False, `agg` & `transform` will behave as they do currently. When set to True, `agg` & `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` & `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning that the current behavior will be removed in the future. -(It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`) - In Pandas v3.0: 1. `Series.apply` & `DataFrame.apply` will be removed from the code base (question: or added to `_hidden_attrs`?). 1. The `agg` & `transform` will always operate on series/columns/rows data and the `series_ops_only` parameter will have no effect and be deprecated and removed in v4.0 (it must be kept in v3.x in order to facilitate the switch from v2.x to v3.0).
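Reviewer note: the "try elements, fall back to series" dispatch that this patch describes for non-ufunc callables can be modelled with a small plain-Python sketch, contrasted with the series-only behavior that `series_ops_only=True` is proposed to give. The helpers `elementwise_with_fallback`, `series_wise` and `total` are hypothetical names introduced only for this illustration, a list stands in for a `Series`, and none of this is pandas' actual implementation.

```python
# Illustrative model only: plain lists stand in for a Series,
# and these helpers are hypothetical, not pandas internals.

def elementwise_with_fallback(values, func):
    """Model of the current dispatch: try each element, else the whole container."""
    try:
        return [func(v) for v in values]
    except Exception:
        # Element-wise application failed, so pass the whole container instead.
        return func(values)


def series_wise(values, func):
    """Model of the proposed dispatch: always pass the whole container."""
    return func(values)


def total(x):
    """Accepts a scalar or a list, loosely mimicking how np.sum accepts both."""
    return sum(x) if isinstance(x, list) else x


data = [0, 1, 2]

# The callable works on single elements, so no aggregation happens.
per_element = elementwise_with_fallback(data, total)

# The same callable always sees the whole container and aggregates it.
aggregated = series_wise(data, total)

# A callable that only works on the container triggers the fallback path.
fallback = elementwise_with_fallback(data, lambda x: x.count(1))
```

In this model the same callable yields the unchanged elements under the fallback dispatch and a single aggregate under the series-only dispatch, which is the kind of context-dependent result the Motivation section's `agg` example shows.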
From 5276b112c10252f1cca58d4c4d33dd0868148518 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Sun, 17 Sep 2023 14:52:04 +0100 Subject: [PATCH 10/13] fix document --- web/pandas/pdeps/0013-standardize-apply.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md index c944870116ad7..b8b489e4491db 100644 --- a/web/pandas/pdeps/0013-standardize-apply.md +++ b/web/pandas/pdeps/0013-standardize-apply.md @@ -144,7 +144,7 @@ In Pandas v3.0: 1. `Series.apply` & `DataFrame.apply` will be removed from the code base (question: or added to `_hidden_attrs`?). 1. The `agg` & `transform` will always operate on series/columns/rows data and the `series_ops_only` parameter will have no effect and be deprecated and removed in v4.0 (it must be kept in v3.x in order to facilitate the switch from v2.x to v3.0). -## PDEP-13 History +## PDEP History - 24 august 2023: Initial version (proposed to change `Series.apply` & `DataFrame.apply` to always operate on series/columns/rows) - 17. 
september 2023: version 2 (renamed and proposing to deprecate `Series.apply` & `DataFrame.apply` and make `agg`/`transform` always operate on series/columns/rows) From 9d1ac2f9eed65f3fff587256b5de5dbcf3547f8d Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Sun, 17 Sep 2023 22:50:11 +0100 Subject: [PATCH 11/13] update --- web/pandas/pdeps/0013-standardize-apply.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md index b8b489e4491db..620bec8403502 100644 --- a/web/pandas/pdeps/0013-standardize-apply.md +++ b/web/pandas/pdeps/0013-standardize-apply.md @@ -109,9 +109,10 @@ It can also be noted that `Series.apply` & `DataFrame.apply` could almost always | lambda x: str(x) | .map | .map | | lambda x: x + 1 | .transform | .transform | | [lambda x: x.sum()] | .agg | .agg | -`` -Because of their flexibility, `Series.apply` & `DataFrame.apply` are considered unnecessarily complex, and it would be better to direct users to use `.map`, `.agg` or `.transform`, as appropriate in the given situation. +So, for example, `ser.apply(lambda x: str(x))` can be replaced with `ser.map(lambda x: str(x))` while `df.apply([lambda x: x.sum()])` can be replaced with `df.agg([lambda x: x.sum()])`. + +Overall, because of their flexibility, `Series.apply` & `DataFrame.apply` are considered unnecessarily complex, and it would be clearer for users to use `.map`, `.agg` or `.transform`, as appropriate in the given situation. 
## Proposal From 42eae36980b33cd93e1e596ec07479e779afabf3 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Wed, 20 Sep 2023 08:35:35 +0100 Subject: [PATCH 12/13] update for comments --- web/pandas/pdeps/0013-standardize-apply.md | 59 ++++++++++++---------- 1 file changed, 33 insertions(+), 26 deletions(-) diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md index 620bec8403502..36219adb8b1bd 100644 --- a/web/pandas/pdeps/0013-standardize-apply.md +++ b/web/pandas/pdeps/0013-standardize-apply.md @@ -1,4 +1,4 @@ -# PDEP-13: Deprecate the apply method on Series & DataFrame and make the agg and transform methods operate on series data +# PDEP-13: Deprecate the apply method on Series and DataFrame and make the agg and transform methods operate on series data - Created: 24 August 2023 - Status: Under discussion @@ -8,27 +8,29 @@ ## Abstract -The `apply`, `transform` and `agg` methods have very complex behavior because they in some cases operate on elements in series, in some cases on series and sometimes try one first, and it that fails, falls back to try the other. There is not a logical system how these behaviors are arranged and it can therefore be difficult for users to understand these methods. +The `apply`, `transform` and `agg` methods have very complex behavior when given callables because they in some cases operate on elements in series, in some cases on series and sometimes try one first, and it that fails, falls back to try the other. There is not a logical system how these behaviors are arranged and it can therefore be difficult for users to understand these methods. It is proposed that `apply`, `transform` and `agg` in the future will work as follows: -1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise -2. `Series.apply` & `DataFrame.apply` will be deprecated. -3. 
`groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`) +1. the `agg` and `transform` methods of `Series`, `DataFrame` and `groupby` will always operate series-wise and never element-wise +2. `Series.apply` and `DataFrame.apply` will be deprecated. +3. The current behavior when supplying string to the methods will not be changed. +4. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` and `DataFrame.apply`) The above changes means that the future behavior, when users want to apply arbitrary callables in pandas, can be described as follows: 1. When users want to operate on single elements in a `Series` or `DataFrame`, they should use `Series.map` and `DataFrame.map` respectively. 2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively. 3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively. +4. Functions that are not applicable for `map`, `agg` nor `transform` are considered relatively rare and in the future users should call these functions directly rather than use the `apply` method. -The use of `Series.apply` & `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. +The use of `Series.apply` and `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. 
In the very few cases where `Series.apply` and `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. ## Motivation -The current behavior of `apply`, `agg` & `transform` is very complex and therefore difficult to understand for non-expert users. The difficulty is especially that the methods sometimes apply callables on elements of series/dataframes, sometimes on Series or columns/rows of Dataframes and sometimes try element-wise operation and if that fails, falls back to series-wise operations. +The current behavior of `apply`, `agg` and `transform` is very complex and therefore difficult to understand for non-expert users. The difficulty is especially that the methods sometimes apply callables on elements of series/dataframes, sometimes on Series or columns/rows of Dataframes and sometimes try element-wise operation and if that fails, falls back to series-wise operations. -Below is an overview of the current behavior in table form when giving callables to `agg`, `transform` & `apply`. As an example on how to read the tables, when a non-ufunc callable is given to `Series.agg`, `Series.agg` will first try to apply the callable to each element in the series, and if that fails, will fall back to call the series using the callable. +Below is an overview of the current behavior in table form when giving callables to `agg`, `transform` and `apply`. As an example on how to read the tables, when a non-ufunc callable is given to `Series.agg`, `Series.agg` will first try to apply the callable to each element in the series, and if that fails, will fall back to call the series using the callable. (The description may not be 100 % accurate because of various special cases in the current implementation, but will give a good understanding of the current behavior). 
@@ -100,9 +102,9 @@ It can also have great effect on performance, even when the result is correct. F The reason for the great performance difference is that `df.transform(func)` operates on series data, which is fast, while `df.transform(func_list)` will attempt elementwise operation first, and if that works (which is does here), will be much slower than series operations. -In addition to the above effects of the current implementation of `agg`/`transform` & `apply`, see [#52140](https://github.com/pandas-dev/pandas/issues/52140) for more examples of the unexpected effects of how `apply` is implemented. +In addition to the above effects of the current implementation of `agg`/`transform` and `apply`, see [#52140](https://github.com/pandas-dev/pandas/issues/52140) for more examples of the unexpected effects of how `apply` is implemented. -It can also be noted that `Series.apply` & `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` & `map`, if `agg` & `transform` were to always operate on series data. For some examples, see the table below for alternatives using `apply(func)`: +It can also be noted that `Series.apply` and `DataFrame.apply` could almost always be replaced with calls to `agg`, `transform` or `map`, if `agg` and `transform` were to always operate on series data. For some examples, see the table below for alternatives using `apply(func)`: | func | Series | DataFrame | |:--------------------|:-----------|:------------| @@ -112,40 +114,45 @@ It can also be noted that `Series.apply` & `DataFrame.apply` could almost always So, for example, `ser.apply(lambda x: str(x))` can be replaced with `ser.map(lambda x: str(x))` while `df.apply([lambda x: x.sum()])` can be replaced with `df.agg([lambda x: x.sum()])`. -Overall, because of their flexibility, `Series.apply` & `DataFrame.apply` are considered unnecessarily complex, and it would be clearer for users to use `.map`, `.agg` or `.transform`, as appropriate in the given situation. 
+Overall, because of their flexibility, `Series.apply` and `DataFrame.apply` are considered unnecessarily complex, and it would be clearer for users to use `.map`, `.agg` or `.transform`, as appropriate in the given situation. ## Proposal -With the above in mind, it is proposed that in the future: +With the above in mind, it is proposed that in the future `apply`, `transform` and `agg` will work as follows: -It is proposed that `apply`, `transform` and `agg` in the future will work as follows: - -1. the `agg` & `transform` methods of `Series`, `DataFrame` & `groupby` will always operate series-wise and never element-wise -2. `Series.apply` & `DataFrame.apply` will be deprecated. -3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` & `DataFrame.apply`) +1. the `agg` and `transform` methods of `Series`, `DataFrame` and `groupby` will always operate series-wise and never element-wise +2. `Series.apply` and `DataFrame.apply` will be deprecated. +3. `groupby.apply` will not be deprecated (because it behaves differently than `Series.apply` and `DataFrame.apply`) The above changes means that the future behavior, when users want to apply arbitrary callables in pandas, can be described as follows: 1. When users want to operate on single elements in a `Series` or `DataFrame`, they should use `Series.map` and `DataFrame.map` respectively. 2. When users want to aggregate a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.agg`, `DataFrame.agg` and `groupby.agg` respectively. 3. When users want to transform a `Series`, columns/rows of a `DataFrame` or groups in `groupby` objects, they should use `Series.transform`, `DataFrame.transform` and `groupby.transform` respectively. +4. Functions that are not applicable for `map`, `agg` nor `transform` are considered relatively rare and in the future users should call these functions directly rather than use the `apply` method. 
-The use of `Series.apply` & `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` & `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. +The use of `Series.apply` and `DataFrame.apply` will after the proposed change in almost all cases be replaced by `map`, `agg` or `transform`. In the very few cases where `Series.apply` and `DataFrame.apply` cannot be substituted by `map`, `agg` or `transform`, it is proposed that it will be accepted that users will have to find alternative ways to apply the functions, i.e. typically apply the functions manually and possibly concatenating the results. -It can be noted that `groupby.agg`, `groupby.transform` & `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` & `DataFrame.apply`. Likewise, the behavior when given ufuncs will remain unchanged, because the behavior is already as intended in all cases. +It can be noted that the behavior of `groupby.agg`, `groupby.transform` and `groupby.apply` are not proposed changed in this PDEP, because `groupby.agg`, `groupby.transform` already behave as desired and `groupby.apply` behaves differently than `Series.apply` and `DataFrame.apply`. Likewise, the behavior when given ufuncs (e.g. `np.sqrt`) and string input (e.g. `"sqrt"`) will remain unchanged, because the behavior is already as intended in all cases. ## Deprecation process -To change the current behavior, it will have to be deprecated. This will be done by in v2.2: +To change the current behavior, it will have to be deprecated. 
However, `Series.apply` and `DataFrame.apply` are very widely used methods, so will be deprecated very gradually: -1. Deprecate `Series.apply` & `DataFrame.apply`. -2. Add a `series_ops_only` with type `bool | lib.NoDefault` parameter to `agg` & `transform` methods of `Series` & `DataFrame`. When `series_ops_only` is set to False, `agg` & `transform` will behave as they do currently. When set to True, `agg` & `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` & `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning the current behavior will be reoved in the future. +This means that in v2.2: + +1. Calls to `Series.apply` and `DataFrame.apply`will emit a `DeprecationWarning` with an appropriate deprecation message. +2. A `series_ops_only` with type `bool | lib.NoDefault` parameter will be added to the `agg` and `transform` methods of `Series` and `DataFrame`. When `series_ops_only` is set to False, `agg` and `transform` will behave as they do currently. When set to True, `agg` and `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` and `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning the current behavior will be reoved in the future. In Pandas v3.0: -1. `Series.apply` & `DataFrame.apply` will be removed from the code base (question: or added to `_hidden_attrs`?). -1. The `agg` & `transform` will always operate on series/columns/rows data and the `series_ops_only` parameter will have no effect and be deprecated and removed in v4.0 (it must be kept in v3.x in order to facilitate the switch from v2.x to v3.0). +1. Calls to `Series.apply` and `DataFrame.apply` will emit a `FutureWarning` and emit an appropriate deprecation message. +2. The `agg` and `transform` will always operate on series/columns/rows data and the `series_ops_only` parameter will have no effect and be deprecated. + +In Pandas v4.0: +1. 
`Series.apply` and `DataFrame.apply` will be removed from the code base. +2. The `series_ops_only` parameter of `agg` and `transform` will be removed from the code base. ## PDEP History -- 24 august 2023: Initial version (proposed to change `Series.apply` & `DataFrame.apply` to always operate on series/columns/rows) -- 17. september 2023: version 2 (renamed and proposing to deprecate `Series.apply` & `DataFrame.apply` and make `agg`/`transform` always operate on series/columns/rows) +- 24 august 2023: Initial version (proposed to change `Series.apply` and `DataFrame.apply` to always operate on series/columns/rows) +- 17. september 2023: version 2 (renamed and proposing to deprecate `Series.apply` and `DataFrame.apply` and make `agg`/`transform` always operate on series/columns/rows) From 138e4c62d59565eabeef244b625b96304bd2afd4 Mon Sep 17 00:00:00 2001 From: Terji Petersen Date: Wed, 20 Sep 2023 08:40:05 +0100 Subject: [PATCH 13/13] small fix --- web/pandas/pdeps/0013-standardize-apply.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/pandas/pdeps/0013-standardize-apply.md b/web/pandas/pdeps/0013-standardize-apply.md index 36219adb8b1bd..3975b22434a06 100644 --- a/web/pandas/pdeps/0013-standardize-apply.md +++ b/web/pandas/pdeps/0013-standardize-apply.md @@ -142,7 +142,7 @@ To change the current behavior, it will have to be deprecated. However, `Series. This means that in v2.2: 1. Calls to `Series.apply` and `DataFrame.apply`will emit a `DeprecationWarning` with an appropriate deprecation message. -2. A `series_ops_only` with type `bool | lib.NoDefault` parameter will be added to the `agg` and `transform` methods of `Series` and `DataFrame`. When `series_ops_only` is set to False, `agg` and `transform` will behave as they do currently. When set to True, `agg` and `transform` will never operate on elements, but always on Series.
When set to `no_default`, `agg` and `transform` will behave as `series_ops_only=False`, but will emit a FutureWarning the current behavior will be reoved in the future. +2. A `series_ops_only` parameter with type `bool | lib.NoDefault` will be added to the `agg` and `transform` methods of `Series` and `DataFrame` with a default value of `lib.no_default`. When `series_ops_only` is set to `False`, `agg` and `transform` will behave as they do currently. When set to `True`, `agg` and `transform` will never operate on elements, but always on Series. When set to `no_default`, `agg` and `transform` will behave as `series_ops_only=False`, but will emit a `DeprecationWarning` stating that the current behavior will be removed in the future. In Pandas v3.0: 1. Calls to `Series.apply` and `DataFrame.apply` will emit a `FutureWarning` and emit an appropriate deprecation message.
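The three states of the proposed `series_ops_only` parameter can be illustrated with a toy model. This is only a sketch under stated assumptions, not pandas code: `toy_agg` and `NO_DEFAULT` are hypothetical names, a plain Python list stands in for a `Series`, and the "try element-wise, then fall back" branch is a simplification of the current behavior described in the PDEP.

```python
import warnings

# Hypothetical sentinel standing in for pandas' `lib.no_default`.
NO_DEFAULT = object()


def toy_agg(values, func, series_ops_only=NO_DEFAULT):
    """Toy model of the proposed `series_ops_only` switch (not pandas code)."""
    if series_ops_only is NO_DEFAULT:
        # Unset: behave like series_ops_only=False, but warn about the change.
        warnings.warn(
            "element-wise fallback will be removed; pass series_ops_only=True",
            DeprecationWarning,
            stacklevel=2,
        )
        series_ops_only = False
    if series_ops_only:
        # Proposed future behavior: always call func on the whole container.
        return func(values)
    # Current behavior (simplified): try element-wise first,
    # and fall back to the whole container if that fails.
    try:
        return [func(v) for v in values]
    except (TypeError, AttributeError):
        return func(values)


values = [1, 2, 3]
print(toy_agg(values, str, series_ops_only=True))   # whole-container call
print(toy_agg(values, str, series_ops_only=False))  # element-wise call
print(toy_agg(values, sum, series_ops_only=False))  # element-wise fails, falls back
```

In this sketch the same callable (`str`) gives different results depending on the flag, which is the ambiguity the PDEP wants to remove by always operating on the whole series.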