Commit f4de260

rhshadrach authored and znicholls committed
DOC: Ban mutation in UDF methods (pandas-dev#39762)
1 parent 5cbafe4 commit f4de260

6 files changed: +106 -1 lines changed

doc/source/user_guide/gotchas.rst

+69
@@ -178,6 +178,75 @@ To test for membership in the values, use the method :meth:`~pandas.Series.isin`
 For ``DataFrames``, likewise, ``in`` applies to the column axis,
 testing for membership in the list of column names.
 
+.. _udf-mutation:
+
+Mutating with User Defined Function (UDF) methods
+-------------------------------------------------
+
+It is a general rule in programming that one should not mutate a container
+while it is being iterated over. Mutation will invalidate the iterator,
+causing unexpected behavior. Consider the example:
+
+.. ipython:: python
+
+    values = [0, 1, 2, 3, 4, 5]
+    n_removed = 0
+    for k, value in enumerate(values):
+        idx = k - n_removed
+        if value % 2 == 1:
+            del values[idx]
+            n_removed += 1
+        else:
+            values[idx] = value + 1
+    values
+
+One probably would have expected that the result would be ``[1, 3, 5]``.
+When using a pandas method that takes a UDF, internally pandas is often
+iterating over the
+``DataFrame`` or other pandas object. Therefore, if the UDF mutates (changes)
+the ``DataFrame``, unexpected behavior can arise.
+
+Here is a similar example with :meth:`DataFrame.apply`:
+
+.. ipython:: python
+
+    def f(s):
+        s.pop("a")
+        return s
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    try:
+        df.apply(f, axis="columns")
+    except Exception as err:
+        print(repr(err))
+
+To resolve this issue, one can make a copy so that the mutation does
+not apply to the container being iterated over.
+
+.. ipython:: python
+
+    values = [0, 1, 2, 3, 4, 5]
+    n_removed = 0
+    for k, value in enumerate(values.copy()):
+        idx = k - n_removed
+        if value % 2 == 1:
+            del values[idx]
+            n_removed += 1
+        else:
+            values[idx] = value + 1
+    values
+
+.. ipython:: python
+
+    def f(s):
+        s = s.copy()
+        s.pop("a")
+        return s
+
+    df = pd.DataFrame({"a": [1, 2, 3], 'b': [4, 5, 6]})
+    df.apply(f, axis="columns")
+
+
 ``NaN``, Integer ``NA`` values and ``NA`` type promotions
 ---------------------------------------------------------
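The new gotchas section resolves the problem by copying inside the UDF. As a complementary sketch (not part of this commit), the UDF can avoid mutation altogether by using a method that returns a new object. The function name `drop_a` is hypothetical; `Series.drop` is used here in place of the `copy` plus `pop` pair from the diff.

```python
import pandas as pd


def drop_a(row: pd.Series) -> pd.Series:
    # Series.drop returns a new Series, so the row handed in by
    # DataFrame.apply is never modified.
    return row.drop("a")


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = df.apply(drop_a, axis="columns")

print(result)               # only column "b" remains in the result
print(list(df.columns))     # the original frame still has both columns
```

Either approach keeps the object pandas is iterating over intact; the copy-free version simply has no mutating call to guard.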

pandas/core/frame.py

+6
@@ -7814,6 +7814,12 @@ def apply(
 DataFrame.aggregate: Only perform aggregating type operations.
 DataFrame.transform: Only perform transforming type operations.
 
+Notes
+-----
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
+
 Examples
 --------
 >>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])

pandas/core/groupby/generic.py

+10
@@ -580,6 +580,12 @@ def filter(self, func, dropna=True, *args, **kwargs):
 dropna : Drop groups that do not pass the filter. True by default;
 if False, groups that evaluate False are filled with NaNs.
 
+Notes
+-----
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
+
 Examples
 --------
 >>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
@@ -1506,6 +1512,10 @@ def filter(self, func, dropna=True, *args, **kwargs):
 Each subframe is endowed the attribute 'name' in case you need to know
 which group you are working on.
 
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
+
 Examples
 --------
 >>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
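To illustrate the note added to both `filter` docstrings, here is a minimal sketch (not part of this commit) of a filter function that derives any intermediate values into local variables instead of writing them back to the group. The column names `key` and `val` and the function name `big_enough` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "val": [1, 1, 5]})


def big_enough(group: pd.DataFrame) -> bool:
    # Compute on a derived, local Series rather than assigning to
    # group["val"]; the group passed in by groupby stays unchanged.
    doubled = group["val"] * 2
    return bool(doubled.sum() > 4)


result = df.groupby("key").filter(big_enough)

print(result)                # only the rows of group "y" pass the filter
print(df["val"].tolist())    # the original data is untouched
```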

pandas/core/groupby/groupby.py

+5 -1
@@ -344,7 +344,7 @@ class providing the base-class of operations.
   in the subframe. If f also supports application to the entire subframe,
   then a fast path is used starting from the second chunk.
 * f must not mutate groups. Mutation is not supported and may
-  produce unexpected results.
+  produce unexpected results. See :ref:`udf-mutation` for more details.
 
 When using ``engine='numba'``, there will be no "fall back" behavior internally.
 The group data and group index will be passed as numpy arrays to the JITed
@@ -447,6 +447,10 @@ class providing the base-class of operations.
 The group data and group index will be passed as numpy arrays to the JITed
 user defined function, and no alternative execution attempts will be tried.
 {examples}
+
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
 """
 
 
pandas/core/series.py

+6
@@ -4044,6 +4044,12 @@ def apply(
 Series.agg: Only perform aggregating type operations.
 Series.transform: Only perform transforming type operations.
 
+Notes
+-----
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
+
 Examples
 --------
 Create a series with typical summer temperatures for each city.

pandas/core/shared_docs.py

+10
@@ -41,6 +41,10 @@
 -----
 `agg` is an alias for `aggregate`. Use the alias.
 
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
+
 A passed user-defined-function will be passed a Series for evaluation.
 {examples}"""
 
@@ -296,6 +300,12 @@
 {klass}.agg : Only perform aggregating type operations.
 {klass}.apply : Invoke function on a {klass}.
 
+Notes
+-----
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`udf-mutation`
+for more details.
+
 Examples
 --------
 >>> df = pd.DataFrame({{'A': range(3), 'B': range(1, 4)}})
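The same note applies to the shared `aggregate` and `transform` docstrings. A minimal sketch (not part of this commit) of a transform UDF that returns a new object instead of modifying its argument in place; the function name `demean` is hypothetical, and the frame mirrors the one in the docstring example.

```python
import pandas as pd

df = pd.DataFrame({"A": range(3), "B": range(1, 4)})


def demean(col: pd.Series) -> pd.Series:
    # Plain subtraction builds a new Series; an in-place form such as
    # `col -= col.mean()` would write to the object pandas passed in.
    return col - col.mean()


result = df.transform(demean)

print(result)              # each column centered on zero
print(df["A"].tolist())    # the original values are unchanged
```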
