@@ -178,6 +178,75 @@ To test for membership in the values, use the method :meth:`~pandas.Series.isin`
For ``DataFrames``, likewise, ``in`` applies to the column axis,
testing for membership in the list of column names.

+ .. _udf-mutation:
+
+ Mutating with User Defined Function (UDF) methods
+ -------------------------------------------------
+
+ It is a general rule in programming that one should not mutate a container
+ while it is being iterated over. Mutation will invalidate the iterator,
+ causing unexpected behavior. Consider the example:
+
+ .. ipython:: python
+
+    values = [0, 1, 2, 3, 4, 5]
+    n_removed = 0
+    for k, value in enumerate(values):
+        idx = k - n_removed
+        if value % 2 == 1:
+            del values[idx]
+            n_removed += 1
+        else:
+            values[idx] = value + 1
+    values
+
+ One might expect the result to be ``[1, 3, 5]``, but each deletion shifts the
+ remaining elements, so the iterator skips some of them.
+ When using a pandas method that takes a UDF, pandas is often iterating
+ internally over the ``DataFrame`` or other pandas object. Therefore, if the
+ UDF mutates (changes) the ``DataFrame``, unexpected behavior can arise.
+
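As a rough illustration of the list example above, the sketch below compares the in-place loop with a variant that builds a new list rather than mutating the one being iterated over. The function names `mutate_in_place` and `build_new_list` are hypothetical helpers introduced here for illustration; they are not part of pandas or of the original text.

```python
def mutate_in_place(values):
    """Replicate the loop above: mutate the list while iterating over it."""
    n_removed = 0
    for k, value in enumerate(values):
        idx = k - n_removed
        if value % 2 == 1:
            del values[idx]
            n_removed += 1
        else:
            values[idx] = value + 1
    return values


def build_new_list(values):
    """Drop odd values and increment even ones without touching the input."""
    return [value + 1 for value in values if value % 2 == 0]


skipped = mutate_in_place([0, 1, 2, 3, 4, 5])
expected = build_new_list([0, 1, 2, 3, 4, 5])
print(skipped)   # [1, 4, 5]: 2 and 4 were never visited
print(expected)  # [1, 3, 5]
```

Building a new container sidesteps the index bookkeeping entirely, which is often simpler than copying and then mutating.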
+ Here is a similar example with :meth:`DataFrame.apply`:
+
+ .. ipython:: python
+
+    def f(s):
+        s.pop("a")
+        return s
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    try:
+        df.apply(f, axis="columns")
+    except Exception as err:
+        print(repr(err))
+
+ To resolve this issue, one can make a copy so that the mutation does
+ not apply to the container being iterated over.
+
+ .. ipython:: python
+
+    values = [0, 1, 2, 3, 4, 5]
+    n_removed = 0
+    for k, value in enumerate(values.copy()):
+        idx = k - n_removed
+        if value % 2 == 1:
+            del values[idx]
+            n_removed += 1
+        else:
+            values[idx] = value + 1
+    values
+
+ .. ipython:: python
+
+    def f(s):
+        s = s.copy()
+        s.pop("a")
+        return s
+
+    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df.apply(f, axis="columns")
+
+
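An alternative sketch, assuming the UDF only needs to remove a label: `Series.drop` returns a new object, so the row that `apply` passes in is never modified and no explicit copy is needed. The name `drop_a` is a hypothetical helper used only for illustration.

```python
import pandas as pd


def drop_a(s):
    # Series.drop returns a new Series and leaves ``s`` untouched.
    return s.drop("a")


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = df.apply(drop_a, axis="columns")
print(result)
print(df.columns.tolist())  # the original frame still has both columns
```

Whether this is preferable to copying depends on the UDF; it only applies when a non-mutating equivalent of the operation exists.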
``NaN``, Integer ``NA`` values and ``NA`` type promotions
---------------------------------------------------------