Commit 146ee99

DOC: bigger badder warnings on chained assignment
1 parent 9fc8636 commit 146ee99

1 file changed: +58 -5 lines changed

doc/source/indexing.rst

@@ -43,7 +43,10 @@ indexing.
 standard operators has some optimization limits. For production code, we recommended
 that you take advantage of the optimized pandas data access methods exposed in this chapter.

-In addition, whether a copy or a reference is returned for a selection operation, may depend on the context.
+.. warning::
+
+   Whether a copy or a reference is returned for a setting operation, may depend on the context.
+   This is sometimes called ``chained assignment`` and should be avoided.
 See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`

 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies
@@ -219,6 +222,12 @@ largely as a convenience since it is such a common operation.
 Selection By Label
 ~~~~~~~~~~~~~~~~~~

+.. warning::
+
+   Whether a copy or a reference is returned for a setting operation, may depend on the context.
+   This is sometimes called ``chained assignment`` and should be avoided.
+   See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`
+
 Pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol.
 **ALL** of the labels for which you ask, must be in the index or a ``KeyError`` will be raised! When slicing, the start bound is *included*, **AND** the stop bound is *included*. Integers are valid labels, but they refer to the label **and not the position**.

@@ -284,6 +293,12 @@ For getting a value explicity (equiv to deprecated ``df.get_value('a','A')``)
 Selection By Position
 ~~~~~~~~~~~~~~~~~~~~~

+.. warning::
+
+   Whether a copy or a reference is returned for a setting operation, may depend on the context.
+   This is sometimes called ``chained assignment`` and should be avoided.
+   See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`
+
 Pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely python and numpy slicing. These are ``0-based`` indexing. When slicing, the start bounds is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise a ``IndexError``.

 The ``.iloc`` attribute is the primary access method. The following are valid inputs:
@@ -822,11 +837,49 @@ In chained expressions, the order may determine whether a copy is returned or no
 .. ipython:: python


-   dfb = DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
-                    'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
+   dfb = DataFrame({'a' : ['one', 'one', 'two',
+                           'three', 'two', 'one', 'six'],
+                    'b' : ['x', 'y', 'y',
+                           'x', 'y', 'x', 'x'],
                     'c' : randn(7)})
-   dfb[dfb.a.str.startswith('o')]['c'] = 42 # goes to copy (will be lost)
-   dfb['c'][dfb.a.str.startswith('o')] = 42 # passed via reference (will stay)
+
+
+   # goes to copy (will be lost)
+   dfb[dfb.a.str.startswith('o')]['c'] = 42
+
+   # passed via reference (will stay)
+   dfb['c'][dfb.a.str.startswith('o')] = 42
+
+A chained assignment can also crop up in setting in a mixed dtype frame.
+
+.. note::
+
+   These setting rules apply to all of ``.loc/.iloc/.ix``
+
+This is the correct access method
+
+.. ipython:: python
+
+   dfc = DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
+   dfc_copy = dfc.copy()
+   dfc_copy.loc[0,'A'] = 11
+   dfc_copy
+
+This *can* work at times, but is not guaranteed, and so should be avoided
+
+.. ipython:: python
+
+   dfc_copy = dfc.copy()
+   dfc_copy['A'][0] = 111
+   dfc_copy
+
+This will **not** work at all, and so should be avoided
+
+.. ipython:: python
+
+   dfc_copy = dfc.copy()
+   dfc_copy.loc[0]['A'] = 1111
+   dfc_copy

 When assigning values to subsets of your data, thus, make sure to either use the
 pandas access methods or explicitly handle the assignment creating a copy.
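
For reference, the recommendation in the changed text (set values through a single indexer call rather than two chained ones) could be sketched as below. This is a minimal illustration, not part of the commit. It reuses the ``dfb`` frame from the diff, but the ``pd``/``np`` import aliases, the ``mask`` name, and the use of ``np.arange`` in place of ``randn`` (so the result is reproducible) are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Same shape as the dfb frame in the diff; column 'c' uses fixed values
# instead of randn(7) so the outcome is deterministic (an assumption).
dfb = pd.DataFrame({
    'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
    'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
    'c': np.arange(7, dtype=float),
})

# Boolean row selector: rows whose 'a' value starts with 'o'.
mask = dfb['a'].str.startswith('o')

# One .loc call selects rows and column together, so the assignment
# is applied to dfb itself rather than to an intermediate object.
dfb.loc[mask, 'c'] = 42.0
```

Because row and column selection happen in one ``.loc`` call, there is no intermediate selection whose copy-or-view status could decide whether the write reaches ``dfb``.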
