Commit f5fbec8

Merge pull request #11746 from ischwabacher/patch-3
Improve(?) explanation of SettingWithCopy warning
2 parents 8ec8487 + 9f1af70 commit f5fbec8

1 file changed: +44 -9 lines changed

Diff for: doc/source/indexing.rst

@@ -1522,23 +1522,58 @@ Contrast this to ``df.loc[:,('one','second')]`` which passes a nested tuple of `
 ``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly
 faster, and allows one to index *both* axes if so desired.

-Why does the assignment when using chained indexing fail!
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Why does assignment fail when using chained indexing?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-So, why does this show the ``SettingWithCopy`` warning / and possibly not work when you do chained indexing and assignment:
+The problem in the previous section is just a performance issue. What's up with
+the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when
+you do something that might cost a few extra milliseconds!
+
+But it turns out that assigning to the product of chained indexing has
+inherently unpredictable results. To see this, think about how the Python
+interpreter executes this code:

 .. code-block:: python

-   dfmi['one']['second'] = value
+   dfmi.loc[:,('one','second')] = value
+   # becomes
+   dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)

-Since the chained indexing is 2 calls, it is possible that either call may return a **copy** of the data because of the way it is sliced.
-Thus when setting, you are actually setting a **copy**, and not the original frame data. It is impossible for pandas to figure this out because their are 2 separate python operations that are not connected.
+But this code is handled differently:
+
+.. code-block:: python

-The ``SettingWithCopy`` warning is a 'heuristic' to detect this (meaning it tends to catch most cases but is simply a lightweight check). Figuring this out for real is way complicated.
+   dfmi['one']['second'] = value
+   # becomes
+   dfmi.__getitem__('one').__setitem__('second', value)
+
+See that ``__getitem__`` in there? Outside of simple cases, it's very hard to
+predict whether it will return a view or a copy (it depends on the memory layout
+of the array, about which *pandas* makes no guarantees), and therefore whether
+the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown
+out immediately afterward. **That's** what ``SettingWithCopy`` is warning you
+about!
+
+.. note:: You may be wondering whether we should be concerned about the ``loc``
+   property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi``
+   itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` /
+   ``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course,
+   ``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``.
+
+Sometimes a ``SettingWithCopy`` warning will arise at times when there's no
+obvious chained indexing going on. **These** are the bugs that
+``SettingWithCopy`` is designed to catch! Pandas is probably trying to warn you
+that you've done this:
+
+.. code-block:: python

-The ``.loc`` operation is a single python operation, and thus can select a slice (which still may be a copy), but allows pandas to assign that slice back into the frame after it is modified, thus setting the values as you would think.
+   def do_something(df):
+       foo = df[['bar', 'baz']] # Is foo a view? A copy? Nobody knows!
+       # ... many lines here ...
+       foo['quux'] = value # We don't know whether this will modify df or not!
+       return foo

-The reason for having the ``SettingWithCopy`` warning is this. Sometimes when you slice an array you will simply get a view back, which means you can set it no problem. However, even a single dtyped array can generate a copy if it is sliced in a particular way. A multi-dtyped DataFrame (meaning it has say ``float`` and ``object`` data), will almost always yield a copy. Whether a view is created is dependent on the memory layout of the array.
+Yikes!

 Evaluation order matters
 ~~~~~~~~~~~~~~~~~~~~~~~~
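
The mechanism the new text describes can be reproduced without pandas. The sketch below is only an illustration: ``Frame`` and its ``return_copy`` switch are hypothetical names invented for this example, not part of pandas, and the toy says nothing about when pandas actually returns a view or a copy. It shows only that a chained assignment runs ``__getitem__`` first and then writes to whatever object came back, while a single indexing call with a tuple key goes straight to the container's own ``__setitem__``.

```python
class Frame:
    """Toy stand-in for a table of nested columns. This is NOT pandas."""

    def __init__(self, columns, return_copy):
        self.columns = columns          # e.g. {"one": {"second": 0}}
        self.return_copy = return_copy  # hypothetical switch: copy or view
        self.calls = []                 # records which special methods ran

    def __getitem__(self, key):
        self.calls.append(("__getitem__", key))
        inner = self.columns[key]
        # A "copy" is a new dict; a "view" is the very same dict object.
        return dict(inner) if self.return_copy else inner

    def __setitem__(self, key, value):
        self.calls.append(("__setitem__", key))
        outer, inner = key              # expects a 2-tuple key
        self.columns[outer][inner] = value


view_like = Frame({"one": {"second": 0}}, return_copy=False)
copy_like = Frame({"one": {"second": 0}}, return_copy=True)

# Chained assignment: Frame.__getitem__ runs, then the write lands on the
# object it returned (a plain dict here), never on Frame.__setitem__.
view_like["one"]["second"] = 99
copy_like["one"]["second"] = 99

print(view_like.columns["one"]["second"])  # 99: the write reached the frame
print(copy_like.columns["one"]["second"])  # 0: the write hit a temporary
print(copy_like.calls)                     # [('__getitem__', 'one')]

# One indexing call with a tuple key: Frame.__setitem__ handles the whole
# assignment, so the result does not depend on copy-versus-view.
copy_like["one", "second"] = 7
print(copy_like.columns["one"]["second"])  # 7
```

The two chained assignments are textually identical, yet only one of them changes its frame. That is the unpredictability the warning is meant to flag.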
