After playing with R a bunch, I started feeling like the explanation of
`SettingWithCopy` wasn't getting to the core of the matter, which is actually
an essential consequence of Python slice assignment semantics. Here's how
Python handles chained assignment:
```python
df['foo']['bar'] = quux
df.__getitem__('foo').__setitem__('bar', quux)
```
whereas in R, it's this:
```R
df["foo"]["bar"] <- quux
df["foo"] <- `[<-`(df["foo"], "bar", quux)
df <- `[<-`(df, "foo", `[<-`(`[`(df, "foo"), "bar", quux))
```
That last is a lot of line noise, though the R method names `` `[` `` and
`` `[<-` `` are more concise than `__getitem__` and `__setitem__`! But imagine
that you could call `__setitem__` with a kwarg `inplace=False` that would cause
it to return a modified copy instead of modifying the original object. Then the
R version would translate to this in Python:
```python
df = df.__setitem__('foo',
                    df.__getitem__('foo')
                      .__setitem__('bar', quux, inplace=False),
                    inplace=False)
```
This is incredibly awkward, but it has the advantage of making
`SettingWithCopy` unnecessary: *everything* is a copy, and yet things get
set nonetheless.
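To make the contrast concrete, here is a minimal sketch using a toy container. `Table` and `with_item` are hypothetical names invented for this illustration; neither is part of pandas, and `with_item` only stands in for the imagined `__setitem__(..., inplace=False)`. The toy `__getitem__` always returns a copy, which is the worst case for chained assignment.

```python
class Table:
    """Toy nested container; a hypothetical stand-in, not a pandas object."""

    def __init__(self, data):
        self.data = dict(data)

    def __getitem__(self, key):
        # Always hand back a copy of nested tables (the worst case).
        value = self.data[key]
        return Table(value.data) if isinstance(value, Table) else value

    def __setitem__(self, key, value):
        self.data[key] = value

    def with_item(self, key, value):
        # Hypothetical analogue of __setitem__(..., inplace=False):
        # leave self untouched and return a modified copy.
        new = Table(self.data)
        new.data[key] = value
        return new


df = Table({"foo": Table({"bar": 1})})

# Python-style chained assignment: the write lands on a temporary copy.
df["foo"]["bar"] = 2
chained_result = df["foo"]["bar"]

# R-style: rebuild each level from the inside out and rebind the name.
df = df.with_item("foo", df["foo"].with_item("bar", 2))
rebound_result = df["foo"]["bar"]
```

After the chained assignment the stored value is still `1`. Only the rebinding version, where every level is rebuilt and assigned back, leaves `df` holding `2`.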
So this commit is an attempt to explain this without requiring the reader to
know R.
-Since the chained indexing is 2 calls, it is possible that either call may return a **copy** of the data because of the way it is sliced.
-Thus when setting, you are actually setting a **copy**, and not the original frame data. It is impossible for pandas to figure this out because there are 2 separate python operations that are not connected.
+But this code is handled differently:
+
+.. code-block:: python
 
-The ``SettingWithCopy`` warning is a 'heuristic' to detect this (meaning it tends to catch most cases but is simply a lightweight check). Figuring this out for real is way complicated.
+See that ``__getitem__`` in there? Outside of simple cases, it's very hard to
+predict whether it will return a view or a copy (it depends on the memory layout
+of the array, about which *pandas* makes no guarantees), and therefore whether
+the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown
+out immediately afterward. **That's** what ``SettingWithCopy`` is warning you
+about!
+
+.. note:: You may be wondering whether we should be concerned about the ``loc``
+   property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi``
+   itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` /
+   ``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course,
+   ``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``.
+
+Sometimes a ``SettingWithCopy`` warning will arise at times when there's no
+obvious chained indexing going on. **These** are the bugs that
+``SettingWithCopy`` is designed to catch! Pandas is probably trying to warn you
+that you've done this:
+
+.. code-block:: python
 
-The ``.loc`` operation is a single python operation, and thus can select a slice (which still may be a copy), but allows pandas to assign that slice back into the frame after it is modified, thus setting the values as you would think.
+   def do_something(df):
+       foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
+       # ... many lines here ...
+       foo['quux'] = value       # We don't know whether this will modify df or not!
+       return foo
 
-The reason for having the ``SettingWithCopy`` warning is this. Sometimes when you slice an array you will simply get a view back, which means you can set it no problem. However, even a single dtyped array can generate a copy if it is sliced in a particular way. A multi-dtyped DataFrame (meaning it has say ``float`` and ``object`` data), will almost always yield a copy. Whether a view is created is dependent on the memory layout of the array.
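
For readers who want to see what the `do_something` pattern in the new text looks like once the ambiguity is removed, here is a hedged sketch. It is not part of this commit. The column names, the sample values and the extra `value` parameter are assumptions made for illustration. It takes an explicit `.copy()` when an independent object is wanted, and uses a single `.loc` assignment when the original frame should change.

```python
import pandas as pd


def do_something(df, value):
    # An explicit .copy() makes foo independent of df, so there is
    # no ambiguity about what the later assignment modifies.
    foo = df[["bar", "baz"]].copy()
    foo["quux"] = value
    return foo


df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4], "other": [5, 6]})
result = do_something(df, 0)

# A single .loc call is one __setitem__ on df itself, with no
# intermediate __getitem__ whose result could be a temporary.
df.loc[df["bar"] > 1, "baz"] = 99
```

Here `result` gains the `quux` column while `df` does not, and the `.loc` assignment changes `df` without touching `result`.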