Commit 0e287e5

DOC: updates to indexing.rst / chained indexing docs
1 parent 4663353 commit 0e287e5

1 file changed

+56
-6
lines changed

doc/source/indexing.rst

@@ -1365,13 +1365,63 @@ Of course if you need integer based selection, then use ``iloc``
13651365
Returning a view versus a copy
13661366
------------------------------
13671367
1368-
The rules about when a view on the data is returned are entirely dependent on
1369-
NumPy. Whenever an array of labels or a boolean vector are involved in the
1370-
indexing operation, the result will be a copy. With single label / scalar
1371-
indexing and slicing, e.g. ``df.ix[3:6]`` or ``df.ix[:, 'A']``, a view will be
1372-
returned.
1368+
When setting values in a pandas object, care must be taken to avoid what is called
1369+
``chained indexing``. Here is an example.
13731370
1374-
In chained expressions, the order may determine whether a copy is returned or not.
1371+
.. ipython:: python
1372+
1373+
dfmi = DataFrame([list('abcd'),
1374+
list('efgh'),
1375+
list('ijkl'),
1376+
list('mnop')],
1377+
columns=MultiIndex.from_product([['one','two'],
1378+
['first','second']]))
1379+
dfmi
1380+
1381+
Compare these two access methods:
1382+
1383+
.. ipython:: python
1384+
1385+
dfmi['one']['second']
1386+
1387+
.. ipython:: python
1388+
1389+
dfmi.loc[:,('one','second')]
1390+
1391+
These both yield the same results, so which should you use? It is instructive to understand the order
1392+
of operations on these and why method 2 (``.loc``) is much preferred over method 1 (chained ``[]``).
1393+
1394+
``dfmi['one']`` selects the first level of the columns and returns a data frame that is singly-indexed.
1395+
Then another python operation, ``dfmi_with_one['second']``, selects the series indexed by ``'second'``.
1396+
The variable ``dfmi_with_one`` stands in for that intermediate result, because pandas sees these operations as separate events,
1397+
i.e. separate calls to ``__getitem__``, so it has to treat them as linear operations that happen one after another.
1398+
1399+
Contrast this to ``dfmi.loc[:,('one','second')]``, which passes a nested tuple of ``(slice(None),('one','second'))`` to a single call to
1400+
``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly
1401+
faster, and allows one to index *both* axes if so desired.
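
As a rough illustration of the two call paths described above, the following sketch rebuilds ``dfmi`` and spells out each ``__getitem__`` call. The names ``chained``, ``single`` and ``explicit`` are illustrative only; ``dfmi_with_one`` follows the naming used in the text. It assumes pandas is imported as ``pd``.

```python
import pandas as pd

# Same frame as in the example above: 4 rows, two-level column index.
dfmi = pd.DataFrame(
    [list("abcd"), list("efgh"), list("ijkl"), list("mnop")],
    columns=pd.MultiIndex.from_product([["one", "two"], ["first", "second"]]),
)

# Method 1: chained indexing, i.e. two independent __getitem__ calls.
dfmi_with_one = dfmi["one"]           # first call: singly-indexed DataFrame
chained = dfmi_with_one["second"]     # second call: Series

# Method 2: one call on .loc with a nested tuple key.
single = dfmi.loc[:, ("one", "second")]

# The same single call, written out explicitly to show the key pandas receives.
explicit = dfmi.loc.__getitem__((slice(None), ("one", "second")))

print(chained.tolist())
print(single.tolist() == chained.tolist() == explicit.tolist())
```

Both methods read the same values here; the difference only becomes important once an assignment is involved.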
1402+
1403+
Why does the assignment when using chained indexing fail?
1404+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1405+
1406+
So, why does this show the ``SettingWithCopy`` warning, and possibly not work, when you combine chained indexing and assignment:
1407+
1408+
.. code-block:: python
1409+
1410+
dfmi['one']['second'] = value
1411+
1412+
Since the chained indexing is 2 calls, it is possible that either call may return a **copy** of the data because of the way it is sliced.
1413+
Thus when setting, you are actually setting a **copy**, and not the original frame data. It is impossible for pandas to figure this out because there are 2 separate python operations that are not connected.
1414+
1415+
The ``SettingWithCopy`` warning is a 'heuristic' to detect this (meaning it tends to catch most cases but is simply a lightweight check). Detecting every case reliably would be considerably more complicated.
1416+
1417+
The ``.loc`` operation is a single python operation, and thus can select a slice (which still may be a copy), but allows pandas to assign that slice back into the frame after it is modified, thus setting the values as you would think.
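
A minimal sketch of the single-call assignment described above, assuming pandas is imported as ``pd``. The replacement value ``"z"`` and the name ``updated`` are made up for illustration. The chained form is deliberately left out, since, as explained above, it may or may not modify the original frame.

```python
import pandas as pd

dfmi = pd.DataFrame(
    [list("abcd"), list("efgh"), list("ijkl"), list("mnop")],
    columns=pd.MultiIndex.from_product([["one", "two"], ["first", "second"]]),
)

# One indexing operation: pandas receives the full key together with the
# value, so it can write the result back into the original frame.
dfmi.loc[:, ("one", "second")] = "z"

updated = dfmi[("one", "second")].tolist()
print(updated)
```

Only the selected column changes; the remaining columns of ``dfmi`` keep their original values.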
1418+
1419+
The reason for having the ``SettingWithCopy`` warning is this. Sometimes when you slice an array you will simply get a view back, which means you can set values on it without a problem. However, even a single dtyped array can generate a copy if it is sliced in a particular way. A multi-dtyped DataFrame (meaning it has, say, ``float`` and ``object`` data) will almost always yield a copy. Whether a view is created is dependent on the memory layout of the array.
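
The view-versus-copy distinction can be seen directly on a plain NumPy array. The sketch below is not pandas code; it only shows the underlying NumPy behaviour that the paragraph above refers to, with illustrative variable names.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Basic slicing with a scalar column index returns a view on arr's memory.
column_view = arr[:, 1]

# Indexing with a list of integers returns a copy.
column_copy = arr[:, [1]]

view_shares = np.shares_memory(arr, column_view)
copy_shares = np.shares_memory(arr, column_copy)

# Writing through the view changes the original array ...
column_view[0] = 100

# ... while writing to the copy leaves the original untouched.
column_copy[1, 0] = -1

print(view_shares, copy_shares)
print(arr[0, 1], arr[1, 1])
```

Which of the two cases a pandas selection falls into depends on how the frame stores its data, which is why the result of a chained assignment cannot be relied upon.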
1420+
1421+
Evaluation order matters
1422+
~~~~~~~~~~~~~~~~~~~~~~~~
1423+
1424+
Furthermore, in chained expressions, the order may determine whether a copy is returned or not.
13751425
If an expression will set values on a copy of a slice, then a ``SettingWithCopy``
13761426
exception will be raised (this raise/warn behavior is new starting in 0.13.0)
13771427
