@@ -29,13 +29,6 @@ this area.
 production code, we recommended that you take advantage of the optimized
 pandas data access methods exposed in this chapter.
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation, may
-   depend on the context. This is sometimes called ``chained assignment`` and
-   should be avoided. See :ref:`Returning a View versus Copy
-   <indexing.view_versus_copy>`.
-
 See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and more advanced indexing documentation.
 
 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.
@@ -299,12 +292,6 @@ largely as a convenience since it is such a common operation.
 Selection by label
 ------------------
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation, may depend on the context.
-   This is sometimes called ``chained assignment`` and should be avoided.
-   See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.
-
 .. warning::
 
    ``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
@@ -445,12 +432,6 @@ For more information about duplicate labels, see
 Selection by position
 ---------------------
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation, may depend on the context.
-   This is sometimes called ``chained assignment`` and should be avoided.
-   See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.
-
 pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``.
 
 The ``.iloc`` attribute is the primary access method. The following are valid inputs:
@@ -1722,234 +1703,10 @@ You can assign a custom index to the ``index`` attribute:
    df_idx.index = pd.Index([10, 20, 30, 40], name="a")
    df_idx
 
-.. _indexing.view_versus_copy:
-
-Returning a view versus a copy
-------------------------------
-
-.. warning::
-
-   :ref:`Copy-on-Write <copy_on_write>`
-   will become the new default in pandas 3.0. This means that chained indexing will
-   never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
-   anymore.
-   See :ref:`this section <copy_on_write_chained_assignment>`
-   for more context.
-   We recommend turning Copy-on-Write on to leverage the improvements with
-
-   ```
-   pd.options.mode.copy_on_write = True
-   ```
-
-   even before pandas 3.0 is available.
-
-When setting values in a pandas object, care must be taken to avoid what is called
-``chained indexing``. Here is an example.
-
-.. ipython:: python
-
-   dfmi = pd.DataFrame([list('abcd'),
-                        list('efgh'),
-                        list('ijkl'),
-                        list('mnop')],
-                       columns=pd.MultiIndex.from_product([['one', 'two'],
-                                                           ['first', 'second']]))
-   dfmi
-
-Compare these two access methods:
-
-.. ipython:: python
-
-   dfmi['one']['second']
-
-.. ipython:: python
-
-   dfmi.loc[:, ('one', 'second')]
-
-These both yield the same results, so which should you use? It is instructive to understand the order
-of operations on these and why method 2 (``.loc``) is much preferred over method 1 (chained ``[]``).
-
-``dfmi['one']`` selects the first level of the columns and returns a DataFrame that is singly-indexed.
-Then another Python operation ``dfmi_with_one['second']`` selects the series indexed by ``'second'``.
-This is indicated by the variable ``dfmi_with_one`` because pandas sees these operations as separate events.
-e.g. separate calls to ``__getitem__``, so it has to treat them as linear operations, they happen one after another.
-
-Contrast this to ``df.loc[:,('one','second')]`` which passes a nested tuple of ``(slice(None),('one','second'))`` to a single call to
-``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly
-faster, and allows one to index *both* axes if so desired.
-
 Why does assignment fail when using chained indexing?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. warning::
-
-   :ref:`Copy-on-Write <copy_on_write>`
-   will become the new default in pandas 3.0. This means that chained indexing will
-   never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
-   anymore.
-   See :ref:`this section <copy_on_write_chained_assignment>`
-   for more context.
-   We recommend turning Copy-on-Write on to leverage the improvements with
-
-   ```
-   pd.options.mode.copy_on_write = True
-   ```
-
-   even before pandas 3.0 is available.
-
-The problem in the previous section is just a performance issue. What's up with
-the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when
-you do something that might cost a few extra milliseconds!
-
-But it turns out that assigning to the product of chained indexing has
-inherently unpredictable results. To see this, think about how the Python
-interpreter executes this code:
-
-.. code-block:: python
-
-   dfmi.loc[:, ('one', 'second')] = value
-   # becomes
-   dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)
-
-But this code is handled differently:
-
-.. code-block:: python
-
-   dfmi['one']['second'] = value
-   # becomes
-   dfmi.__getitem__('one').__setitem__('second', value)
-
-See that ``__getitem__`` in there? Outside of simple cases, it's very hard to
-predict whether it will return a view or a copy (it depends on the memory layout
-of the array, about which pandas makes no guarantees), and therefore whether
-the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown
-out immediately afterward. **That's** what ``SettingWithCopy`` is warning you
-about!
-
-.. note:: You may be wondering whether we should be concerned about the ``loc``
-   property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi``
-   itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` /
-   ``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course,
-   ``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``.
-
-Sometimes a ``SettingWithCopy`` warning will arise at times when there's no
-obvious chained indexing going on. **These** are the bugs that
-``SettingWithCopy`` is designed to catch! pandas is probably trying to warn you
-that you've done this:
-
-.. code-block:: python
-
-   def do_something(df):
-       foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
-       # ... many lines here ...
-       # We don't know whether this will modify df or not!
-       foo['quux'] = value
-       return foo
-
-Yikes!
-
-.. _indexing.evaluation_order:
-
-Evaluation order matters
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. warning::
-
-   :ref:`Copy-on-Write <copy_on_write>`
-   will become the new default in pandas 3.0. This means than chained indexing will
-   never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
-   anymore.
-   See :ref:`this section <copy_on_write_chained_assignment>`
-   for more context.
-   We recommend turning Copy-on-Write on to leverage the improvements with
-
-   ```
-   pd.options.mode.copy_on_write = True
-   ```
-
-   even before pandas 3.0 is available.
-
-When you use chained indexing, the order and type of the indexing operation
-partially determine whether the result is a slice into the original object, or
-a copy of the slice.
-
-pandas has the ``SettingWithCopyWarning`` because assigning to a copy of a
-slice is frequently not intentional, but a mistake caused by chained indexing
-returning a copy where a slice was expected.
-
-If you would like pandas to be more or less trusting about assignment to a
-chained indexing expression, you can set the :ref:`option <options>`
-``mode.chained_assignment`` to one of these values:
-
-* ``'warn'``, the default, means a ``SettingWithCopyWarning`` is printed.
-* ``'raise'`` means pandas will raise a ``SettingWithCopyError``
-  you have to deal with.
-* ``None`` will suppress the warnings entirely.
-
-.. ipython:: python
-   :okwarning:
-
-   dfb = pd.DataFrame({'a': ['one', 'one', 'two',
-                             'three', 'two', 'one', 'six'],
-                       'c': np.arange(7)})
-
-   # This will show the SettingWithCopyWarning
-   # but the frame values will be set
-   dfb['c'][dfb['a'].str.startswith('o')] = 42
-
-This however is operating on a copy and will not work.
-
-.. ipython:: python
-   :okwarning:
-   :okexcept:
-
-   with pd.option_context('mode.chained_assignment','warn'):
-       dfb[dfb['a'].str.startswith('o')]['c'] = 42
-
-A chained assignment can also crop up in setting in a mixed dtype frame.
-
-.. note::
-
-   These setting rules apply to all of ``.loc/.iloc``.
-
-The following is the recommended access method using ``.loc`` for multiple items (using ``mask``) and a single item using a fixed index:
-
-.. ipython:: python
-
-   dfc = pd.DataFrame({'a': ['one', 'one', 'two',
-                             'three', 'two', 'one', 'six'],
-                       'c': np.arange(7)})
-   dfd = dfc.copy()
-   # Setting multiple items using a mask
-   mask = dfd['a'].str.startswith('o')
-   dfd.loc[mask, 'c'] = 42
-   dfd
-
-   # Setting a single item
-   dfd = dfc.copy()
-   dfd.loc[2, 'a'] = 11
-   dfd
-
-The following *can* work at times, but it is not guaranteed to, and therefore should be avoided:
-
-.. ipython:: python
-   :okwarning:
-
-   dfd = dfc.copy()
-   dfd['a'][2] = 111
-   dfd
-
-Last, the subsequent example will **not** work at all, and so should be avoided:
-
-.. ipython:: python
-   :okwarning:
-   :okexcept:
-
-   with pd.option_context('mode.chained_assignment','raise'):
-       dfd.loc[0]['a'] = 1111
-
-.. warning::
-
-   The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
-   assignment. There may be false positives; situations where a chained assignment is inadvertently
-   reported.
+:ref:`Copy-on-Write <copy_on_write>` is the new default with pandas 3.0.
+This means that chained indexing will never work.
+See :ref:`this section <copy_on_write_chained_assignment>`
+for more context.
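
The removed text recommended a single ``.loc`` call over chained indexing, and the new wording says chained assignment never works under Copy-on-Write. A minimal sketch of that difference follows, reusing the ``dfc`` data from the removed example. It is an illustration rather than part of this change: the names ``chained`` and ``direct`` are hypothetical, and the warning filter is an assumption added only to keep the output quiet, since pandas warns on the chained form.

```python
import warnings

import pandas as pd

dfc = pd.DataFrame({
    "a": ["one", "one", "two", "three", "two", "one", "six"],
    "c": list(range(7)),
})
mask = dfc["a"].str.startswith("o")

# Chained form: chained[mask] builds a new object first, so the
# assignment lands on that temporary and never reaches `chained`.
chained = dfc.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # assumption: silence the pandas warning
    chained[mask]["c"] = 42

# Single .loc call: row selection and column assignment happen in one
# __setitem__ on the frame itself, so the frame is updated.
direct = dfc.copy()
direct.loc[mask, "c"] = 42

print(chained["c"].tolist())
print(direct["c"].tolist())
```

Boolean-mask selection returns a new object regardless of the Copy-on-Write setting, so the chained form here leaves ``chained`` untouched, while the ``.loc`` form updates the rows where ``mask`` is true.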