Commit bc58fe5

DEPR: Remove SettingWithCopyWarning (#56614)
* DEPR: Remove SettingWithCopyWarning
* Fixup
* Remove docs
* CoW: Boolean indexer in MultiIndex raising read-only error
* Update
* Update
* Update
1 parent 36d454a commit bc58fe5

36 files changed: 209 additions and 1353 deletions
Diff for: doc/source/reference/testing.rst

-2
@@ -58,8 +58,6 @@ Exceptions and warnings
    errors.PossiblePrecisionLoss
    errors.PyperclipException
    errors.PyperclipWindowsException
-   errors.SettingWithCopyError
-   errors.SettingWithCopyWarning
    errors.SpecificationError
    errors.UndefinedVariableError
    errors.UnsortedIndexError

Diff for: doc/source/user_guide/advanced.rst

+1-7
@@ -11,13 +11,6 @@ and :ref:`other advanced indexing features <advanced.index_types>`.
 
 See the :ref:`Indexing and Selecting Data <indexing>` for general indexing documentation.
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation may
-   depend on the context. This is sometimes called ``chained assignment`` and
-   should be avoided. See :ref:`Returning a View versus Copy
-   <indexing.view_versus_copy>`.
-
 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.
 
 .. _advanced.hierarchical:
@@ -402,6 +395,7 @@ slicers on a single axis.
 Furthermore, you can *set* the values using the following methods.
 
 .. ipython:: python
+   :okwarning:
 
    df2 = dfmi.copy()
    df2.loc(axis=0)[:, :, ["C1", "C3"]] = -10

Diff for: doc/source/user_guide/indexing.rst

+4-247
@@ -29,13 +29,6 @@ this area.
    production code, we recommended that you take advantage of the optimized
    pandas data access methods exposed in this chapter.
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation, may
-   depend on the context. This is sometimes called ``chained assignment`` and
-   should be avoided. See :ref:`Returning a View versus Copy
-   <indexing.view_versus_copy>`.
-
 See the :ref:`MultiIndex / Advanced Indexing <advanced>` for ``MultiIndex`` and more advanced indexing documentation.
 
 See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.
@@ -299,12 +292,6 @@ largely as a convenience since it is such a common operation.
 Selection by label
 ------------------
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation, may depend on the context.
-   This is sometimes called ``chained assignment`` and should be avoided.
-   See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.
-
 .. warning::
 
    ``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
@@ -445,12 +432,6 @@ For more information about duplicate labels, see
 Selection by position
 ---------------------
 
-.. warning::
-
-   Whether a copy or a reference is returned for a setting operation, may depend on the context.
-   This is sometimes called ``chained assignment`` and should be avoided.
-   See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.
-
 pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``.
 
 The ``.iloc`` attribute is the primary access method. The following are valid inputs:
@@ -1722,234 +1703,10 @@ You can assign a custom index to the ``index`` attribute:
    df_idx.index = pd.Index([10, 20, 30, 40], name="a")
    df_idx
 
-.. _indexing.view_versus_copy:
-
-Returning a view versus a copy
-------------------------------
-
-.. warning::
-
-    :ref:`Copy-on-Write <copy_on_write>`
-    will become the new default in pandas 3.0. This means that chained indexing will
-    never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
-    anymore.
-    See :ref:`this section <copy_on_write_chained_assignment>`
-    for more context.
-    We recommend turning Copy-on-Write on to leverage the improvements with
-
-    ```
-    pd.options.mode.copy_on_write = True
-    ```
-
-    even before pandas 3.0 is available.
-
-When setting values in a pandas object, care must be taken to avoid what is called
-``chained indexing``. Here is an example.
-
-.. ipython:: python
-
-   dfmi = pd.DataFrame([list('abcd'),
-                        list('efgh'),
-                        list('ijkl'),
-                        list('mnop')],
-                       columns=pd.MultiIndex.from_product([['one', 'two'],
-                                                           ['first', 'second']]))
-   dfmi
-
-Compare these two access methods:
-
-.. ipython:: python
-
-   dfmi['one']['second']
-
-.. ipython:: python
-
-   dfmi.loc[:, ('one', 'second')]
-
-These both yield the same results, so which should you use? It is instructive to understand the order
-of operations on these and why method 2 (``.loc``) is much preferred over method 1 (chained ``[]``).
-
-``dfmi['one']`` selects the first level of the columns and returns a DataFrame that is singly-indexed.
-Then another Python operation ``dfmi_with_one['second']`` selects the series indexed by ``'second'``.
-This is indicated by the variable ``dfmi_with_one`` because pandas sees these operations as separate events.
-e.g. separate calls to ``__getitem__``, so it has to treat them as linear operations, they happen one after another.
-
-Contrast this to ``df.loc[:,('one','second')]`` which passes a nested tuple of ``(slice(None),('one','second'))`` to a single call to
-``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly
-faster, and allows one to index *both* axes if so desired.
-
 Why does assignment fail when using chained indexing?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. warning::
-
-    :ref:`Copy-on-Write <copy_on_write>`
-    will become the new default in pandas 3.0. This means that chained indexing will
-    never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
-    anymore.
-    See :ref:`this section <copy_on_write_chained_assignment>`
-    for more context.
-    We recommend turning Copy-on-Write on to leverage the improvements with
-
-    ```
-    pd.options.mode.copy_on_write = True
-    ```
-
-    even before pandas 3.0 is available.
-
-The problem in the previous section is just a performance issue. What's up with
-the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when
-you do something that might cost a few extra milliseconds!
-
-But it turns out that assigning to the product of chained indexing has
-inherently unpredictable results. To see this, think about how the Python
-interpreter executes this code:
-
-.. code-block:: python
-
-   dfmi.loc[:, ('one', 'second')] = value
-   # becomes
-   dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)
-
-But this code is handled differently:
-
-.. code-block:: python
-
-   dfmi['one']['second'] = value
-   # becomes
-   dfmi.__getitem__('one').__setitem__('second', value)
-
-See that ``__getitem__`` in there? Outside of simple cases, it's very hard to
-predict whether it will return a view or a copy (it depends on the memory layout
-of the array, about which pandas makes no guarantees), and therefore whether
-the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown
-out immediately afterward. **That's** what ``SettingWithCopy`` is warning you
-about!
-
-.. note:: You may be wondering whether we should be concerned about the ``loc``
-   property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi``
-   itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` /
-   ``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course,
-   ``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``.
-
-Sometimes a ``SettingWithCopy`` warning will arise at times when there's no
-obvious chained indexing going on. **These** are the bugs that
-``SettingWithCopy`` is designed to catch! pandas is probably trying to warn you
-that you've done this:
-
-.. code-block:: python
-
-   def do_something(df):
-       foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
-       # ... many lines here ...
-       # We don't know whether this will modify df or not!
-       foo['quux'] = value
-       return foo
-
-Yikes!
-
-.. _indexing.evaluation_order:
-
-Evaluation order matters
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. warning::
-
-    :ref:`Copy-on-Write <copy_on_write>`
-    will become the new default in pandas 3.0. This means than chained indexing will
-    never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
-    anymore.
-    See :ref:`this section <copy_on_write_chained_assignment>`
-    for more context.
-    We recommend turning Copy-on-Write on to leverage the improvements with
-
-    ```
-    pd.options.mode.copy_on_write = True
-    ```
-
-    even before pandas 3.0 is available.
-
-When you use chained indexing, the order and type of the indexing operation
-partially determine whether the result is a slice into the original object, or
-a copy of the slice.
-
-pandas has the ``SettingWithCopyWarning`` because assigning to a copy of a
-slice is frequently not intentional, but a mistake caused by chained indexing
-returning a copy where a slice was expected.
-
-If you would like pandas to be more or less trusting about assignment to a
-chained indexing expression, you can set the :ref:`option <options>`
-``mode.chained_assignment`` to one of these values:
-
-* ``'warn'``, the default, means a ``SettingWithCopyWarning`` is printed.
-* ``'raise'`` means pandas will raise a ``SettingWithCopyError``
-  you have to deal with.
-* ``None`` will suppress the warnings entirely.
-
-.. ipython:: python
-   :okwarning:
-
-   dfb = pd.DataFrame({'a': ['one', 'one', 'two',
-                             'three', 'two', 'one', 'six'],
-                       'c': np.arange(7)})
-
-   # This will show the SettingWithCopyWarning
-   # but the frame values will be set
-   dfb['c'][dfb['a'].str.startswith('o')] = 42
-
-This however is operating on a copy and will not work.
-
-.. ipython:: python
-   :okwarning:
-   :okexcept:
-
-   with pd.option_context('mode.chained_assignment','warn'):
-       dfb[dfb['a'].str.startswith('o')]['c'] = 42
-
-A chained assignment can also crop up in setting in a mixed dtype frame.
-
-.. note::
-
-   These setting rules apply to all of ``.loc/.iloc``.
-
-The following is the recommended access method using ``.loc`` for multiple items (using ``mask``) and a single item using a fixed index:
-
-.. ipython:: python
-
-   dfc = pd.DataFrame({'a': ['one', 'one', 'two',
-                             'three', 'two', 'one', 'six'],
-                       'c': np.arange(7)})
-   dfd = dfc.copy()
-   # Setting multiple items using a mask
-   mask = dfd['a'].str.startswith('o')
-   dfd.loc[mask, 'c'] = 42
-   dfd
-
-   # Setting a single item
-   dfd = dfc.copy()
-   dfd.loc[2, 'a'] = 11
-   dfd
-
-The following *can* work at times, but it is not guaranteed to, and therefore should be avoided:
-
-.. ipython:: python
-   :okwarning:
-
-   dfd = dfc.copy()
-   dfd['a'][2] = 111
-   dfd
-
-Last, the subsequent example will **not** work at all, and so should be avoided:
-
-.. ipython:: python
-   :okwarning:
-   :okexcept:
-
-   with pd.option_context('mode.chained_assignment','raise'):
-       dfd.loc[0]['a'] = 1111
-
-.. warning::
-
-   The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
-   assignment. There may be false positives; situations where a chained assignment is inadvertently
-   reported.
+:ref:`Copy-on-Write <copy_on_write>` is the new default with pandas 3.0.
+This means than chained indexing will never work.
+See :ref:`this section <copy_on_write_chained_assignment>`
+for more context.
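
Note on the hunk above: the removed documentation explained that a chained assignment such as ``dfmi['one']['second'] = value`` runs as a ``__getitem__`` call followed by a ``__setitem__`` call on whatever the first call returned. The sketch below illustrates that mechanism in plain Python, without pandas. The ``Table`` class is a hypothetical stand-in invented for this illustration, and it assumes ``__getitem__`` always returns a copy; it is not how pandas is implemented.

```python
class Table:
    """Hypothetical stand-in for a DataFrame: a dict of column lists."""

    def __init__(self, columns):
        self.columns = {name: list(values) for name, values in columns.items()}

    def __getitem__(self, name):
        # Assumption for this sketch: always hand back a copy of the column.
        return list(self.columns[name])

    def __setitem__(self, key, value):
        # Single-call form: key is a (row, column) tuple, similar in spirit
        # to the tuple key a .loc-style setter receives.
        row, name = key
        self.columns[name][row] = value


t = Table({"c": [0, 1, 2]})

# Chained form: t.__getitem__("c").__setitem__(0, 42)
# The write lands on a temporary list that is discarded right away.
t["c"][0] = 42
chained_result = t.columns["c"][0]

# Single-call form: t.__setitem__((0, "c"), 42) writes to t itself.
t[0, "c"] = 42
direct_result = t.columns["c"][0]
```

In this sketch the chained write leaves ``t`` unchanged, while the single-call write updates it, which mirrors the reasoning the removed section gave for preferring ``.loc``.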

Diff for: doc/source/whatsnew/v0.13.0.rst

+1-1
@@ -172,7 +172,7 @@ API changes
   statistical mode(s) by axis/Series. (:issue:`5367`)
 
 - Chained assignment will now by default warn if the user is assigning to a copy. This can be changed
-  with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs<indexing.view_versus_copy>`.
+  with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``.
 
 .. ipython:: python
 

Diff for: doc/source/whatsnew/v0.13.1.rst

+2-2
@@ -24,8 +24,8 @@ Highlights include:
 .. warning::
 
    0.13.1 fixes a bug that was caused by a combination of having numpy < 1.8, and doing
-   chained assignment on a string-like array. Please review :ref:`the docs<indexing.view_versus_copy>`,
-   chained indexing can have unexpected results and should generally be avoided.
+   chained assignment on a string-like array.
+   Chained indexing can have unexpected results and should generally be avoided.
 
    This would previously segfault:
 

Diff for: doc/source/whatsnew/v1.5.0.rst

+1-1
@@ -383,7 +383,7 @@ Other enhancements
 - Added ``validate`` argument to :meth:`DataFrame.join` (:issue:`46622`)
 - Added ``numeric_only`` argument to :meth:`.Resampler.sum`, :meth:`.Resampler.prod`, :meth:`.Resampler.min`, :meth:`.Resampler.max`, :meth:`.Resampler.first`, and :meth:`.Resampler.last` (:issue:`46442`)
 - ``times`` argument in :class:`.ExponentialMovingWindow` now accepts ``np.timedelta64`` (:issue:`47003`)
-- :class:`.DataError`, :class:`.SpecificationError`, :class:`.SettingWithCopyError`, :class:`.SettingWithCopyWarning`, :class:`.NumExprClobberingError`, :class:`.UndefinedVariableError`, :class:`.IndexingError`, :class:`.PyperclipException`, :class:`.PyperclipWindowsException`, :class:`.CSSWarning`, :class:`.PossibleDataLossError`, :class:`.ClosedFileError`, :class:`.IncompatibilityWarning`, :class:`.AttributeConflictWarning`, :class:`.DatabaseError`, :class:`.PossiblePrecisionLoss`, :class:`.ValueLabelTypeMismatch`, :class:`.InvalidColumnName`, and :class:`.CategoricalConversionWarning` are now exposed in ``pandas.errors`` (:issue:`27656`)
+- :class:`.DataError`, :class:`.SpecificationError`, ``SettingWithCopyError``, ``SettingWithCopyWarning``, :class:`.NumExprClobberingError`, :class:`.UndefinedVariableError`, :class:`.IndexingError`, :class:`.PyperclipException`, :class:`.PyperclipWindowsException`, :class:`.CSSWarning`, :class:`.PossibleDataLossError`, :class:`.ClosedFileError`, :class:`.IncompatibilityWarning`, :class:`.AttributeConflictWarning`, :class:`.DatabaseError`, :class:`.PossiblePrecisionLoss`, :class:`.ValueLabelTypeMismatch`, :class:`.InvalidColumnName`, and :class:`.CategoricalConversionWarning` are now exposed in ``pandas.errors`` (:issue:`27656`)
 - Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
 - Add support for :meth:`.DataFrameGroupBy.ohlc` and :meth:`.SeriesGroupBy.ohlc` for extension array dtypes (:issue:`37493`)
 - Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)

Diff for: pandas/_config/__init__.py

-5
@@ -34,11 +34,6 @@ def using_copy_on_write() -> bool:
     return True
 
 
-def using_nullable_dtypes() -> bool:
-    _mode_options = _global_config["mode"]
-    return _mode_options["nullable_dtypes"]
-
-
 def using_pyarrow_string_dtype() -> bool:
     _mode_options = _global_config["future"]
     return _mode_options["infer_string"]

Diff for: pandas/core/apply.py

+6-10
@@ -16,8 +16,6 @@
 
 import numpy as np
 
-from pandas._config import option_context
-
 from pandas._libs import lib
 from pandas._libs.internals import BlockValuesRefs
 from pandas._typing import (
@@ -1076,14 +1074,12 @@ def apply_series_generator(self) -> tuple[ResType, Index]:
 
         results = {}
 
-        with option_context("mode.chained_assignment", None):
-            for i, v in enumerate(series_gen):
-                # ignore SettingWithCopy here in case the user mutates
-                results[i] = self.func(v, *self.args, **self.kwargs)
-                if isinstance(results[i], ABCSeries):
-                    # If we have a view on v, we need to make a copy because
-                    #  series_generator will swap out the underlying data
-                    results[i] = results[i].copy(deep=False)
+        for i, v in enumerate(series_gen):
+            results[i] = self.func(v, *self.args, **self.kwargs)
+            if isinstance(results[i], ABCSeries):
+                # If we have a view on v, we need to make a copy because
+                #  series_generator will swap out the underlying data
+                results[i] = results[i].copy(deep=False)
 
         return results, res_index
 