
Commit 7aa7880

API: raise SettingWithCopy when chained assignment is detected
1 parent 05cb960 commit 7aa7880

12 files changed: +201 -44 lines changed

doc/source/indexing.rst

+38-21
Original file line number | Diff line number | Diff line change
@@ -1330,24 +1330,34 @@ indexing operation, the result will be a copy. With single label / scalar
13301330
indexing and slicing, e.g. ``df.ix[3:6]`` or ``df.ix[:, 'A']``, a view will be
13311331
returned.
13321332
1333-
In chained expressions, the order may determine whether a copy is returned or not:
1333+
In chained expressions, the order may determine whether a copy is returned or not.
1334+
If an expression will set values on a copy of a slice, then a ``SettingWithCopy``
1335+
exception will be raised (this raise/warn behavior is new starting in 0.13.0)
13341336
1335-
.. ipython:: python
1337+
You can control the action of a chained assignment via the option ``mode.chained_assignment``,
1338+
which can take the values ``['raise','warn',None]``, where showing a warning is the default.
13361339
1340+
.. ipython:: python
13371341
13381342
dfb = DataFrame({'a' : ['one', 'one', 'two',
13391343
'three', 'two', 'one', 'six'],
1340-
'b' : ['x', 'y', 'y',
1341-
'x', 'y', 'x', 'x'],
1342-
'c' : randn(7)})
1343-
1344-
1345-
# goes to copy (will be lost)
1346-
dfb[dfb.a.str.startswith('o')]['c'] = 42
1344+
'c' : np.arange(7)})
13471345
13481346
# passed via reference (will stay)
13491347
dfb['c'][dfb.a.str.startswith('o')] = 42
13501348
1349+
This however is operating on a copy and will not work.
1350+
1351+
::
1352+
1353+
>>> pd.set_option('mode.chained_assignment','warn')
1354+
>>> dfb[dfb.a.str.startswith('o')]['c'] = 42
1355+
Traceback (most recent call last)
1356+
...
1357+
SettingWithCopyWarning:
1358+
A value is trying to be set on a copy of a slice from a DataFrame.
1359+
Try using .loc[row_index,col_indexer] = value instead
1360+
13511361
A chained assignment can also crop up in setting in a mixed dtype frame.
13521362
13531363
.. note::
@@ -1359,28 +1369,35 @@ This is the correct access method
13591369
.. ipython:: python
13601370
13611371
dfc = DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
1362-
dfc_copy = dfc.copy()
1363-
dfc_copy.loc[0,'A'] = 11
1364-
dfc_copy
1372+
dfc.loc[0,'A'] = 11
1373+
dfc
13651374
13661375
This *can* work at times, but is not guaranteed, and so should be avoided
13671376
13681377
.. ipython:: python
13691378
1370-
dfc_copy = dfc.copy()
1371-
dfc_copy['A'][0] = 111
1372-
dfc_copy
1379+
dfc = dfc.copy()
1380+
dfc['A'][0] = 111
1381+
dfc
13731382
13741383
This will **not** work at all, and so should be avoided
13751384
1376-
.. ipython:: python
1385+
::
1386+
1387+
>>> pd.set_option('mode.chained_assignment','raise')
1388+
>>> dfc.loc[0]['A'] = 1111
1389+
Traceback (most recent call last)
1390+
...
1391+
SettingWithCopyError:
1392+
A value is trying to be set on a copy of a slice from a DataFrame.
1393+
Try using .loc[row_index,col_indexer] = value instead
1394+
1395+
.. warning::
13771396
1378-
dfc_copy = dfc.copy()
1379-
dfc_copy.loc[0]['A'] = 1111
1380-
dfc_copy
1397+
The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
1398+
assignment. There may be false positives; situations where a chained assignment is inadvertently
1399+
reported.
13811400
1382-
When assigning values to subsets of your data, thus, make sure to either use the
1383-
pandas access methods or explicitly handle the assignment creating a copy.
13841401
13851402
Fallback indexing
13861403
-----------------

doc/source/release.rst

+3
@@ -396,6 +396,9 @@ API Changes
396396
3 4.000000
397397
dtype: float64
398398
399+
- raise/warn ``SettingWithCopyError/Warning`` exception/warning when setting of a
400+
copy through chained assignment is detected, settable via option ``mode.chained_assignment``
401+
399402
Internal Refactoring
400403
~~~~~~~~~~~~~~~~~~~~
401404

doc/source/v0.13.0.txt

+28
@@ -104,6 +104,34 @@ API changes
104104
- ``Series`` and ``DataFrame`` now have a ``mode()`` method to calculate the
105105
statistical mode(s) by axis/Series. (:issue:`5367`)
106106

107+
- Chained assignment will now by default warn if the user is assigning to a copy. This can be changed
108+
with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs<indexing.view_versus_copy>`.
109+
110+
.. ipython:: python
111+
112+
dfc = DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
113+
pd.set_option('chained_assignment','warn')
114+
115+
The following warning / exception will show if this is attempted.
116+
117+
.. ipython:: python
118+
119+
dfc.loc[0]['A'] = 1111
120+
121+
::
122+
123+
Traceback (most recent call last)
124+
...
125+
SettingWithCopyWarning:
126+
A value is trying to be set on a copy of a slice from a DataFrame.
127+
Try using .loc[row_index,col_indexer] = value instead
128+
129+
Here is the correct method of assignment.
130+
131+
.. ipython:: python
132+
133+
dfc.loc[0,'A'] = 11
134+
dfc
107135

108136
Prior Version Deprecations/Changes
109137
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas/core/common.py

+5
@@ -26,6 +26,11 @@
2626
class PandasError(Exception):
2727
pass
2828

29+
class SettingWithCopyError(ValueError):
30+
pass
31+
32+
class SettingWithCopyWarning(Warning):
33+
pass
2934

3035
class AmbiguousIndexError(PandasError, KeyError):
3136
pass
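
The two new classes above differ in their base class: the error derives from ``ValueError`` and the warning from ``Warning``. The following is a minimal, self-contained sketch of what that implies for callers. The classes are redefined locally for illustration (they are stand-ins, not imported from pandas), and the helper names ``caught_as_value_error`` and ``warning_escalated`` are hypothetical.

```python
import warnings


# Local stand-ins mirroring the class definitions in the diff above.
class SettingWithCopyError(ValueError):
    pass


class SettingWithCopyWarning(Warning):
    pass


def caught_as_value_error():
    """A handler written for ValueError also catches the new error."""
    try:
        raise SettingWithCopyError("set on a copy")
    except ValueError as exc:
        return type(exc).__name__


def warning_escalated():
    """The standard warnings filter can turn the warning into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", SettingWithCopyWarning)
        try:
            warnings.warn("set on a copy", SettingWithCopyWarning)
        except SettingWithCopyWarning:
            return True
    return False
```

Because the error is a ``ValueError`` subclass, existing ``except ValueError`` blocks will swallow it unless they re-raise it explicitly.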

pandas/core/config.py

+7
@@ -512,6 +512,13 @@ def _get_root(key):
512512
cursor = cursor[p]
513513
return cursor, path[-1]
514514

515+
def _get_option_fast(key):
516+
""" internal quick access routine, no error checking """
517+
path = key.split('.')
518+
cursor = _global_config
519+
for p in path:
520+
cursor = cursor[p]
521+
return cursor
515522

516523
def _is_deprecated(key):
517524
""" Returns True if the given option has been deprecated """

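The ``_get_option_fast`` helper added above walks a nested dictionary along a dotted key and skips all validation. A rough standalone sketch of that lookup follows. The registry contents and the name ``get_option_fast`` are assumptions for illustration; in pandas the registry is filled by ``register_option``.

```python
# Hypothetical stand-in for the module-level option registry.
_global_config = {"mode": {"chained_assignment": "warn"}}


def get_option_fast(key, config=_global_config):
    """Walk a nested dict along a dotted key; no error checking is done."""
    cursor = config
    for part in key.split("."):
        cursor = cursor[part]
    return cursor


print(get_option_fast("mode.chained_assignment"))  # prints: warn
```

An unknown key surfaces as a plain ``KeyError`` from the dictionary lookup, which is the price of skipping the checks done by the public accessors.
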
pandas/core/config_init.py

+11-1
@@ -271,7 +271,6 @@ def mpl_style_cb(key):
271271
# We don't want to start importing everything at the global context level
272272
# or we'll hit circular deps.
273273

274-
275274
def use_inf_as_null_cb(key):
276275
from pandas.core.common import _use_inf_as_null
277276
_use_inf_as_null(key)
@@ -281,6 +280,17 @@ def use_inf_as_null_cb(key):
281280
cb=use_inf_as_null_cb)
282281

283282

283+
# user warnings
284+
chained_assignment = """
285+
: string
286+
Raise an exception, warn, or take no action if trying to use chained assignment. The default is warn.
287+
"""
288+
289+
with cf.config_prefix('mode'):
290+
cf.register_option('chained_assignment', 'warn', chained_assignment,
291+
validator=is_one_of_factory([None, 'warn', 'raise']))
292+
293+
284294
# Set up the io.excel specific configuration.
285295
writer_engine_doc = """
286296
: string
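
The option registered above restricts its value to ``None``, ``'warn'`` or ``'raise'`` through a validator built by a factory. The sketch below shows one way such a factory could look. The name ``one_of`` and the error message are assumptions for illustration and are not taken from the pandas source.

```python
def one_of(legal_values):
    """Return a validator that accepts only members of ``legal_values``."""
    def validator(value):
        if value not in legal_values:
            raise ValueError(
                "Value must be one of %s" % "|".join(map(str, legal_values))
            )
    return validator


# Same set of values as the registration in the diff above.
validate_chained_assignment = one_of([None, "warn", "raise"])
validate_chained_assignment("raise")  # accepted, returns None
```

Running the validator when the option is set means a misspelled mode fails immediately rather than silently disabling the check.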

pandas/core/frame.py

+17-9
@@ -1547,12 +1547,9 @@ def _ixs(self, i, axis=0, copy=False):
15471547
i = _maybe_convert_indices(i, len(self._get_axis(axis)))
15481548
return self.reindex(i, takeable=True)
15491549
else:
1550-
try:
1551-
new_values = self._data.fast_2d_xs(i, copy=copy)
1552-
except:
1553-
new_values = self._data.fast_2d_xs(i, copy=True)
1550+
new_values, copy = self._data.fast_2d_xs(i, copy=copy)
15541551
return Series(new_values, index=self.columns,
1555-
name=self.index[i])
1552+
name=self.index[i])._setitem_copy(copy)
15561553

15571554
# icol
15581555
else:
@@ -1892,10 +1889,18 @@ def _set_item(self, key, value):
18921889
Series/TimeSeries will be conformed to the DataFrame's index to
18931890
ensure homogeneity.
18941891
"""
1892+
1893+
is_existing = key in self.columns
18951894
self._ensure_valid_index(value)
18961895
value = self._sanitize_column(key, value)
18971896
NDFrame._set_item(self, key, value)
18981897

1898+
# check if we are modifying a copy
1899+
# try to set first as we want an invalid
1900+
# value exception to occur first
1901+
if is_existing:
1902+
self._check_setitem_copy()
1903+
18991904
def insert(self, loc, column, value, allow_duplicates=False):
19001905
"""
19011906
Insert column into DataFrame at specified location.
@@ -2093,13 +2098,16 @@ def xs(self, key, axis=0, level=None, copy=True, drop_level=True):
20932098
new_index = self.index[loc]
20942099

20952100
if np.isscalar(loc):
2096-
new_values = self._data.fast_2d_xs(loc, copy=copy)
2097-
return Series(new_values, index=self.columns,
2098-
name=self.index[loc])
2101+
2102+
new_values, copy = self._data.fast_2d_xs(loc, copy=copy)
2103+
result = Series(new_values, index=self.columns,
2104+
name=self.index[loc])._setitem_copy(copy)
2105+
20992106
else:
21002107
result = self[loc]
21012108
result.index = new_index
2102-
return result
2109+
2110+
return result
21032111

21042112
_xs = xs
21052113

pandas/core/generic.py

+22-3
@@ -19,9 +19,11 @@
1919
from pandas import compat, _np_version_under1p7
2020
from pandas.compat import map, zip, lrange, string_types, isidentifier
2121
from pandas.core.common import (isnull, notnull, is_list_like,
22-
_values_from_object, _maybe_promote, ABCSeries)
22+
_values_from_object, _maybe_promote, ABCSeries,
23+
SettingWithCopyError, SettingWithCopyWarning)
2324
import pandas.core.nanops as nanops
2425
from pandas.util.decorators import Appender, Substitution
26+
from pandas.core import config
2527

2628
# goal is to be able to define the docs close to function, while still being
2729
# able to share
@@ -69,7 +71,7 @@ class NDFrame(PandasObject):
6971
copy : boolean, default False
7072
"""
7173
_internal_names = [
72-
'_data', 'name', '_cacher', '_subtyp', '_index', '_default_kind', '_default_fill_value']
74+
'_data', 'name', '_cacher', '_is_copy', '_subtyp', '_index', '_default_kind', '_default_fill_value']
7375
_internal_names_set = set(_internal_names)
7476
_metadata = []
7577

@@ -85,6 +87,7 @@ def __init__(self, data, axes=None, copy=False, dtype=None, fastpath=False):
8587
for i, ax in enumerate(axes):
8688
data = data.reindex_axis(ax, axis=i)
8789

90+
object.__setattr__(self, '_is_copy', False)
8891
object.__setattr__(self, '_data', data)
8992
object.__setattr__(self, '_item_cache', {})
9093

@@ -988,6 +991,22 @@ def _set_item(self, key, value):
988991
self._data.set(key, value)
989992
self._clear_item_cache()
990993

994+
def _setitem_copy(self, copy):
995+
""" set the _is_copy of the item """
996+
self._is_copy = copy
997+
return self
998+
999+
def _check_setitem_copy(self):
1000+
""" validate if we are doing a setitem on a chained copy """
1001+
if self._is_copy:
1002+
value = config._get_option_fast('mode.chained_assignment')
1003+
1004+
t = "A value is trying to be set on a copy of a slice from a DataFrame.\nTry using .loc[row_index,col_indexer] = value instead"
1005+
if value == 'raise':
1006+
raise SettingWithCopyError(t)
1007+
elif value == 'warn':
1008+
warnings.warn(t,SettingWithCopyWarning)
1009+
9911010
def __delitem__(self, key):
9921011
"""
9931012
Delete item
@@ -1049,7 +1068,7 @@ def take(self, indices, axis=0, convert=True):
10491068
new_data = self._data.reindex_axis(new_items, indexer=indices, axis=0)
10501069
else:
10511070
new_data = self._data.take(indices, axis=baxis)
1052-
return self._constructor(new_data).__finalize__(self)
1071+
return self._constructor(new_data)._setitem_copy(True).__finalize__(self)
10531072

10541073
# TODO: Check if this was clearer in 0.12
10551074
def select(self, crit, axis=0):
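
The generic.py changes above combine three pieces: an ``_is_copy`` flag, a setter that marks objects produced by ``take``, and a check that raises or warns depending on the configured mode. The toy class below is a simplified sketch of that flow, not the pandas implementation. ``Tracked`` and its ``mode`` parameter are hypothetical, and the mode is passed in directly instead of being read from the option registry.

```python
import warnings


class SettingWithCopyError(ValueError):
    pass


class SettingWithCopyWarning(Warning):
    pass


MESSAGE = "A value is trying to be set on a copy of a slice."


class Tracked:
    """Toy container that remembers whether it was produced as a copy."""

    def __init__(self, values):
        self.values = list(values)
        self._is_copy = False

    def _setitem_copy(self, copy):
        # Record the flag and return self so the call can be chained.
        self._is_copy = copy
        return self

    def take(self, positions):
        # Selecting by position always builds a new object, so mark it.
        return Tracked(self.values[i] for i in positions)._setitem_copy(True)

    def set(self, pos, value, mode="warn"):
        # Assign first so that an invalid position fails before the copy check.
        self.values[pos] = value
        if self._is_copy:
            if mode == "raise":
                raise SettingWithCopyError(MESSAGE)
            elif mode == "warn":
                warnings.warn(MESSAGE, SettingWithCopyWarning)
```

Note that the assignment on the copy still happens; the check only reports that the original object was left untouched.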

pandas/core/internals.py

+5-7
@@ -2567,22 +2567,20 @@ def fast_2d_xs(self, loc, copy=False):
25672567
"""
25682568
get a cross sectional for a given location in the
25692569
items ; handle dups
2570+
2571+
return the result and a flag if a copy was actually made
25702572
"""
25712573
if len(self.blocks) == 1:
25722574
result = self.blocks[0].values[:, loc]
25732575
if copy:
25742576
result = result.copy()
2575-
return result
2576-
2577-
if not copy:
2578-
raise TypeError('cannot get view of mixed-type or '
2579-
'non-consolidated DataFrame')
2577+
return result, copy
25802578

25812579
items = self.items
25822580

25832581
# non-unique (GH4726)
25842582
if not items.is_unique:
2585-
return self._interleave(items).ravel()
2583+
return self._interleave(items).ravel(), True
25862584

25872585
# unique
25882586
dtype = _interleaved_dtype(self.blocks)
@@ -2593,7 +2591,7 @@ def fast_2d_xs(self, loc, copy=False):
25932591
i = items.get_loc(item)
25942592
result[i] = blk._try_coerce_result(blk.iget((j, loc)))
25952593

2596-
return result
2594+
return result, True
25972595

25982596
def consolidate(self):
25992597
"""

pandas/core/series.py

+5-1
@@ -22,7 +22,8 @@
2222
_values_from_object,
2323
_possibly_cast_to_datetime, _possibly_castable,
2424
_possibly_convert_platform,
25-
ABCSparseArray, _maybe_match_name, _ensure_object)
25+
ABCSparseArray, _maybe_match_name, _ensure_object,
26+
SettingWithCopyError)
2627

2728
from pandas.core.index import (Index, MultiIndex, InvalidIndexError,
2829
_ensure_index, _handle_legacy_indexes)
@@ -575,6 +576,8 @@ def __setitem__(self, key, value):
575576
try:
576577
self._set_with_engine(key, value)
577578
return
579+
except (SettingWithCopyError):
580+
raise
578581
except (KeyError, ValueError):
579582
values = self.values
580583
if (com.is_integer(key)
@@ -623,6 +626,7 @@ def _set_with_engine(self, key, value):
623626
values = self.values
624627
try:
625628
self.index._engine.set_value(values, key, value)
629+
self._check_setitem_copy()
626630
return
627631
except KeyError:
628632
values[self.index.get_loc(key)] = value
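
The extra ``except (SettingWithCopyError): raise`` clause in ``__setitem__`` matters because ``SettingWithCopyError`` subclasses ``ValueError``: without it, the following ``except (KeyError, ValueError)`` branch would catch the error and fall through to the slow path. A small sketch of that ordering, with the hypothetical names ``set_with_fallback`` and ``engine_set`` standing in for the real methods:

```python
class SettingWithCopyError(ValueError):
    pass


def set_with_fallback(engine_set):
    """Try the fast path; fall back on lookup errors, but not on copy errors."""
    try:
        engine_set()
        return "engine"
    except SettingWithCopyError:
        # Must come first: this class is a ValueError, so the clause below
        # would otherwise swallow it.
        raise
    except (KeyError, ValueError):
        return "fallback"


def missing_key():
    raise KeyError("no such label")


def copy_detected():
    raise SettingWithCopyError("set on a copy")
```

Python evaluates ``except`` clauses in order, so the more specific subclass has to be listed before its base class.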

pandas/tests/test_frame.py

+4-2
@@ -11059,9 +11059,11 @@ def test_xs_view(self):
1105911059
dm.xs(2)[:] = 10
1106011060
self.assert_((dm.xs(2) == 5).all())
1106111061

11062+
# prior to chained assignment (GH5390)
11063+
# this would raise, but now just returns a copy (and sets _is_copy)
1106211064
# TODO (?): deal with mixed-type fiasco?
11063-
with assertRaisesRegexp(TypeError, 'cannot get view of mixed-type'):
11064-
self.mixed_frame.xs(self.mixed_frame.index[2], copy=False)
11065+
# with assertRaisesRegexp(TypeError, 'cannot get view of mixed-type'):
11066+
# self.mixed_frame.xs(self.mixed_frame.index[2], copy=False)
1106511067

1106611068
# unconsolidated
1106711069
dm['foo'] = 6.
