
Commit e143ee1

deniederhut authored and jowens committed
ENH: Add warning when setting into nonexistent attribute (pandas-dev#16951)
closes pandas-dev#7175 closes pandas-dev#5904
1 parent 94a734a commit e143ee1

5 files changed, +127 -8 lines changed

doc/source/indexing.rst

+31 -4
@@ -227,10 +227,6 @@ as an attribute:
    dfa.A
    panel.one
 
-You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
-if you try to use attribute access to create a new column, it fails silently, creating a new attribute rather than a
-new column.
-
 .. ipython:: python
 
    sa.a = 5
@@ -267,6 +263,37 @@ You can also assign a ``dict`` to a row of a ``DataFrame``:
    x.iloc[1] = dict(x=9, y=99)
    x
 
+You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful;
+if you try to use attribute access to create a new column, it creates a new attribute rather than a
+new column. In 0.21.0 and later, this will issue a ``UserWarning``:
+
+.. code-block:: ipython
+
+   In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
+   In[2]: df.two = [4, 5, 6]
+   UserWarning: Pandas doesn't allow Series to be assigned into nonexistent columns - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
+   In[3]: df
+   Out[3]:
+      one
+   0  1.0
+   1  2.0
+   2  3.0
+
+Similarly, it is possible to create a column with a name which collides with one of pandas'
+built-in methods or attributes, which can cause confusion later when attempting to access
+that column as an attribute. This behavior now warns:
+
+.. code-block:: ipython
+
+   In[4]: df['sum'] = [5., 7., 9.]
+   UserWarning: Column name 'sum' collides with a built-in method, which will cause unexpected attribute behavior
+   In[5]: df.sum
+   Out[5]:
+   <bound method DataFrame.sum of    one  sum
+   0  1.0  5.0
+   1  2.0  7.0
+   2  3.0  9.0>
+
 Slicing ranges
 --------------
 
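The documentation change above describes attribute assignment creating an attribute instead of a column. The following is a minimal standard-library sketch of that mechanism, not pandas code: `MiniFrame`, its `columns` dict, and the warning text are all hypothetical names chosen for illustration.

```python
import warnings


class MiniFrame:
    """Hypothetical stand-in for a DataFrame: columns live in a dict."""

    def __init__(self, columns):
        # Bypass our own __setattr__ so the dict is a plain attribute.
        object.__setattr__(self, "columns", dict(columns))

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall back to column names.
        columns = object.__getattribute__(self, "columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name in self.columns:
            # Existing column: attribute assignment updates the column.
            self.columns[name] = value
            return
        if isinstance(value, (list, tuple)):
            # New name with a list-like value: likely meant as a column.
            warnings.warn(
                "attribute assignment does not create a column; "
                "use frame[name] = value instead",
                UserWarning,
                stacklevel=2,
            )
        object.__setattr__(self, name, value)

    def __setitem__(self, name, value):
        self.columns[name] = value


frame = MiniFrame({"one": [1.0, 2.0, 3.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    frame.two = [4, 5, 6]        # warns and only sets a plain attribute

frame["three"] = [7, 8, 9]       # item assignment creates a real column
frame.one = [0.0, 0.0, 0.0]      # existing column is updated

print(sorted(frame.columns))     # 'two' is not among the columns
print(len(caught))               # one warning was recorded
```

In this sketch, as in the documented behavior, the value assigned through `frame.two` is still reachable as an attribute; it simply never becomes part of the tabular data.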

doc/source/whatsnew/v0.21.0.txt

+45 -1
@@ -29,7 +29,6 @@ New features
 - Added ``skipna`` parameter to :func:`~pandas.api.types.infer_dtype` to
   support type inference in the presence of missing values (:issue:`17059`).
 
-
 .. _whatsnew_0210.enhancements.infer_objects:
 
 ``infer_objects`` type conversion
@@ -62,6 +61,51 @@ using the :func:`to_numeric` function (or :func:`to_datetime`, :func:`to_timedel
    df['C'] = pd.to_numeric(df['C'], errors='coerce')
    df.dtypes
 
+.. _whatsnew_0210.enhancements.attribute_access:
+
+Improved warnings when attempting to create columns
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+New users are often flummoxed by the relationship between column operations and attribute
+access on ``DataFrame`` instances (:issue:`5904` & :issue:`7175`). The first source of
+confusion is attempting to create a new column by assigning to an attribute:
+
+.. code-block:: ipython
+
+   In[1]: df = pd.DataFrame({'one': [1., 2., 3.]})
+   In[2]: df.two = [4, 5, 6]
+
+This does not raise an exception, but it does not create a new column either:
+
+.. code-block:: ipython
+
+   In[3]: df
+   Out[3]:
+      one
+   0  1.0
+   1  2.0
+   2  3.0
+
+The second source of confusion is creating a column whose name collides with a method or
+attribute already in the instance namespace:
+
+.. code-block:: ipython
+
+   In[4]: df['sum'] = [5., 7., 9.]
+
+That column cannot then be accessed as an attribute, because the method takes precedence:
+
+.. code-block:: ipython
+
+   In[5]: df.sum
+   Out[5]:
+   <bound method DataFrame.sum of    one  sum
+   0  1.0  5.0
+   1  2.0  7.0
+   2  3.0  9.0>
+
+Both of these now issue a ``UserWarning`` about the potential for unexpected behavior. See :ref:`Attribute Access <indexing.attribute_access>`.
+
 .. _whatsnew_0210.enhancements.other:
 
 Other Enhancements
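The second case in the release note, a column named like a method, follows from ordinary Python attribute lookup: `__getattr__` runs only after normal lookup fails, so a method defined on the class is found first. A small sketch under that assumption, using a hypothetical `MiniFrame` class rather than pandas itself:

```python
class MiniFrame:
    """Hypothetical stand-in for a DataFrame with one built-in method."""

    def __init__(self):
        self._columns = {}

    def __setitem__(self, name, values):
        self._columns[name] = list(values)

    def __getitem__(self, name):
        return self._columns[name]

    def __getattr__(self, name):
        # Reached only when normal lookup fails, so any method defined
        # on the class wins over a column with the same name.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def sum(self):
        # Per-column totals, loosely imitating a column-wise reduction.
        return {name: sum(vals) for name, vals in self._columns.items()}


frame = MiniFrame()
frame["one"] = [1.0, 2.0, 3.0]
frame["sum"] = [5.0, 7.0, 9.0]

print(frame.one)             # the column, found through __getattr__
print(callable(frame.sum))   # the method shadows the 'sum' column
print(frame["sum"])          # item access still reaches the column
```

Item access is unaffected by the collision, which is why the warning points users toward bracket notation for such columns.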

pandas/core/generic.py

+11 -1
@@ -27,7 +27,7 @@
     pandas_dtype)
 from pandas.core.dtypes.cast import maybe_promote, maybe_upcast_putmask
 from pandas.core.dtypes.missing import isna, notna
-from pandas.core.dtypes.generic import ABCSeries, ABCPanel
+from pandas.core.dtypes.generic import ABCSeries, ABCPanel, ABCDataFrame
 
 from pandas.core.common import (_values_from_object,
                                 _maybe_box_datetimelike,
@@ -1907,6 +1907,10 @@ def _slice(self, slobj, axis=0, kind=None):
         return result
 
     def _set_item(self, key, value):
+        if isinstance(key, str) and callable(getattr(self, key, None)):
+            warnings.warn("Column name '{key}' collides with a built-in "
+                          "method, which will cause unexpected attribute "
+                          "behavior".format(key=key), stacklevel=3)
         self._data.set(key, value)
         self._clear_item_cache()
 

@@ -3357,6 +3361,12 @@ def __setattr__(self, name, value):
                 else:
                     object.__setattr__(self, name, value)
             except (AttributeError, TypeError):
+                if isinstance(self, ABCDataFrame) and (is_list_like(value)):
+                    warnings.warn("Pandas doesn't allow Series to be assigned "
+                                  "into nonexistent columns - see "
+                                  "https://pandas.pydata.org/pandas-docs/"
+                                  "stable/indexing.html#attribute-access",
+                                  stacklevel=2)
                 object.__setattr__(self, name, value)
 
     # ----------------------------------------------------------------------
# ----------------------------------------------------------------------

pandas/tests/dtypes/test_generic.py

+38
@@ -4,6 +4,7 @@
 import numpy as np
 import pandas as pd
 from pandas.core.dtypes import generic as gt
+from pandas.util import testing as tm
 
 
 class TestABCClasses(object):
@@ -38,3 +39,40 @@ def test_abc_types(self):
         assert isinstance(self.sparse_array, gt.ABCSparseArray)
         assert isinstance(self.categorical, gt.ABCCategorical)
         assert isinstance(pd.Period('2012', freq='A-DEC'), gt.ABCPeriod)
+
+
+def test_setattr_warnings():
+    # GH5904 - Suggestion: Warning for DataFrame colname-methodname clash
+    # GH7175 - GOTCHA: You can't use dot notation to add a column...
+    d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
+         'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
+    df = pd.DataFrame(d)
+
+    with catch_warnings(record=True) as w:
+        # successfully add new column
+        # this should not raise a warning
+        df['three'] = df.two + 1
+        assert len(w) == 0
+        assert df.three.sum() > df.two.sum()
+
+    with catch_warnings(record=True) as w:
+        # successfully modify column in place
+        # this should not raise a warning
+        df.one += 1
+        assert len(w) == 0
+        assert df.one.iloc[0] == 2
+
+    with catch_warnings(record=True) as w:
+        # successfully add an attribute to a series
+        # this should not raise a warning
+        df.two.not_an_index = [1, 2]
+        assert len(w) == 0
+
+    with tm.assert_produces_warning(UserWarning):
+        # warn when setting column to nonexistent name
+        df.four = df.two + 2
+        assert df.four.sum() > df.two.sum()
+
+    with tm.assert_produces_warning(UserWarning):
+        # warn when column has same name as method
+        df['sum'] = df.two
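The test above checks both that some operations stay silent and that others warn. The same pattern can be sketched with only the standard library; `assign` and `count_warnings` below are hypothetical helpers written for this illustration, and the explicit `simplefilter("always")` call is an addition that the test in the diff does not make.

```python
import warnings


def assign(target, name, value):
    """Hypothetical function under test: warns for list-like values."""
    if isinstance(value, (list, tuple)):
        warnings.warn("list-like assigned to new name", UserWarning,
                      stacklevel=2)
    target[name] = value


def count_warnings(func, *args):
    """Run func(*args) and return how many UserWarnings it issued."""
    with warnings.catch_warnings(record=True) as caught:
        # With the default filters, a warning already shown from the same
        # location may be suppressed, which would make the count misleading.
        warnings.simplefilter("always")
        func(*args)
    return sum(issubclass(w.category, UserWarning) for w in caught)


store = {}
quiet = count_warnings(assign, store, "scalar", 5)
noisy = count_warnings(assign, store, "column", [4, 5, 6])

print(quiet, noisy)
```

Counting recorded warnings covers both the "should not warn" and "should warn" cases with one helper, at the cost of less specific failure messages than a dedicated assertion context manager.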

pandas/tests/io/test_pytables.py

+2 -2
@@ -2011,7 +2011,7 @@ def check(obj, comparator):
             df['string'] = 'foo'
             df['float322'] = 1.
             df['float322'] = df['float322'].astype('float32')
-            df['bool'] = df['float322'] > 0
+            df['boolean'] = df['float322'] > 0
             df['time1'] = Timestamp('20130101')
             df['time2'] = Timestamp('20130102')
             check(df, tm.assert_frame_equal)
@@ -2141,7 +2141,7 @@ def test_table_values_dtypes_roundtrip(self):
             df1['string'] = 'foo'
             df1['float322'] = 1.
             df1['float322'] = df1['float322'].astype('float32')
-            df1['bool'] = df1['float32'] > 0
+            df1['boolean'] = df1['float32'] > 0
             df1['time1'] = Timestamp('20130101')
             df1['time2'] = Timestamp('20130102')
 
