BUG: Explicitly declaring columns when using loc does not produce a SettingWithCopyWarning #42703

HenryEcker · 2021-07-24T22:30:45Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.

Code Sample, a copy-pastable example

import pandas as pd

# Create mock dataframe
df = pd.DataFrame({
    'a': [1, 2, 3, 4]
})
m = df['a'].eq(2)

new_df1 = df[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df1._is_copy)
# SettingWithCopyWarning (expected)
new_df1['b'] = 1

new_df2 = df.loc[m]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df2._is_copy)
# SettingWithCopyWarning (unexpected)
new_df2['c'] = 1

new_df3 = df.loc[m, :]
# <weakref at 0x00000213E01C5090; to 'DataFrame' at 0x00000213DE9C6FD0>
print(new_df3._is_copy)
# SettingWithCopyWarning (unexpected)
new_df3['d'] = 1

new_df4 = df.loc[m, df.columns]
# None
print(new_df4._is_copy)
# No SettingWithCopyWarning (expected?)
new_df4['e'] = 1

Problem description

I expect that df[m] would produce a weakref, but I don't understand why I'm getting a weakref for the loc options where columns are not explicitly defined.

The issue is that:

new_df2 = df.loc[m]
new_df2['b'] = 1

and

new_df3 = df.loc[m, :]
new_df3['c'] = 1

Both warn:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)

I couldn't find any documentation saying that columns must be explicitly defined when using loc to not produce a copy, although all of the examples in the docs section "Why does assignment fail when using chained indexing?" do explicitly declare the column or columns.

Expected Output

I would expect not to get a SettingWithCopyWarning, especially since : works when slicing the Index:

filtered_df = df.loc[:, ['a']]
print(filtered_df._is_copy)  # None
filtered_df['b'] = 1  # No Warning

Output of `pd.show_versions()`

Confirmed on Windows

INSTALLED VERSIONS

commit : f00ed8f
python : 3.9.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : None
setuptools : None
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.22.0
pandas_datareader: 0.9.0
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.05.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None

Confirmed On Mac

INSTALLED VERSIONS

commit : f00ed8f
python : 3.9.2.final.0
python-bits : 64
OS : Darwin
OS-release : 20.5.0
Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:31 PDT 2021; root:xnu-7195.121.3~9/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 1.3.0
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 54.2.0
Cython : 0.29.22
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


phofl · 2021-07-24T23:40:31Z

Hey, thanks for your report.

Could you explain what you are trying to do? I don't understand the intent of these code snippets. Or are you only concerned with the warning shown?

HenryEcker · 2021-07-25T00:01:07Z

I guess I'm just looking to understand the inconsistency in behaviour.

Why does creating a new DataFrame with df.loc[m] and df.loc[m, :] cause a SettingWithCopyWarning when creating a new DataFrame from df.loc[m, df.columns] does not?

I couldn't find anywhere in the documentation that the columns must be explicitly declared when using loc to avoid the SettingWithCopyWarning. I also found the behaviour inconsistent because creating a new DataFrame with df.loc[:, ['a']] does not produce a SettingWithCopyWarning.

phofl · 2021-07-25T00:11:33Z

All of your statements are some kind of chained assignment. Your locs always return copies. I am more surprised that the last one does not show the warning.

HenryEcker · 2021-07-25T00:15:39Z

I wasn't sure. I just found the behaviour inconsistent, and I wasn't clear which behaviour was supposed to be supported, but I felt that them being different was not the intention.

phofl · 2021-07-25T00:18:57Z

Yeah I would agree. The warning basically tries to tell you that something like df["x"]["y"] does not work as intended. You have the same cases here, which won't work, with the difference that you are using loc instead of a regular getitem as the first op. They all have in common that the copy will avoid the modification of df.

df.loc[m, df.columns]['a'] = 100

will do nothing, hence I think a warning should be shown
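
A minimal sketch of the point above, assuming the same mock DataFrame as in the report: the chained form assigns into a temporary object and leaves df untouched, while a single loc call with both indexers writes into df itself. The variable names after_chained and after_single are only illustrative, and the warning filter is there because the exact warning raised for the chained form differs between pandas versions.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
m = df['a'].eq(2)

# Chained form: .loc[...] builds a temporary object and the assignment
# lands on that temporary, so df itself is not modified.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence any chained-assignment warning
    df.loc[m, df.columns]['a'] = 100
after_chained = df['a'].tolist()

# Single-step form: one .loc call with row and column indexers
# writes directly into df.
df.loc[m, 'a'] = 100
after_single = df['a'].tolist()

print(after_chained)  # [1, 2, 3, 4]
print(after_single)   # [1, 100, 3, 4]
```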

phofl · 2021-07-25T00:22:01Z

This is actually a duplicate of #18752

HenryEcker · 2021-07-25T00:22:50Z

Okay then the part that is really confusing is:

import pandas as pd

# Create mock dataframe
df = pd.DataFrame({
    'a': [1, 2, 3, 4]
})
m = df['a'].eq(2)

new_df1 = df[m]
new_df1['b'] = 1
print('df1')
print(new_df1)

new_df2 = df.loc[m]
new_df2['c'] = 1
print('df2')
print(new_df2)

new_df3 = df.loc[m, :]
new_df3['d'] = 1
print('df3')
print(new_df3)

new_df4 = df.loc[m, df.columns]
new_df4['e'] = 1
print('df4')
print(new_df4)

print('df')
print(df)

Gives output:

With 3 Warnings:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._set_item(key, value)

phofl · 2021-07-25T00:23:54Z

Yeah, we should have 4 not 3

You have to print df to see if the original df was modified

phofl · 2021-07-25T00:27:29Z

Yes, because loc returns copies here, but we don't know if the copy was on purpose, hence the warning.

what is confusing for you?

phofl · 2021-07-25T00:29:58Z

Ok thx, yes I agree that we should have 4. But if you look at the linked issue you will see that there are quite a few cases which are not covered.

HenryEcker · 2021-07-25T00:31:03Z

Oh. I understand now. loc returns a copy. The warning is to ensure you weren't trying to update the original DataFrame. The bug is that explicitly selecting columns for some reason does not produce the warning.
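
If the copy is intentional, one common way to say so is to call .copy() on the selection before adding columns. The sketch below is an illustration under that assumption, not something proposed in this thread. It records warnings and filters them by class name, so it does not depend on where a given pandas version exposes the warning class.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
m = df['a'].eq(2)

# An explicit .copy() makes new_df an independent DataFrame.
new_df = df.loc[m].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    new_df['b'] = 1  # adds a column to the copy only

# Keep only warnings whose class is named SettingWithCopyWarning.
copy_warnings = [
    w for w in caught
    if w.category.__name__ == 'SettingWithCopyWarning'
]

print(len(copy_warnings))    # 0
print(list(df.columns))      # ['a']
print(list(new_df.columns))  # ['a', 'b']
```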

phofl · 2021-07-25T00:32:36Z

Closing this since it is covered by the other issue.

HenryEcker added the labels Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) Jul 24, 2021

phofl added the labels Copy / view semantics and Indexing (Related to indexing on series/frames, not to indexes themselves) and removed the label Needs Triage Jul 25, 2021

phofl added the labels Duplicate Report (Duplicate issue or pull request) and Closing Candidate (May be closeable, needs more eyeballs) and removed the labels Bug, Copy / view semantics and Indexing Jul 25, 2021

phofl closed this as completed Jul 25, 2021

HenryEcker changed the title ~~BUG: Using loc without explicitly defining columns produces a SettingWithCopyWarning~~ BUG: Using explicitly defining columns when using loc does not produce a SettingWithCopyWarning Jul 25, 2021

HenryEcker changed the title ~~BUG: Using explicitly defining columns when using loc does not produce a SettingWithCopyWarning~~ BUG: Explicitly defining columns when using loc does not produce a SettingWithCopyWarning Jul 25, 2021

HenryEcker changed the title ~~BUG: Explicitly defining columns when using loc does not produce a SettingWithCopyWarning~~ BUG: Explicitly declaring columns when using loc does not produce a SettingWithCopyWarning Jul 25, 2021
