If I run the 3 notebook cells above, I get the following pandas warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
However, if I only run cells 1 and 3, no warning is triggered. The only difference is cell 2, where we just display the DataFrame. To be more specific, running print(data) instead of just data also does not trigger the warning.
I do not believe this to be a Jupyter notebook bug, but it seems that simply running a cell containing data modifies the state of the DataFrame in some way, which is definitely not expected behavior. This bug is probably due to some pandas styling functions that are triggered in the background when a cell is run on a DataFrame.
As a side note, the bug also does not appear when all the code is run in one single cell (probably because no DataFrame is printed) or when the line data = data.at_time("10:00") is omitted.
Expected Behavior
The triggering of the warning should not be dependent on whether you run cell 2 or not.
I don't believe this is a bug, but you've hit an interesting corner case.
In Jupyter (actually in all IPython instances including consoles), all results returned from a cell are stored as variables in RAM. This allows you to reference the result of cell 2 as _2 later on, for example. They are not garbage collected unless you explicitly delete them.
at_time creates a view on data from cell 1, and stores that view to the variable data. If cell 2 has not been run, the original data from cell 1 is immediately garbage collected since it has no references, and so the new view is the only version of data that exists. Thus the second line assigning 'data_col_2' is safe to run.
If cell 2 has been run, the data variable created in cell 1 still exists as _2. So you can think of cell 3 then as:
### Notebook Cell 3 ###
data = _2.at_time("10:00")  # data is now a view of _2
data["data_col_2"] = 0  # this line would potentially modify both data and _2, and so warns
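To make the lifetime argument concrete, here is a toy model in plain Python. It is not pandas code: ToyFrame, select, would_warn and warns_in_cell_3 are hypothetical names, and the rule "warn only while the parent object is still alive" is an assumption based on the explanation above. The dict stands in for IPython's output cache, which is what keeps the cell 1 object alive as _2.

```python
import gc
import weakref


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; this is not pandas code."""

    def __init__(self, rows, parent=None):
        self.rows = rows
        # Weak reference to the object this one was sliced from, if any.
        self._parent_ref = weakref.ref(parent) if parent is not None else None

    def select(self, keep):
        # Return a subset that remembers its parent only weakly.
        return ToyFrame([r for r in self.rows if keep(r)], parent=self)

    def would_warn(self):
        # Assumed rule: warn only while the parent is still alive somewhere.
        return self._parent_ref is not None and self._parent_ref() is not None


def warns_in_cell_3(output_cache):
    data = ToyFrame([9, 10, 11])            # cell 1
    if output_cache is not None:
        output_cache[2] = data              # cell 2: displaying caches the result
    data = data.select(lambda r: r == 10)   # cell 3: rebinds the name `data`
    gc.collect()
    return data.would_warn()


print(warns_in_cell_3(None))  # cells 1 and 3 only -> False
print(warns_in_cell_3({}))    # cells 1, 2 and 3   -> True
```

In the first call nothing else refers to the original object, so it is freed as soon as data is rebound and the check finds no live parent. In the second call the cache entry keeps it alive, which corresponds to the case where the warning appears.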
I'm not sure there's any way around this because a user could always do something like:
data  # cell 2
....  # lots of cells
x = _2
data = data.at_time('10:00')  # potentially conflicts with expectation of x
so even if there existed some way to detect if the original DataFrame were stored as a Jupyter result (which I don't think there is) it is always possible for the result to be stored later as some other strong reference.
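If the goal is simply to make cell 3 behave the same whether or not cell 2 was run, one common approach is to take an explicit copy before assigning. The sketch below is only an illustration: the DataFrame is a made-up stand-in for cell 1 (the original cell contents are not shown here), and the name original is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for cell 1; the real data is not shown in this thread.
index = pd.date_range("2021-10-01 09:00", periods=6, freq="30min")
original = pd.DataFrame({"data_col_1": range(6)}, index=index)

# An explicit copy makes the result independent of `original`, so the
# assignment below cannot be mistaken for a write into a slice of it.
data = original.at_time("10:00").copy()
data["data_col_2"] = 0

print(list(data.columns))      # ['data_col_1', 'data_col_2']
print(list(original.columns))  # ['data_col_1']
```

With the copy in place it no longer matters whether some earlier cell output still references the original DataFrame.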
Thanks for the explanation @Liam3851. Agreed: if this is an artifact of IPython, then there is not really anything more to do from the pandas side. Closing as a usage question.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Installed Versions
INSTALLED VERSIONS
commit : 73c6825
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-37-generic
Version : #41~20.04.2-Ubuntu SMP Fri Sep 24 09:06:38 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.3
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1