Enabling chained assignment checks (SettingWithCopyWarning) can have huge performance impact #18743
Comments
Of course, this has to run the garbage collector. You can certainly just disable the checks. This won't be fixed in pandas 2.
It would probably be helpful to document the performance impact more clearly. This can have subtle side effects, which are very hard to find. I only noticed it because a Dask/Distributed computation was much slower than expected (use case documented on SO).
And if you want to put up a PR, would be happy to take it.
Has this been fixed in the meantime? Running the script from the original post, I see
../foo.py:15: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[str(c)] = np.random.uniform(size=df.shape[0])
Runtime: 0.749 sec
Runtime: 0.668 sec
I'm using pandas master, Python 3.7 on macOS.
This should be OK now. @TomAugspurger, is that not what you are seeing?
I'm seeing that it's fixed now, just wanted to clarify since we had some Dask users reporting issues (but they're likely on an older pandas).
Similar to an observation on reddit, I noticed that there is a huge performance difference between the pandas default pd.options.mode.chained_assignment = 'warn' and setting it to None.
Code Sample
Problem description
The run times vary a lot depending on whether the SettingWithCopyWarning is enabled or disabled. I tried with a few different Pandas/Python versions:
Ideally, there should not be such a big penalty for SettingWithCopyWarning.
From profiling results it looks like the reason might be this call to gc.collect.
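To give a rough feel for why a gc.collect call on a hot path could dominate the runtime, here is a minimal standard-library sketch. It is not the pandas code path: the names live_objects and run_loop are hypothetical, and the object count and iteration count are arbitrary assumptions. It only compares a loop that forces a full collection on every pass against the same loop without it.

```python
import gc
import timeit

# Hypothetical stand-in for a process that holds many live container objects,
# e.g. a session with several large DataFrames and other Python structures.
live_objects = [[i] for i in range(200_000)]


def run_loop(iterations, collect):
    """Time a loop that optionally forces a full collection on every pass."""
    def body():
        for _ in range(iterations):
            if collect:
                gc.collect()  # a full collection has to examine every tracked object
    return timeit.timeit(body, number=1)


elapsed_plain = run_loop(50, collect=False)
elapsed_collect = run_loop(50, collect=True)
print(f"without gc.collect: {elapsed_plain:.4f} sec")
print(f"with gc.collect:    {elapsed_collect:.4f} sec")
```

The cost of each forced collection grows with the number of objects the collector tracks, which would fit the observation that the slowdown is most visible in larger workloads.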
Output of pd.show_versions()
pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None