BUG: DataFrame.apply(raw=True) applies function twice on first row/column #34506
Comments
The side effect appears twice because, at Line 221 in 48c00ea,
mul2 gets called twice: first inside the try statement, via compute_reduction from pandas._libs, and then again in the except block after an error is thrown.
I haven't understood the error in pandas/pandas/_libs/reduction.pyx (Line 629 in 48c00ea), but my understanding of apply_raw is that the error is thrown intentionally and apply_raw is working as expected.
I'm happy to work on a pull request/documentation update if we think there are changes or improvements to be made!
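To illustrate the try/except behaviour described above, here is a simplified pure-Python sketch. It is not the pandas implementation: `fast_reduce` and `apply_rows` are hypothetical stand-ins for the fast path and the fallback, and the body of `mul2` is assumed.

```python
calls = []  # records every input the function is invoked with


def mul2(values):
    # Side effect: remember the input so repeated calls are visible.
    calls.append(list(values))
    return [v * 2 for v in values]


def fast_reduce(func, rows):
    # Hypothetical fast path: only valid when func returns a scalar.
    out = []
    for row in rows:
        res = func(row)  # func has already run here, so its side effect fired
        if isinstance(res, (list, tuple)):
            raise ValueError("function does not reduce")
        out.append(res)
    return out


def apply_rows(func, rows):
    try:
        return fast_reduce(func, rows)
    except ValueError:
        # The fallback restarts from the first row, so func sees rows[0] again.
        return [func(row) for row in rows]


rows = [[1, 4], [2, 5], [3, 6]]
result = apply_rows(mul2, rows)
print(result)      # the doubled rows
print(len(calls))  # one more call than there are rows
print(calls[:2])   # the first row shows up twice
```

In this sketch the first row is passed to `mul2` once by the fast path before it raises, and once more by the fallback, which is the same shape of double invocation reported in this issue.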
@alonme Cool! I'll work on a pull request and loop you in!
Per #34913 and the comments on that issue, I think we can close this issue.
on master
Hey @simonjayhawkins, I already added a test for this issue with an XFAIL. It seems that the XFAIL was removed when this was fixed. Do you think more tests are needed?
There seemed to be reluctance to close this issue in #34913 without a test. #34913 (comment) @jbrockmendel?
I'm fine with closing this.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
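A minimal sketch of a reproducer, assuming a side-effecting function like the `mul2` mentioned in the comments. The body of `mul2`, the sample frame, and the `calls` list are assumptions made for illustration.

```python
import pandas as pd

calls = []  # records every row the function is actually invoked with


def mul2(row):
    # With raw=True, `row` is a NumPy array rather than a Series.
    # Side effect: remember the input so repeated invocations are visible.
    calls.append(row.tolist())
    return row * 2


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = df.apply(mul2, axis=1, raw=True)

# 0 means one call per row; 1 means the first row was processed twice.
extra_calls = len(calls) - len(df)

print(result)
print("calls:", calls)
print("extra calls:", extra_calls)
```

On a version affected by this issue, `calls` should contain the first row twice; on a version where it is fixed, each row should appear exactly once.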
Problem description
Because of an implementation detail, the function is applied twice to the first row/column,
which causes any side effects to happen twice.
Other than the apply_raw code path, this issue should be fixed by #34183.
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 07edc87
python : 3.7.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.5.0
Version : Darwin Kernel Version 19.5.0: Thu Apr 30 18:25:59 PDT 2020; root:xnu-6153.121.1~7/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.0.dev0+1706.g07edc871f.dirty
numpy : 1.18.3
pytz : 2020.1
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : 0.29.17
pytest : 5.4.1
hypothesis : 5.11.0
sphinx : 3.0.3
blosc : 1.9.1
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.0
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.0
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.16
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.49.0