BUG: Unexpected behaviour of rolling with apply on DataFrame #34965
Labels: Bug, Duplicate Report, Needs Triage
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
When executed on a DataFrame, rolling seems to process only certain columns. To demonstrate this, I created a DataFrame with three columns (A, B, and C): the first contains timedeltas and the other two contain floats. When rolling is combined with an aggregation such as sum, only the float columns are processed.
Even stranger, when rolling is combined with apply, only the first float column is passed to the function, whereas I would have expected the corresponding window of the DataFrame.
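The original code sample and its printed DataFrame are not preserved here. As a hypothetical stand-in, a DataFrame with the same layout (timedelta column A, float columns B and C) can be built as follows; the values are illustrative only and are not the reporter's.

```python
import pandas as pd

# Hypothetical stand-in for the report's DataFrame: the original values are not
# shown, so these numbers are illustrative only.
df = pd.DataFrame(
    {
        "A": pd.to_timedelta([1, 2, 3, 4], unit="s"),  # timedelta64 column
        "B": [1.0, 2.0, 3.0, 4.0],  # float64 column
        "C": [10.0, 20.0, 30.0, 40.0],  # float64 column
    }
)

print(df.dtypes)
```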
Code Sample, a copy-pastable example
The resulting df will look like this:
Applying rolling with sum like this:
will result in the following output, in which the first column is missing:
To demonstrate the problem with apply, I created a custom function that simply outputs the number of columns (since I expected a DataFrame to be passed to the function):
This produces the exception "AttributeError: 'Series' object has no attribute 'columns'" and the following printout:
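The AttributeError is consistent with how Rolling.apply is documented: the function is called once per window on a single column, receiving a Series when raw=False (or a NumPy array when raw=True), and it must return a single value. The minimal sketch below uses hypothetical float-only data (the timedelta column is left out so the result does not depend on how a given pandas version treats it) and records the type of each argument. With a function that reads `.columns`, the very first call would already fail, which may explain why only one column appears in the printout.

```python
import pandas as pd

# Hypothetical float-only data; values are illustrative.
df = pd.DataFrame({"B": [1.0, 2.0, 3.0, 4.0], "C": [10.0, 20.0, 30.0, 40.0]})

seen_types = []


def record_type(window):
    # Remember what rolling.apply hands to the function, then return a scalar,
    # because apply expects exactly one value per window.
    seen_types.append(type(window))
    return window.sum()


result = df.rolling(2).apply(record_type, raw=False)

print(set(seen_types))  # every call receives one column as a Series
print(result)
```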
Problem description
In both cases, I would expect the function (either sum or get_num_columns) to receive the windowed DataFrame with all of its columns.
Expected Output
In the case of sum, I would expect either an exception telling the user that only floats are acceptable or, preferably, the following output:
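One possible workaround, assuming the preferred output is a rolling sum over all three columns, is to convert the timedelta column to float seconds, sum it, and convert it back. The helper `rolling_sum_with_timedeltas` and the data are hypothetical; this is a sketch, not a pandas API.

```python
import pandas as pd

# Hypothetical data mirroring the report's layout: one timedelta column (A)
# and two float columns (B, C).
df = pd.DataFrame(
    {
        "A": pd.to_timedelta([1, 2, 3, 4], unit="s"),
        "B": [1.0, 2.0, 3.0, 4.0],
        "C": [10.0, 20.0, 30.0, 40.0],
    }
)


def rolling_sum_with_timedeltas(frame, window):
    """Hypothetical helper: rolling sum over every column, timedeltas included.

    Timedelta columns are converted to float seconds, summed, and converted
    back (NaN becomes NaT); all other columns are summed directly.
    """
    out = {}
    for name, col in frame.items():
        if col.dtype.kind == "m":  # timedelta64 column
            seconds = col.dt.total_seconds().rolling(window).sum()
            out[name] = pd.to_timedelta(seconds, unit="s")
        else:
            out[name] = col.rolling(window).sum()
    return pd.DataFrame(out, index=frame.index)


print(rolling_sum_with_timedeltas(df, 2))
```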
In the case of apply, I would have expected a DataFrame as input to the function. Therefore, the output of the function (without the prints) should be:
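To get the behaviour expected here, where each window arrives as a DataFrame with all columns, the windows can be sliced by hand. The sketch below is an assumption-based workaround, not how rolling.apply works: `apply_to_frame_windows` is a hypothetical helper, and `get_num_columns` is a stand-in for the report's function, whose original body is not shown. Later pandas releases also allow iterating over a rolling object, which yields similar DataFrame windows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the report's layout (timedelta A, floats B and C).
df = pd.DataFrame(
    {
        "A": pd.to_timedelta([1, 2, 3, 4], unit="s"),
        "B": [1.0, 2.0, 3.0, 4.0],
        "C": [10.0, 20.0, 30.0, 40.0],
    }
)


def apply_to_frame_windows(frame, window, func):
    """Hypothetical helper: call func on each full-length window as a DataFrame.

    Positions without a complete window get NaN, mirroring rolling's default
    min_periods equal to the window size.
    """
    results = []
    for end in range(len(frame)):
        start = end - window + 1
        if start < 0:
            results.append(np.nan)
        else:
            # iloc slicing keeps every column, whatever its dtype
            results.append(func(frame.iloc[start : end + 1]))
    return pd.Series(results, index=frame.index)


def get_num_columns(window_df):
    # Stand-in for the report's function, without the prints
    return len(window_df.columns)


print(apply_to_frame_windows(df, 2, get_num_columns))
```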
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1.post20200616
Cython : 0.29.20
pytest : 5.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.1
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0