BUG: Unexpected behaviour of rolling with apply on DataFrame #34965

oXwvdrbbj8S4wo9k8lSN · 2020-06-24T08:16:19Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

When executed on a DataFrame, rolling seems to select only certain columns for processing. For demonstration, I created a DataFrame with three columns (A, B, and C), of which the first contains timedeltas and the other two contain numeric values. When using rolling, e.g. with sum, only the numeric columns are passed on.
Even stranger, when used in combination with apply, only a single numeric column (a Series) is passed to the function at a time, whereas I would have expected the corresponding part of the DataFrame.

Code Sample, a copy-pastable example

import pandas as pd

columns = ["A", "B", "C"]
index = list(range(10))
data = [[10**10, 2, 3]] * len(index)
df = pd.DataFrame(columns=columns, index=index, data=data)
# Integers are interpreted as nanoseconds, so 10**10 ns becomes 10 s.
df["A"] = df["A"].apply(pd.to_timedelta)

The resulting df will look like this:

         A  B  C
0 00:00:10  2  3
1 00:00:10  2  3
2 00:00:10  2  3
3 00:00:10  2  3
4 00:00:10  2  3
5 00:00:10  2  3
6 00:00:10  2  3
7 00:00:10  2  3
8 00:00:10  2  3
9 00:00:10  2  3

Applying rolling with sum like this:

df.rolling(window=2).sum()

will result in the following output, in which the first column is missing:

     B    C
0  NaN  NaN
1  4.0  6.0
2  4.0  6.0
3  4.0  6.0
4  4.0  6.0
5  4.0  6.0
6  4.0  6.0
7  4.0  6.0
8  4.0  6.0
9  4.0  6.0

To demonstrate the problem with apply, I created a custom function that simply outputs the number of columns (since I expected a DataFrame to be passed to the function):

def get_num_columns(sub_df):
    print(sub_df)
    return len(sub_df.columns)
df.rolling(window=2).apply(get_num_columns, raw=False)

This produces the exception "AttributeError: 'Series' object has no attribute 'columns'" and the following printout:

0    2.0
1    2.0
dtype: float64
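The printout above suggests the callback is handed one column at a time. A minimal sketch to check this, assuming a numeric-only frame (so the result does not depend on how a given pandas version treats the timedelta column); `window_sum` and `seen_types` are names made up for this illustration:

```python
import pandas as pd

# Numeric-only frame, so the timedelta handling does not interfere.
df = pd.DataFrame({"B": [2.0] * 10, "C": [3.0] * 10})

seen_types = set()

def window_sum(window):
    # Record the type of object that rolling.apply hands to the callback.
    seen_types.add(type(window).__name__)
    return window.sum()

result = df.rolling(window=2).apply(window_sum, raw=False)
print(seen_types)  # {'Series'}
```

With raw=False every call receives a Series for a single column, which is why accessing `.columns` inside the callback fails.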

Problem description

I would expect in both cases that the windowed DataFrame with all columns is used within the function (either sum or get_num_columns).

Expected Output

In the case of sum, I would either expect an Exception that tells the user that only Floats are acceptable or - preferably - the following output:

         A    B    C
0      NaT  NaN  NaN
1 00:00:20  4.0  6.0
2 00:00:20  4.0  6.0
3 00:00:20  4.0  6.0
4 00:00:20  4.0  6.0
5 00:00:20  4.0  6.0
6 00:00:20  4.0  6.0
7 00:00:20  4.0  6.0
8 00:00:20  4.0  6.0
9 00:00:20  4.0  6.0
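Until rolling handles timedeltas directly, one possible workaround (a sketch, not something pandas does for you) is to convert column A to float seconds, roll over the numeric frame, and convert back. The frame below is rebuilt from scratch so the snippet stands alone:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.to_timedelta([10] * 10, unit="s"),
        "B": [2] * 10,
        "C": [3] * 10,
    }
)

# Represent the timedeltas as float seconds so every column is numeric.
numeric = df.assign(A=df["A"].dt.total_seconds())
rolled = numeric.rolling(window=2).sum()

# Convert the summed seconds back to timedeltas; NaN becomes NaT.
rolled["A"] = pd.to_timedelta(rolled["A"], unit="s")
print(rolled)
```

This should reproduce the table above, at the cost of going through floating-point seconds, which may lose precision for very large or sub-microsecond values.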

In the case of apply, I would have expected a DataFrame as input to the function. Therefore, the output of the function (without the prints) should be:

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1.post20200616
Cython : 0.29.20
pytest : 5.4.3
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.1
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.1
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0


TomAugspurger · 2020-09-04T14:26:06Z

I think that rolling.apply is defined to take 1d input (ndarray or Series). We don't have a table-wide rolling.apply (#15095).

We have #23002 for non-numeric data in rolling, which I think covers all your issues.
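For anyone who needs the whole windowed DataFrame in the meantime, a rough sketch is to slice the windows by position yourself. `iter_full_windows` is a hypothetical helper written for this example, not a pandas API, and it only yields full windows:

```python
import pandas as pd

df = pd.DataFrame({"B": [2.0] * 10, "C": [3.0] * 10})

def iter_full_windows(frame, window):
    """Hypothetical helper: yield each full window as a DataFrame slice."""
    for end in range(window, len(frame) + 1):
        yield frame.iloc[end - window:end]

window = 2
# One result per full window, labelled with the last row of that window.
column_counts = pd.Series(
    [len(sub_df.columns) for sub_df in iter_full_windows(df, window)],
    index=df.index[window - 1:],
)
print(column_counts)
```

Each `sub_df` here is a two-row DataFrame with all columns, so the callback from the report would return 2 for every window. This is a plain Python loop, so it will be much slower than the built-in rolling aggregations on large frames.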

oXwvdrbbj8S4wo9k8lSN added the Bug and Needs Triage labels Jun 24, 2020

TomAugspurger closed this as completed Sep 4, 2020

TomAugspurger added the Duplicate Report label Sep 4, 2020
