Rolling max and min on datetime column incorrect when NaN in window #22931

Closed
ghost opened this issue Oct 1, 2018 · 8 comments
Labels
Window rolling, ewma, expanding

Comments

@ghost

ghost commented Oct 1, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

# Columns B and C each contain one NaN, at 09:00:02.
df = pd.DataFrame({
    'B': [0, 1, np.nan, 3, 4],
    'C': [4, 3, np.nan, 1, 0],
    'Time': [
        pd.Timestamp('20130101 09:00:00'),
        pd.Timestamp('20130101 09:00:01'),
        pd.Timestamp('20130101 09:00:02'),
        pd.Timestamp('20130101 09:00:03'),
        pd.Timestamp('20130101 09:00:04'),
    ],
})

# Rolling max and min over a 4-second time-based window keyed on 'Time'.
df.rolling('4s', on='Time').max()
df.rolling('4s', on='Time').min()

Problem description

When computing a rolling max or min over a time-based window (rolling('4s', on='Time')), a NaN value seems to prevent max() and min() from considering the values that follow it, even when those values fall inside the window.
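To make the expected behaviour concrete, here is a minimal brute-force sketch that deliberately avoids pandas, so it cannot be affected by the bug being reported. It assumes the window for each row covers the interval (t - 4s, t], closed on the right only, and that NaN values are skipped. The helper name `reference_rolling` is hypothetical and is not part of pandas.

```python
import math
from datetime import datetime, timedelta


def reference_rolling(values, times, window, reducer):
    """Reduce each time window (t - window, t] while skipping NaN values."""
    out = []
    for t in times:
        # Keep values whose timestamp falls inside the window and is not NaN.
        in_window = [
            v for v, s in zip(values, times)
            if t - window < s <= t and not math.isnan(v)
        ]
        # A window with no valid values yields NaN.
        out.append(reducer(in_window) if in_window else math.nan)
    return out


start = datetime(2013, 1, 1, 9, 0, 0)
times = [start + timedelta(seconds=i) for i in range(5)]
b = [0.0, 1.0, math.nan, 3.0, 4.0]
c = [4.0, 3.0, math.nan, 1.0, 0.0]
window = timedelta(seconds=4)

expected_max_b = reference_rolling(b, times, window, max)
expected_min_c = reference_rolling(c, times, window, min)
print(expected_max_b)  # [0.0, 1.0, 1.0, 3.0, 4.0]
print(expected_min_c)  # [4.0, 3.0, 3.0, 1.0, 0.0]
```

Under those assumptions, the values after the NaN row (3 and 4 in B, 1 and 0 in C) do take part in the result.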

dataframe
     B                Time    C
0  0.0 2013-01-01 09:00:00  4.0
1  1.0 2013-01-01 09:00:01  3.0
2  NaN 2013-01-01 09:00:02  NaN
3  3.0 2013-01-01 09:00:03  1.0
4  4.0 2013-01-01 09:00:04  0.0

Actual max() output (column B)

In[1]: df.rolling('4s', on='Time').max()
Out[1]:
     B                Time    C
0  0.0 2013-01-01 09:00:00  4.0
1  1.0 2013-01-01 09:00:01  4.0
2  1.0 2013-01-01 09:00:02  4.0
3  1.0 2013-01-01 09:00:03  4.0
4  1.0 2013-01-01 09:00:04  3.0

Expected max() Output (column B)

In[1]: df.rolling('4s', on='Time').max()
Out[1]:
     B                Time    C
0  0.0 2013-01-01 09:00:00  4.0
1  1.0 2013-01-01 09:00:01  4.0
2  1.0 2013-01-01 09:00:02  4.0
3  3.0 2013-01-01 09:00:03  4.0
4  4.0 2013-01-01 09:00:04  3.0

Actual min() output (column C)

In[2]: df.rolling('4s', on='Time').min()
Out[2]:
     B                Time    C
0  0.0 2013-01-01 09:00:00  4.0
1  0.0 2013-01-01 09:00:01  3.0
2  0.0 2013-01-01 09:00:02  3.0
3  0.0 2013-01-01 09:00:03  3.0
4  1.0 2013-01-01 09:00:04  3.0

Expected min() Output (column C)

In[2]: df.rolling('4s', on='Time').min()
Out[2]:
     B                Time    C
0  0.0 2013-01-01 09:00:00  4.0
1  0.0 2013-01-01 09:00:01  3.0
2  0.0 2013-01-01 09:00:02  3.0
3  0.0 2013-01-01 09:00:03  1.0
4  1.0 2013-01-01 09:00:04  0.0
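The comparison between actual and expected output can also be scripted rather than read off by eye. The following is a small sketch, not an official pandas test. It rebuilds the same frame and compares column B of max() and column C of min() against the expected values listed above. The names `max_ok` and `min_ok` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as the original report: one NaN in B and C at 09:00:02.
df = pd.DataFrame({
    "B": [0, 1, np.nan, 3, 4],
    "C": [4, 3, np.nan, 1, 0],
    "Time": pd.Timestamp("2013-01-01 09:00:00")
    + pd.to_timedelta(np.arange(5), unit="s"),
})

rolled = df.rolling("4s", on="Time")
max_b = rolled.max()["B"].tolist()
min_c = rolled.min()["C"].tolist()

# Expected values from this report; none of them is NaN, so a plain
# list comparison is sufficient.
max_ok = max_b == [0.0, 1.0, 1.0, 3.0, 4.0]
min_ok = min_c == [4.0, 3.0, 3.0, 1.0, 0.0]
print("max matches expected:", max_ok)
print("min matches expected:", min_ok)
```

On an affected install (0.23.4 in this report) both flags should come out False, since rows 3 and 4 differ from the expected values.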

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.4.3
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Oct 2, 2018

Hmm I tried both Python2 and Python3 on macOS and was not able to reproduce this - @chris-b1 any chance this is a Windows-specific issue?

WillAyd added the "Can't Repro", "Windows" (Windows OS), and "Window" (rolling, ewma, expanding) labels Oct 2, 2018
@mroeschke
Member

This may have been solved by #21853?

@ghost
Author

ghost commented Oct 2, 2018

Hmm I tried both Python2 and Python3 on macOS and was not able to reproduce this - @chris-b1 any chance this is a Windows-specific issue?

In response to this, I created a fresh virtualenv with just the latest version of pandas and its requirements, and I still had this issue. I'll try on Mint to see whether I can replicate the issue or identify it as Windows-only.

@TomAugspurger
Contributor

just the latest version of pandas and its requirements

What do you mean by latest version? Master or 0.23.4? That fix is in 0.24, which isn't released yet.

@ghost
Author

ghost commented Oct 2, 2018

What do you mean by latest version? Master or 0.23.4?

Should've mentioned I updated the Output of pd.show_versions() in the original post. I mean version 0.23.4.

That fix is in 0.24, which isn't released yet.

If you're referencing #21853, I'm unable to verify whether it fixes the issue.

@ghost
Author

ghost commented Oct 2, 2018

I've replicated this issue on Lubuntu. Here are the script and its pd.show_versions() output; the results are the same as in the original post:

rolling_test.py

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'B': [0, 1, np.nan, 3, 4],
    'C': [4, 3, np.nan, 1, 0],
    'Time': [
        pd.Timestamp('20130101 09:00:00'),
        pd.Timestamp('20130101 09:00:01'),
        pd.Timestamp('20130101 09:00:02'),
        pd.Timestamp('20130101 09:00:03'),
        pd.Timestamp('20130101 09:00:04'),
    ],
})
print('df:')
print(df)
print("max:")
print(df.rolling('4s', on='Time').max())
print('min:')
print(df.rolling('4s', on='Time').min())

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.4.3
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

WillAyd removed the "Windows" (Windows OS) label Oct 2, 2018
@WillAyd
Member

WillAyd commented Oct 2, 2018

Can you try on master? The comments above suggest this may already have been solved, so it would be good to confirm whether that is the case.

@ghost
Author

ghost commented Oct 3, 2018

Issue is resolved on master (pandas: 0.24.0.dev0+671.g08ecba8da)
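For anyone checking whether their own install is likely affected, here is a rough sketch of a version check. It relies only on what this thread reports: 0.23.4 shows the problem and a 0.24.0 development build does not. The helper name `version_at_least` and the (0, 24) threshold are assumptions drawn from those comments, not a statement about which exact commit contains the fix.

```python
import re


def version_at_least(version, minimum=(0, 24)):
    """Return True if the major.minor part of a version string is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return (int(match.group(1)), int(match.group(2))) >= minimum


print(version_at_least("0.23.4"))                      # False
print(version_at_least("0.24.0.dev0+671.g08ecba8da"))  # True
```

In practice one would pass `pd.__version__` to the helper. Only the major and minor components are compared, so it cannot tell individual development builds apart.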

ghost closed this as completed Oct 3, 2018
This issue was closed.
3 participants