
BUG: rolling.corr() produces wrong result with equal values #18430

Closed
byospe opened this issue Nov 22, 2017 · 3 comments · Fixed by #18481
Labels: Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); Numeric Operations (Arithmetic, Comparison, and Logical operations)
Comments

@byospe

byospe commented Nov 22, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

s = pd.Series([1,1,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,7,0,0,0])
pd.rolling_corr(s, s, 6)  # top-level function, deprecated in favor of s.rolling(6).corr(s)

Problem description

rolling_corr is producing the wrong result:

pd.rolling_corr(s,s,6)

0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 1.0
6 1.0
7 1.0
8 1.0
9 0.0
10 0.0
11 0.0
12 0.0
13 0.0
14 0.0
15 0.0
16 0.0
17 0.0
18 0.0
19 0.0
20 0.0
21 0.0
22 0.0
23 0.0
24 0.0
25 0.0
26 1.0
27 1.0
28 1.0
29 1.0
30 1.0
31 1.0
32 1.0
33 1.0

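The output should be NaN rather than 0.0 for windows in which the data are constant (rows 9 through 25 here), since the correlation is undefined when the standard deviation is zero.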
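To make the expected behavior concrete, here is a minimal pure-Python sketch that recomputes the Pearson correlation from scratch for each trailing window and returns NaN when either window has zero variance. The function name `rolling_corr_two_pass` is made up for illustration and this is not how pandas implements it; it only shows which rows of the report should be NaN.

```python
import math

def rolling_corr_two_pass(x, y, window):
    """Pearson correlation over each trailing window, recomputed from scratch."""
    out = []
    for i in range(len(x)):
        if i + 1 < window:
            out.append(math.nan)  # not enough observations yet
            continue
        wx = x[i + 1 - window : i + 1]
        wy = y[i + 1 - window : i + 1]
        mx = sum(wx) / window
        my = sum(wy) / window
        # Sums of squared deviations and cross deviations for this window.
        sxx = sum((a - mx) ** 2 for a in wx)
        syy = sum((b - my) ** 2 for b in wy)
        sxy = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
        if sxx == 0.0 or syy == 0.0:
            out.append(math.nan)  # constant window: 0 / 0 is undefined
        else:
            out.append(sxy / math.sqrt(sxx * syy))
    return out

# The 34 values from the report: 1, 1, 2, 2, then 22 zeros, then 5, 0, 0, 0, 7, 0, 0, 0.
s = [1, 1, 2, 2] + [0] * 22 + [5, 0, 0, 0, 7, 0, 0, 0]
result = rolling_corr_two_pass(s, s, 6)

# Rows with a full window whose correlation is undefined.
nan_rows = [i for i, v in enumerate(result) if i >= 5 and math.isnan(v)]
print(nan_rows)  # rows 9 through 25, where the window is all zeros
```

Recomputing every window costs O(n * window), which a production rolling implementation would normally avoid, but it sidesteps any residue carried over from earlier windows.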

Expected Output

Output of pd.show_versions()


INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.35-pv-ts2
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.0.7
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 1.5.1
openpyxl: 2.4.8
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.8
lxml: None
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: 1.1.10
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Nov 22, 2017

I think this is a very similar problem to #18044, recently fixed in #18085. We are NaNing out values that are numerically very close to zero (e.g. denominator is std * std), but in this case we are missing it because they are not identically zero.
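As a rough illustration of the point above, the sketch below keeps a sliding-window variance with incremental add/remove updates and then guards the correlation denominator with a tolerance instead of an exact comparison with zero. It assumes a Welford-style update; the names `rolling_var_online` and `guarded_corr` and the tolerance `1e-8` are hypothetical and are not taken from pandas or from the eventual fix.

```python
import math

def rolling_var_online(values, window):
    """Sliding-window sample variance with incremental add/remove updates.

    Assumes window >= 2. State from earlier windows is carried forward, so a
    window of identical values can end up with a tiny nonzero variance.
    """
    n = 0
    mean = 0.0
    ssqdm = 0.0  # running sum of squared differences from the mean
    out = []
    for i, x in enumerate(values):
        # Add the newest observation.
        n += 1
        delta = x - mean
        mean += delta / n
        ssqdm += delta * (x - mean)
        if i >= window:
            # Remove the observation that just left the window.
            old = values[i - window]
            n -= 1
            delta = old - mean
            mean -= delta / n
            ssqdm -= delta * (old - mean)
        out.append(ssqdm / (n - 1) if n >= window else math.nan)
    return out

def guarded_corr(cov, var_x, var_y, tol=1e-8):
    """Return cov / (std_x * std_y), or NaN when the denominator is ~0.

    tol is an illustrative absolute tolerance; a real fix would have to pick
    something appropriate to the scale of the data.
    """
    denom = math.sqrt(max(var_x, 0.0) * max(var_y, 0.0))
    if denom <= tol:
        return math.nan
    return cov / denom

values = [float(v) for v in [1, 1, 2, 2] + [0] * 22 + [5, 0, 0, 0, 7, 0, 0, 0]]
var = rolling_var_online(values, 6)
print(var[9:26])  # variances of the all-zero windows: near zero, possibly not exactly zero
print([guarded_corr(0.0, v, v) for v in var[9:26]])
```

An exact `== 0` test on the denominator would miss any leftover rounding residue, and dividing a zero covariance by that residue gives 0.0 instead of NaN, which is consistent with the output in the report.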

@jreback
Contributor

jreback commented Nov 22, 2017

@byospe want to try a PR to fix?

@jreback added the Bug, Difficulty Intermediate, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Nov 22, 2017
@jreback added this to the Next Major Release milestone Nov 22, 2017
@jreback changed the title from "pd.rolling_corr produces wrong result" to "BUG:rolling.corr() produces wrong result with equal values" Nov 22, 2017
@jreback changed the title from "BUG:rolling.corr() produces wrong result with equal values" to "BUG: rolling.corr() produces wrong result with equal values" Nov 22, 2017
@Licht-T
Contributor

Licht-T commented Nov 25, 2017

I am working on this and the fix is almost done.
It seems that the way the numerical calculation is done matters here.
https://github.com/pandas-dev/pandas/blob/master/pandas/core/window.py#L1064

@jreback modified the milestones: Next Major Release, 0.21.1 Nov 25, 2017