.rolling().std() only returns NaN in Python3.7 #21786

Closed
tweakimp opened this issue Jul 7, 2018 · 19 comments · Fixed by #21813
Labels: Bug, Regression, Window, Windows
Milestone: 0.23.4

Comments

@tweakimp

tweakimp commented Jul 7, 2018

import pandas as pd
d = {"col": [1, 23, 231, 231, 4, 353, 62, 3, 56, 43, 354, 43, 231, 21, 7]}
df = pd.DataFrame(data=d)
std = df["col"].std()
df["mean5"] = df["col"].rolling(5).mean()
df["std5"] = df["col"].rolling(5).std()

print(std)
print(df[["mean5", "std5"]])

# OUTPUT
130.20855066648528
    mean5  std5 
0     NaN   NaN
1     NaN   NaN
2     NaN   NaN
3     NaN   NaN
4    98.0   NaN
5   168.4   NaN
6   176.2   NaN
7   130.6   NaN
8    95.6   NaN
9   103.4   NaN
10  103.6   NaN
11   99.8   NaN
12  145.4   NaN
13  138.4   NaN
14  131.2   NaN

Problem description

.std() and .rolling().mean() work as intended, but .rolling().std() only returns NaN
I just upgraded from Python 3.6.5 where the same code did work perfectly.
I am now on Python 3.7, pandas 0.23.2
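A quick way to check whether a given install shows this behaviour is to run a rolling std over one full window with no missing values. This is only a minimal sketch; the names `s`, `last` and `affected` are made up for the example.

```python
import pandas as pd

# Five non-NaN values and a window of five: the last entry should be a number.
s = pd.Series([1.0, 23.0, 231.0, 231.0, 4.0])
last = s.rolling(5).std().iloc[-1]

# On an affected build the result is NaN even though no input is missing.
affected = pd.isna(last)
print("pandas", pd.__version__, "- rolling std affected:", affected)
```

If `affected` prints as `True`, the install has the problem described here.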

Expected Output

130.20855066648528
    mean5        std5
0     NaN         NaN
1     NaN         NaN
2     NaN         NaN
3     NaN         NaN
4    98.0  108.855868
5   168.4  134.226078
6   176.2  126.458531
7   130.6  138.965607
8    95.6  131.085621
9   103.4  126.482568
10  103.6  126.877264
11   99.8  128.342355
12  145.4  126.337010
13  138.4  131.941805
14  131.2  137.803338

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.2
pytest: None
pip: 10.0.1
setuptools: 39.0.1
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None

@TomAugspurger
Contributor

Hmm I can't reproduce with Python 3.7 and pandas 0.23.2

In [1]: paste
import pandas as pd
d = {"col": [1, 23, 231, 231, 4, 353, 62, 3, 56, 43, 354, 43, 231, 21, 7]}
df = pd.DataFrame(data=d)
std = df["col"].std()
df["mean5"] = df["col"].rolling(5).mean()
df["std5"] = df["col"].rolling(5).std()

print(std)
print(df[["mean5", "std5"]])

## -- End pasted text --
130.20855066648528
    mean5        std5
0     NaN         NaN
1     NaN         NaN
2     NaN         NaN
3     NaN         NaN
4    98.0  121.704560
5   168.4  150.069317
6   176.2  141.384936
7   130.6  155.368272
8    95.6  146.558180
9   103.4  141.411810
10  103.6  141.853093
11   99.8  143.491115
12  145.4  141.249071
13  138.4  147.515423
14  131.2  154.068816

In [2]: pd.__version__
Out[2]: '0.23.2'

@tweakimp
Author

tweakimp commented Jul 7, 2018

Weird, what do we do now? Could you share your pd.show_versions() so we can compare the dependencies?

@TomAugspurger
Contributor

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.2
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.14.5
scipy: None
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Perhaps someone else will be able to reproduce.

A couple questions

  1. How'd you install pandas? Conda or pip?
  2. Can you try with 0.23.1?

@tweakimp
Author

tweakimp commented Jul 7, 2018

I use pip. When I try to install 0.23.1 I get the error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools, although I have it installed :/

@TomAugspurger
Contributor

Probably something with your path...

@chris-b1 any chance this is related to the C++ build stuff? Can you try on 3.7 with the wheels on PyPI?

@tweakimp
Author

tweakimp commented Jul 7, 2018

Can you link me to a pandas version 0.23.1 for python 3.7? Here I only find versions up to 3.6.

@TomAugspurger
Contributor

TomAugspurger commented Jul 7, 2018 via email

@tweakimp
Author

tweakimp commented Jul 7, 2018

I cannot try with 0.23.1 then, because the error only happens with Python 3.7 for me.

@chris-b1
Contributor

chris-b1 commented Jul 8, 2018

I can reproduce on Windows 10 with 3.7 using the 0.23.1 wheel on PyPI, building from the PyPI source (--no-binary pandas), or building off of a tagged release from cython.

At first review I'm stumped - the rolling var calc seems to only use c-math / pointers, nothing from libc[++].

with nogil:

@chris-b1 added the Bug, Regression, and Windows labels Jul 8, 2018
@TomAugspurger
Contributor

TomAugspurger commented Jul 8, 2018 via email

@tweakimp
Author

tweakimp commented Jul 8, 2018

var and corr with the same problem:

import pandas as pd

d = {"col": [1, 23, 231, 231, 4, 353, 62, 3, 56, 43, 354, 43, 231, 21, 7]}
df = pd.DataFrame(data=d)
f = lambda x: sum(x, 1)
funcs = [
    "count()",
    "sum()",
    "mean()",
    "median()",
    "var()",
    "std()",
    "min()",
    "max()",
    "corr(df['col'])",
    "cov()",
    "skew()",
    "kurt()",
    "apply(f)",
    "agg({'col':'sum'})",
    "agg({'col':'std'})",
    "quantile(0.3)",
    "ndim",
    "is_datetimelike",
    "is_freq_type",
]
for func in funcs:
    df[func] = eval(f"df['col'].rolling(5).{func}")
print(df)

#OUTPUT
    col  count()  sum()  mean()  median()  var()  std()  min()  max()  corr(df['col'])    cov()      skew()      kurt()  apply(f)  agg({'col':'sum'})  agg({'col':'std'})  quantile(0.3)  ndim  is_datetimelike  is_freq_type
0     1      1.0    NaN     NaN       NaN    NaN    NaN    NaN    NaN              NaN      NaN         NaN         NaN       NaN                 NaN                 NaN            NaN     1            False         False
1    23      2.0    NaN     NaN       NaN    NaN    NaN    NaN    NaN              NaN      NaN         NaN         NaN       NaN                 NaN                 NaN            NaN     1            False         False
2   231      3.0    NaN     NaN       NaN    NaN    NaN    NaN    NaN              NaN      NaN         NaN         NaN       NaN                 NaN                 NaN            NaN     1            False         False
3   231      4.0    NaN     NaN       NaN    NaN    NaN    NaN    NaN              NaN      NaN         NaN         NaN       NaN                 NaN                 NaN            NaN     1            False         False
4     4      5.0  490.0    98.0      23.0    NaN    NaN    1.0  231.0              NaN  14812.0  0.58711909 -3.30501520     491.0               490.0                 NaN            7.8     1            False         False
5   353      5.0  842.0   168.4     231.0    NaN    NaN    4.0  353.0              NaN  22520.8 -0.09073220 -2.16044852     843.0               842.0                 NaN           64.6     1            False         False
6    62      5.0  881.0   176.2     231.0    NaN    NaN    4.0  353.0              NaN  19989.7 -0.10909425 -1.60438496     882.0               881.0                 NaN           95.8     1            False         False
7     3      5.0  653.0   130.6      62.0    NaN    NaN    3.0  353.0              NaN  24139.3  0.84243331 -1.36672178     654.0               653.0                 NaN           15.6     1            False         False
8    56      5.0  478.0    95.6      56.0    NaN    NaN    3.0  353.0              NaN  21479.3  2.03720693  4.29341424     479.0               478.0                 NaN           14.4     1            False         False
9    43      5.0  517.0   103.4      56.0    NaN    NaN    3.0  353.0              NaN  19997.3  2.08348019  4.51654869     518.0               517.0                 NaN           45.6     1            False         False
10  354      5.0  518.0   103.6      56.0    NaN    NaN    3.0  354.0              NaN  20122.3  2.08443877  4.51937764     519.0               518.0                 NaN           45.6     1            False         False
11   43      5.0  499.0    99.8      43.0    NaN    NaN    3.0  354.0              NaN  20589.7  2.12508425  4.64265427     500.0               499.0                 NaN           43.0     1            False         False
12  231      5.0  727.0   145.4      56.0    NaN    NaN   43.0  354.0              NaN  19951.3  1.01164898 -0.99425147     728.0               727.0                 NaN           45.6     1            False         False
13   21      5.0  692.0   138.4      43.0    NaN    NaN   21.0  354.0              NaN  21760.8  0.96847264 -1.16346708     693.0               692.0                 NaN           43.0     1            False         False
14    7      5.0  656.0   131.2      43.0    NaN    NaN    7.0  354.0              NaN  23737.2  0.92438493 -1.32408409     657.0               656.0                 NaN           25.4     1            False         False
std = df["col"].std()
var = df["col"].var()
corr = df["col"].corr(df["col"])
print(std, var, corr)

#OUTPUT (work as intended without the rolling window)
130.20855066648528 16954.266666666666 1.0

@chris-b1
Contributor

chris-b1 commented Jul 8, 2018

Thanks @tweakimp - those both also call the same cython routine (roll_var), so the same problem.

I've got it partially figured out - seems this branch is being (incorrectly) optimized out on the latest MSVC, works again if I throw a printf in there. Still trying to riddle out if it's a compiler bug, or some flag set wrong.

if val == val:
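For anyone unfamiliar with the idiom: `val == val` is False only when `val` is NaN, so it works as a NaN check without a function call. Below is a pure-Python sketch of a Welford-style online variance that uses this guard. It is not the actual `roll_var` Cython code, and the names `add_obs` and `window_var` are made up for illustration.

```python
import math


def add_obs(val, nobs, mean, ssqdm):
    """Fold one value into the running count, mean and sum of squared deviations."""
    if val == val:  # False only when val is NaN, so NaN values are skipped
        nobs += 1
        delta = val - mean
        mean += delta / nobs
        ssqdm += delta * (val - mean)
    return nobs, mean, ssqdm


def window_var(values, ddof=1):
    """Variance of one window, ignoring NaN; NaN if too few valid values."""
    nobs, mean, ssqdm = 0, 0.0, 0.0
    for v in values:
        nobs, mean, ssqdm = add_obs(float(v), nobs, mean, ssqdm)
    if nobs <= ddof:
        return math.nan
    return ssqdm / (nobs - ddof)


first_window = window_var([1, 23, 231, 231, 4])
with_gap = window_var([1, math.nan, 231])
all_missing = window_var([math.nan] * 5)
print(first_window, with_gap, all_missing)
```

In this sketch, a guard that is never taken leaves `nobs` at 0 and the result is NaN for every window, the same as for all-NaN input. That would be consistent with the output reported above, but it is only a guess at the mechanism.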

@chris-b1
Contributor

chris-b1 commented Jul 8, 2018

Hmm, turning off the /GL flag does fix this locally, but there may be some broader performance implications of that, and I'm still not sure it should be necessary.

(pandas-dev37) λ git diff setup.py
diff --git a/setup.py b/setup.py
index 8018d71b7..5535dd23f 100755
--- a/setup.py
+++ b/setup.py
@@ -455,7 +455,8 @@ def pxd(name):


 if is_platform_windows():
-    extra_compile_args = []
+    extra_compile_args = ['/GL-']

@tweakimp
Author

tweakimp commented Jul 8, 2018

Workaround:

.apply(lambda x: pd.np.std(x))
import pandas as pd

d = {"col": [1, 23, 231, 231, 4, 353, 62, 3, 56, 43, 354, 43, 231, 21, 7]}
df = pd.DataFrame(data=d)
df["std5"] = df["col"].rolling(5).apply(lambda x: pd.np.std(x))
print(df["std5"])
# OUTPUT (as expected)
0            NaN
1            NaN
2            NaN
3            NaN
4     108.855868
5     134.226078
6     126.458531
7     138.965607
8     131.085621
9     126.482568
10    126.877264
11    128.342355
12    126.337010
13    131.941805
14    137.803338

@chris-b1
Contributor

chris-b1 commented Jul 8, 2018

Yep, do note that np.std defaults to a ddof of 0, while pandas uses 1.
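To make the ddof point concrete, here is a small sketch of the same workaround with both settings side by side. It imports NumPy directly instead of going through `pd.np` (which was removed in pandas 2.0) and passes `raw=True` so each window reaches the function as a NumPy array. The column names `std5_ddof0` and `std5_ddof1` are made up for the example.

```python
import numpy as np
import pandas as pd

d = {"col": [1, 23, 231, 231, 4, 353, 62, 3, 56, 43, 354, 43, 231, 21, 7]}
df = pd.DataFrame(data=d)

# np.std defaults to ddof=0 (population standard deviation).
df["std5_ddof0"] = df["col"].rolling(5).apply(lambda x: np.std(x), raw=True)

# ddof=1 matches the sample standard deviation that pandas .std() reports.
df["std5_ddof1"] = df["col"].rolling(5).apply(lambda x: np.std(x, ddof=1), raw=True)

print(df[["std5_ddof0", "std5_ddof1"]])
```

The `ddof=0` column lines up with the values in the workaround output above (108.855868 at index 4), while the `ddof=1` column lines up with what an unaffected `.rolling(5).std()` returns (121.704560 at index 4).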

@jreback added this to the 0.23.4 milestone Jul 25, 2018
@psicktrick

Can anyone suggest a similar workaround for rolling correlation between two columns of a dataframe?

@psicktrick

Will downgrading to MSVC 2015 help my case?

@chris-b1
Contributor

chris-b1 commented Aug 1, 2018

@psicktrick - if you are building from source, yes, using MSVC 2015 will solve this issue, or you can also build from master, which has a fix applied. As far as I know there is no easy workaround for corr, but 0.23.4 should be released soon (#22128).
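For anyone stuck on an affected build in the meantime, one possible stopgap is to assemble the correlation from rolling means only, since `.rolling().mean()` was reported as working earlier in this thread. This is a rough sketch, not something pandas provides: the helper name `rolling_corr_from_means` and the columns `a` and `b` are made up, and the formula is less numerically stable than the built-in `.rolling().corr()`.

```python
import numpy as np
import pandas as pd


def rolling_corr_from_means(x, y, window):
    """Pearson correlation per window, built only from rolling means."""
    x = x.astype("float64")
    y = y.astype("float64")
    mean_x = x.rolling(window).mean()
    mean_y = y.rolling(window).mean()
    # cov = E[xy] - E[x]E[y]; var = E[x^2] - E[x]^2 (population form;
    # the ddof factor cancels in the correlation ratio).
    cov_xy = (x * y).rolling(window).mean() - mean_x * mean_y
    var_x = (x * x).rolling(window).mean() - mean_x ** 2
    var_y = (y * y).rolling(window).mean() - mean_y ** 2
    return cov_xy / np.sqrt(var_x * var_y)


df = pd.DataFrame({
    "a": [1, 23, 231, 231, 4, 353, 62, 3, 56, 43, 354, 43, 231, 21, 7],
    "b": [7, 21, 231, 43, 354, 43, 56, 3, 62, 353, 4, 231, 231, 23, 1],
})
df["corr5"] = rolling_corr_from_means(df["a"], df["b"], 5)
print(df)
```

Windows where either column is constant give a zero variance and therefore NaN or inf, so treat the result with some care.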

@WillAyd added the Window label Sep 4, 2018
@sherwany

sherwany commented Aug 7, 2020

Hi,
Wondering if someone can please help me.
I am getting the same issue - using pandas 1.0.3 as well as 0.23.2 - I have tried with Python 3.7 and 3.6.5.
The workaround mentioned above i.e. .apply(lambda x: pd.np.std(x)) doesn't fix the issue for me.

SampleData

Is there any other workaround I could use?

7 participants