@@ -62,6 +62,21 @@ among the series in the DataFrame, also excluding NA/null values.
     frame = DataFrame(randn(1000, 5), columns=['a', 'b', 'c', 'd', 'e'])
     frame.cov()

+ ``DataFrame.cov`` also supports an optional ``min_periods`` keyword that
+ specifies the required minimum number of observations for each column pair
+ in order to have a valid result.
+
+ .. ipython:: python
+
+    frame = DataFrame(randn(20, 3), columns=['a', 'b', 'c'])
+    frame.ix[:5, 'a'] = np.nan
+    frame.ix[5:10, 'b'] = np.nan
+
+    frame.cov()
+
+    frame.cov(min_periods=12)
+
+
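The ``min_periods`` rule described in the added text can be sketched in plain Python. This is only an illustration of the stated semantics (pairwise complete observations, with a minimum count per column pair); ``pairwise_cov`` is a hypothetical helper, not the pandas implementation, and the use of the sample (n - 1) denominator is an assumption.

```python
import math


def pairwise_cov(x, y, min_periods=1):
    """Sample covariance over rows where both x and y are non-NaN.

    Hypothetical helper for illustration only; not the pandas code path.
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    # Too few complete observations (or fewer than 2): no valid result.
    if n < max(min_periods, 2):
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - 1)


nan = float("nan")
a = [nan, 2.0, 4.0, 6.0, 8.0]
b = [1.0, nan, 3.0, 5.0, 7.0]

cov_default = pairwise_cov(a, b)                # 3 complete pairs are enough
cov_strict = pairwise_cov(a, b, min_periods=4)  # only 3 complete pairs: NaN
```

Only three rows are complete in both lists, so the call with ``min_periods=4`` yields NaN while the default call returns a number.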
  .. _computation.correlation:

  Correlation
@@ -97,6 +112,19 @@ All of these are currently computed using pairwise complete observations.
  Note that non-numeric columns will be automatically excluded from the
  correlation calculation.

+ Like ``cov``, ``corr`` also supports the optional ``min_periods`` keyword:
+
+ .. ipython:: python
+
+    frame = DataFrame(randn(20, 3), columns=['a', 'b', 'c'])
+    frame.ix[:5, 'a'] = np.nan
+    frame.ix[5:10, 'b'] = np.nan
+
+    frame.corr()
+
+    frame.corr(min_periods=12)
+
+
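To see which entries ``min_periods=12`` would blank out in the example above, it may help to count the pairwise complete observations by hand. The sketch below assumes the ``.ix`` slices are label-based and end-inclusive on the default integer index (so rows 0 to 5 of ``a`` and rows 5 to 10 of ``b`` are missing); the variable names are illustrative only.

```python
from itertools import combinations_with_replacement

n_rows = 20
# Assumed missing-value pattern: rows 0-5 of 'a' and rows 5-10 of 'b'
# (label-based, end-inclusive slices on the default integer index).
missing = {"a": set(range(0, 6)), "b": set(range(5, 11)), "c": set()}
min_periods = 12

counts = {}
for left, right in combinations_with_replacement(sorted(missing), 2):
    # A row counts only if it is present in both columns.
    counts[(left, right)] = n_rows - len(missing[left] | missing[right])

for pair, n_obs in counts.items():
    status = "valid" if n_obs >= min_periods else "NaN"
    print(pair, n_obs, status)
```

Under that assumption only the ``('a', 'b')`` pair falls below the threshold, with 9 shared observations; every other pair has at least 14.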
  A related method ``corrwith`` is implemented on DataFrame to compute the
  correlation between like-labeled Series contained in different DataFrame
  objects.
@@ -290,9 +318,9 @@ columns using ``ix`` indexing:

  Expanding window moment functions
  ---------------------------------
- A common alternative to rolling statistics is to use an *expanding* window,
- which yields the value of the statistic with all the data available up to that
- point in time. As these calculations are a special case of rolling statistics,
+ A common alternative to rolling statistics is to use an *expanding* window,
+ which yields the value of the statistic with all the data available up to that
+ point in time. As these calculations are a special case of rolling statistics,
  they are implemented in pandas such that the following two calls are equivalent:

  .. ipython:: python
@@ -301,7 +329,7 @@ they are implemented in pandas such that the following two calls are equivalent:

     expanding_mean(df)[:5]

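The "special case of rolling statistics" relationship can be sketched without pandas. The two functions below are hypothetical stand-ins (the ``_sketch`` suffix marks them as such), written only to show that an expanding mean is a rolling mean whose window covers the whole series with ``min_periods=1``; they are not the library's implementation.

```python
import math


def rolling_mean_sketch(values, window, min_periods=None):
    """Mean over the trailing `window` points, skipping NaN values."""
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        # Require at least one valid point so the division is safe.
        if len(valid) >= max(min_periods, 1):
            out.append(sum(valid) / len(valid))
        else:
            out.append(float("nan"))
    return out


def expanding_mean_sketch(values, min_periods=1):
    """Expanding mean: a rolling mean whose window spans the whole series."""
    return rolling_mean_sketch(values, window=len(values),
                               min_periods=min_periods)


data = [1.0, 2.0, 3.0, 4.0]
result = expanding_mean_sketch(data)
```

Here ``result`` is ``[1.0, 1.5, 2.0, 2.5]``: each entry averages every point seen so far.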
- Like the ``rolling_`` functions, the following methods are included in the
+ Like the ``rolling_`` functions, the following methods are included in the
  ``pandas`` namespace or can be located in ``pandas.stats.moments``.

  .. csv-table::
@@ -324,12 +352,12 @@ Like the ``rolling_`` functions, the following methods are included in the
      ``expanding_corr``, Correlation (binary)
      ``expanding_corr_pairwise``, Pairwise correlation of DataFrame columns

- Aside from not having a ``window`` parameter, these functions have the same
- interfaces as their ``rolling_`` counterpart. Like above, the parameters they
+ Aside from not having a ``window`` parameter, these functions have the same
+ interfaces as their ``rolling_`` counterpart. Like above, the parameters they
  all accept are:

- - ``min_periods``: threshold of non-null data points to require. Defaults to
-   minimum needed to compute statistic. No ``NaNs`` will be output once
+ - ``min_periods``: threshold of non-null data points to require. Defaults to
+   minimum needed to compute statistic. No ``NaNs`` will be output once
    ``min_periods`` non-null data points have been seen.
  - ``freq``: optionally specify a :ref:`frequency string <timeseries.alias>`
    or :ref:`DateOffset <timeseries.offsets>` to pre-conform the data to.
@@ -338,15 +366,15 @@ all accept are:

  .. note::

-    The output of the ``rolling_`` and ``expanding_`` functions do not return a
-    ``NaN`` if there are at least ``min_periods`` non-null values in the current
-    window. This differs from ``cumsum``, ``cumprod``, ``cummax``, and
-    ``cummin``, which return ``NaN`` in the output wherever a ``NaN`` is
+    The output of the ``rolling_`` and ``expanding_`` functions do not return a
+    ``NaN`` if there are at least ``min_periods`` non-null values in the current
+    window. This differs from ``cumsum``, ``cumprod``, ``cummax``, and
+    ``cummin``, which return ``NaN`` in the output wherever a ``NaN`` is
     encountered in the input.

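The contrast drawn in the note can be illustrated with two small pure-Python functions. Both are hypothetical sketches rather than pandas code: the first mimics an expanding sum with a ``min_periods`` threshold, the second mimics a cumulative sum that emits NaN at each missing input. That the cumulative sum keeps accumulating after a NaN is an assumption made here, not something the note states.

```python
import math

nan = float("nan")


def expanding_sum_sketch(values, min_periods=1):
    """Running sum over non-NaN values; NaN only until min_periods are seen."""
    out, total, seen = [], 0.0, 0
    for v in values:
        if not math.isnan(v):
            total += v
            seen += 1
        out.append(total if seen >= min_periods else nan)
    return out


def cumsum_sketch(values):
    """Cumulative sum that emits NaN wherever the input is NaN."""
    out, total = [], 0.0
    for v in values:
        if math.isnan(v):
            out.append(nan)
        else:
            total += v
            out.append(total)
    return out


data = [1.0, nan, 2.0, 3.0]
expanding = expanding_sum_sketch(data)  # no NaN once one value has been seen
cumulative = cumsum_sketch(data)        # NaN at the position of the input NaN
```

With this input the expanding version carries the previous total through the gap, while the cumulative version reports NaN at index 1 and resumes afterwards.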
- An expanding window statistic will be more stable (and less responsive) than
- its rolling window counterpart as the increasing window size decreases the
- relative impact of an individual data point. As an example, here is the
+ An expanding window statistic will be more stable (and less responsive) than
+ its rolling window counterpart as the increasing window size decreases the
+ relative impact of an individual data point. As an example, here is the
  ``expanding_mean`` output for the previous time series dataset:

  .. ipython:: python