How to stop a rolling window at nan values and continue after it? #35596

ghost · 2020-08-07T07:40:36Z

Posted the same question on Stack Overflow. A user there said I should open an issue here on GitHub, since it is a bug.

I have the following dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                       [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                       [0, 2, 2, np.nan, 2, np.nan, 1]])

With output:

       0  1  2    3    4   5   6
    0  0  1  2  4.0  NaN NaN NaN
    1  0  1  2  NaN  NaN NaN NaN
    2  0  2  2  NaN  2.0 NaN 1.0

With dtypes:

    df.dtypes

    0      int64
    1      int64
    2      int64
    3    float64
    4    float64
    5    float64
    6    float64
    dtype: object

The following rolling sum is then applied:

    df.rolling(window=7, min_periods=1, axis='columns').sum()

And the output is as follows:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  4.0  4.0  4.0  4.0
    1  0.0  1.0  3.0  NaN  NaN  NaN  NaN
    2  0.0  2.0  4.0  NaN  2.0  2.0  3.0

I noticed that the rolling window stops and starts again whenever the next column has a different dtype.
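The dtype observation can be checked with a minimal sketch that reproduces the output above without passing axis='columns': it runs the same window separately over the integer columns and over the float columns, then glues the pieces back together. This assumes the symptom is exactly "one window per dtype group"; it only mimics the behavior and is not pandas' internal code. row_rolling_sum and blockwise are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def row_rolling_sum(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rolling sum along each row via the transpose."""
    return frame.T.rolling(window=7, min_periods=1).sum().T


# Assumed reconstruction of the symptom: one independent window per dtype group.
blockwise = pd.concat(
    [row_rolling_sum(df.select_dtypes(include=np.integer)),   # columns 0-2
     row_rolling_sum(df.select_dtypes(include=np.floating))], # columns 3-6
    axis=1,
)
print(blockwise)
```

The window restarts at column 3 because that is where the float64 columns begin, which matches the table above.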

However, my real DataFrame has all columns of the same dtype, object:

    df = df.astype('object')

Applying the same rolling sum to it gives:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  7.0  7.0  7.0
    1  0.0  1.0  3.0  3.0  3.0  3.0  3.0
    2  0.0  2.0  4.0  4.0  6.0  6.0  7.0

My desired output, however, stops at a NaN value and starts again after it. It would look like:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  NaN  NaN  NaN
    1  0.0  1.0  3.0  NaN  NaN  NaN  NaN
    2  0.0  2.0  4.0  NaN  2.0  NaN  3.0

I figured there must be a way for NaN values to be ignored by the sum while also not being filled in with values carried over from the rolling window.

Anything would help!
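One possible approach, sketched under the assumption that "stop at NaN and continue after it" means the sum restarts after every NaN: label NaN-separated segments with a cumulative count of NaNs, run the rolling sum inside each segment, and put the NaNs back. rolling_sum_reset_at_nan is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def rolling_sum_reset_at_nan(row: pd.Series, window: int = 7) -> pd.Series:
    """Hypothetical helper: row-wise rolling sum that restarts after every NaN."""
    values = row.astype(float)  # also works for object-dtype rows
    is_nan = values.isna()
    # Each NaN opens a new segment, so no window reaches back across a NaN.
    segment = is_nan.cumsum()
    summed = values.groupby(segment).transform(
        lambda s: s.rolling(window, min_periods=1).sum()
    )
    # Keep the original NaN positions as NaN instead of a carried-over sum.
    return summed.where(~is_nan)


result = df.apply(rolling_sum_reset_at_nan, axis=1)
print(result)
```

Under this reading row 2 ends in 1.0, not the 3.0 shown in the desired table, because the NaN in column 5 also restarts the sum. Since window=7 spans all seven columns here, each segment is effectively a cumulative sum; a smaller window would also cap how far back each sum reaches.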


bmpalatiello · 2020-08-17T20:40:06Z

I'm trying to understand your desired output. You say the rolling sum stops and starts again after a NaN appears, but given that the last value in row 2 is 3.0 and not 1.0, that appears to apply only to the first observed NaN?
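Read that way (the sum restarts only at a row's first NaN; later NaNs are skipped by the sum but left as NaN in the output), the desired table can be reproduced with a sketch like the one below. The interpretation is an assumption, and rolling_sum_reset_at_first_nan is a hypothetical helper, not a pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def rolling_sum_reset_at_first_nan(row: pd.Series, window: int = 7) -> pd.Series:
    """Hypothetical helper: restart the rolling sum only at the row's first NaN."""
    values = row.astype(float)
    is_nan = values.isna()
    # False before the first NaN, True from it onwards: at most two segments.
    after_first_nan = is_nan.cumsum() > 0
    summed = values.groupby(after_first_nan).transform(
        lambda s: s.rolling(window, min_periods=1).sum()
    )
    # Later NaNs are skipped by the sum but stay NaN in the output.
    return summed.where(~is_nan)


result = df.apply(rolling_sum_reset_at_first_nan, axis=1)
print(result)
```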

jbrockmendel · 2020-09-02T23:51:00Z

cc @mroeschke OP says this is a bug; do you agree? If so, please add the label.

mroeschke · 2020-09-06T20:45:50Z

Yeah, this looks like a bug. axis='columns' doesn't have much test coverage, and I would expect something similar to:

    In [10]: df.T.rolling(window=7, min_periods=1).sum().T
    Out[10]:
         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  7.0  7.0  7.0
    1  0.0  1.0  3.0  3.0  3.0  3.0  3.0
    2  0.0  2.0  4.0  4.0  6.0  6.0  7.0
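A self-contained version of the transpose comparison, as a sketch. Transposing the mixed int64/float64 frame upcasts every column to float64, so no dtype boundary splits the window and each original row is summed as one continuous window. It also avoids the axis keyword of DataFrame.rolling, which newer pandas releases deprecate.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])

# Each original row becomes a column; the transpose has a single float64 dtype.
transposed = df.T
print(transposed.dtypes)

# Roll down the transposed columns, then transpose back to the original layout.
result = transposed.rolling(window=7, min_periods=1).sum().T
print(result)
```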

ghost added the Needs Triage and Usage Question labels Aug 7, 2020

jbrockmendel added the Window label and removed the Needs Triage label Sep 2, 2020

mroeschke added Bug and removed Usage Question labels Sep 6, 2020

mroeschke mentioned this issue Sep 18, 2020

[BUG]: Rolling.sum() calculated wrong values when axis is one and dtypes are mixed #36458

Merged

jreback added this to the 1.2 milestone Sep 19, 2020

jreback closed this as completed in #36458 Sep 19, 2020
