How to stop a rolling window at nan values and continue after it? #35596

Closed
ghost opened this issue Aug 7, 2020 · 3 comments · Fixed by #36458
Labels
Bug Window rolling, ewma, expanding

Comments

ghost commented Aug 7, 2020

I posted the same question on Stack Overflow. A user there said I should open an issue here on GitHub, since it is a bug.

I have the following dataframe:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                       [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                       [0, 2, 2, np.nan, 2, np.nan, 1]])

With output:

       0  1  2    3    4   5   6
    0  0  1  2  4.0  NaN NaN NaN
    1  0  1  2  NaN  NaN NaN NaN
    2  0  2  2  NaN  2.0 NaN 1.0

With dtypes:

    df.dtypes

    0      int64
    1      int64
    2      int64
    3    float64
    4    float64
    5    float64
    6    float64
    dtype: object

Then the following rolling sum is applied:

    df.rolling(window=7, min_periods=1, axis='columns').sum()

And the output is as follows:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  4.0  4.0  4.0  4.0
    1  0.0  1.0  3.0  NaN  NaN  NaN  NaN
    2  0.0  2.0  4.0  NaN  2.0  2.0  3.0

I noticed that the rolling window stops and starts again whenever the dtype changes from one column to the next.

However, my actual dataframe has the same object dtype in every column:

    df = df.astype('object')

Applying the same rolling sum to this dataframe gives:

         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  7.0  7.0  7.0
    1  0.0  1.0  3.0  3.0  3.0  3.0  3.0
    2  0.0  2.0  4.0  4.0  6.0  6.0  7.0

My desired output, however, stops and starts again after a NaN value appears. It would look like this:


         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  NaN  NaN  NaN
    1  0.0  1.0  3.0  NaN  NaN  NaN  NaN
    2  0.0  2.0  4.0  NaN  2.0  NaN  3.0

I figured there must be a way for NaN values to be ignored in the sum without also being filled in with values obtained from the rolling window.

Anything would help!
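A minimal sketch of one reading of this request, assuming the goal is a running sum that restarts at *every* NaN and leaves the NaN cells as NaN. Because the window (7) spans all seven columns and `min_periods=1`, the rolling sum here behaves like a cumulative sum along each row, so the sketch swaps `rolling` for a per-row grouped `cumsum`. The helper name `cumsum_reset_at_nan` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def cumsum_reset_at_nan(row: pd.Series) -> pd.Series:
    """Running sum along a row that restarts at every NaN (hypothetical helper)."""
    # The counter increments at each NaN, so every NaN opens a new group
    # and the values after it accumulate from zero again.
    groups = row.isna().cumsum()
    # Treat NaN as 0 while summing, then put NaN back in its original cells.
    return row.fillna(0).groupby(groups).cumsum().where(row.notna())


# Apply the helper to each row (axis=1), i.e. sum across the columns.
result = df.apply(cumsum_reset_at_nan, axis=1)
print(result)
```

Under this reading, the last cell of row 2 comes out as 1.0 rather than the 3.0 shown in the desired table, so this interpretation does not reproduce that table exactly.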

ghost added the Needs Triage (Issue that has not been reviewed by a pandas team member) and Usage Question labels Aug 7, 2020
@bmpalatiello

I'm trying to understand your desired output. You say the rolling sum stops and starts again after a NaN appears. Given that the last value in row 2 is 3.0 and not 1.0, does that mean it only restarts after the first observed NaN?
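The reading raised in this comment can be sketched with a small change to a grouped cumulative sum: the sum restarts once, at the first NaN, and any later NaNs are simply skipped (and stay NaN in the output). This is only an interpretation of the request, not a confirmed spec; it assumes, as before, that a window of 7 over 7 columns with `min_periods=1` acts like a cumulative sum, and the helper name `cumsum_reset_after_first_nan` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])


def cumsum_reset_after_first_nan(row: pd.Series) -> pd.Series:
    """Running sum that restarts only at the first NaN (hypothetical helper)."""
    # Group 0 covers everything before the first NaN; group 1 covers the
    # first NaN and everything after it, so later NaNs do not restart the sum.
    groups = row.isna().cumsum().clip(upper=1)
    # NaN counts as 0 during the sum and is restored afterwards.
    return row.fillna(0).groupby(groups).cumsum().where(row.notna())


result = df.apply(cumsum_reset_after_first_nan, axis=1)
print(result)
```

Under this reading, every value in the desired table is reproduced, including the final 3.0 in row 2 (2 + 1, skipping the NaN between them).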

jbrockmendel added the Window rolling, ewma, expanding label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Sep 2, 2020
@jbrockmendel (Member)

cc @mroeschke OP says this is a bug; do you agree? If so, please add the label.

@mroeschke (Member)

Yeah, this looks like a bug. axis='columns' doesn't have much test coverage, and I would expect something similar to:

    In [10]: df.T.rolling(window=7, min_periods=1).sum().T
    Out[10]:
         0    1    2    3    4    5    6
    0  0.0  1.0  3.0  7.0  7.0  7.0  7.0
    1  0.0  1.0  3.0  3.0  3.0  3.0  3.0
    2  0.0  2.0  4.0  4.0  6.0  6.0  7.0
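A minimal sketch of the transpose workaround from the comment above, assuming the goal is a plain rolling sum along each row with NaNs skipped. Casting to `float64` first keeps every column numeric (rather than the `object` dtype used in the original post), and transposing lets `rolling` run along its default axis, so no `axis` argument is passed at all.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 4, np.nan, np.nan, np.nan],
                   [0, 1, 2, np.nan, np.nan, np.nan, np.nan],
                   [0, 2, 2, np.nan, 2, np.nan, 1]])

# Transpose so the original columns become rows, roll along the default
# axis, then transpose back. With min_periods=1, a window that contains
# NaN still yields the sum of its non-missing values.
result = df.astype(float).T.rolling(window=7, min_periods=1).sum().T
print(result)
```

This gives the same values as the expected output shown above. Note that it skips NaNs rather than restarting at them, so it matches the expected "non-buggy" behaviour, not the OP's desired table.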

4 participants