Skip to content

[BUG]: Rolling.sum() calculated wrong values when axis is one and dtypes are mixed #36458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Sep 19, 2020

Conversation

phofl
Copy link
Member

@phofl phofl commented Sep 18, 2020

In case of axis=1 and mixed dtypes, the _apply_blockwise did not calculate the sum for a complete row. Through converting the obj and consolidating the blocks we can avoid this.

@mroeschke
Copy link
Member

Does this also potentially close #35596?

@phofl
Copy link
Member Author

phofl commented Sep 18, 2020

Thanks, I missed that issue. When the output you posted is the correct one, then yes, this fixes the issue. I will add a test.

@@ -243,7 +243,9 @@ def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
if self.on is not None and not isinstance(self.on, Index):
if obj.ndim == 2:
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)

if self.axis == 1:
obj = obj.astype("float64", copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment with the issue number and a small explanation on why we are doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jbrockmendel FYI

@mroeschke mroeschke added Bug Window rolling, ewma, expanding labels Sep 18, 2020
@mroeschke mroeschke added this to the 1.2 milestone Sep 18, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add another test which has mixed types with say a string in one column (and also try with datetimes we don't want to coerce), so this raises I think we are going to have an odd error message and may want to catch / re-raise if that is the case

actually i think we need to explicity get outnumeric dtypes (e.g. do a select_dtypes(include=[['integer', 'float']])

otherwise we might include timedelta for example (and we don't know how to coerce back I think)

@@ -243,7 +243,9 @@ def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
if self.on is not None and not isinstance(self.on, Index):
if obj.ndim == 2:
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)

if self.axis == 1:
obj = obj.astype("float64", copy=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jbrockmendel FYI

@phofl
Copy link
Member Author

phofl commented Sep 19, 2020

@jreback so we should raise if we get dtypes except float and int?

@mroeschke
Copy link
Member

I think we should just drop columns that are not coercible to float like axis=0 behavior

In [1]: df = pd.DataFrame({'a': range(5), 'b': ['a'] * 5})

In [2]: df
Out[2]:
   a  b
0  0  a
1  1  a
2  2  a
3  3  a
4  4  a

In [3]: df.rolling(2).sum()
Out[3]:
     a
0  NaN
1  1.0
2  3.0
3  5.0
4  7.0

@phofl
Copy link
Member Author

phofl commented Sep 19, 2020

Thanks @mroeschke. Implemented it.

@jreback
Copy link
Contributor

jreback commented Sep 19, 2020

lgtm ping on green.

@phofl
Copy link
Member Author

phofl commented Sep 19, 2020

@jreback green

@jreback jreback merged commit 00a510b into pandas-dev:master Sep 19, 2020
@jreback
Copy link
Contributor

jreback commented Sep 19, 2020

thanks @phofl very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Window rolling, ewma, expanding
Projects
None yet
3 participants