-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[BUG]: Rolling.sum() calculated wrong values when axis is one and dtypes are mixed #36458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Does this also potentially close #35596? |
Thanks, I missed that issue. When the output you posted is the correct one, then yes, this fixes the issue. I will add a test. |
@@ -243,7 +243,9 @@ def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries: | |||
if self.on is not None and not isinstance(self.on, Index): | |||
if obj.ndim == 2: | |||
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False) | |||
|
|||
if self.axis == 1: | |||
obj = obj.astype("float64", copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment with the issue number and a small explanation on why we are doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jbrockmendel FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add another test which has mixed types with say a string in one column (and also try with datetimes we don't want to coerce), so this raises I think we are going to have an odd error message and may want to catch / re-raise if that is the case
actually i think we need to explicity get outnumeric dtypes (e.g. do a select_dtypes(include=[['integer', 'float']])
otherwise we might include timedelta for example (and we don't know how to coerce back I think)
@@ -243,7 +243,9 @@ def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries: | |||
if self.on is not None and not isinstance(self.on, Index): | |||
if obj.ndim == 2: | |||
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False) | |||
|
|||
if self.axis == 1: | |||
obj = obj.astype("float64", copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jbrockmendel FYI
@jreback so we should raise if we get dtypes except float and int? |
I think we should just drop columns that are not coercible to float like
|
Thanks @mroeschke. Implemented it. |
lgtm ping on green. |
@jreback green |
thanks @phofl very nice! |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
In case of
axis=1
and mixed dtypes, the_apply_blockwise
did not calculate the sum for a complete row. Through converting the obj and consolidating the blocks we can avoid this.