-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: various groupby ewm times issues #40952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: various groupby ewm times issues #40952
Conversation
stevenschaerer
commented
Apr 14, 2021
•
edited
Loading
edited
- closes BUG: Issues with groupby ewm and times #40951
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
…g support for times in numba implementation; fix bug in cython implementation
cc @mroeschke |
pandas/_libs/window/aggregations.pyx
Outdated
@@ -1511,6 +1511,7 @@ def ewma(const float64_t[:] vals, const int64_t[:] start, const int64_t[:] end, | |||
s = start[j] | |||
e = end[j] | |||
sub_vals = vals[s:e] | |||
sub_deltas = deltas[s:e-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: e - 1
pandas/core/window/ewm.py
Outdated
) | ||
# error: Item "str" of "Union[str, ndarray, FrameOrSeries]" has no | ||
# attribute "take" | ||
self.times = self.times.take(groupby_order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we don't want to change this attribute too much since its still public to the user.
Instead could you define _calculate_deltas(self.times.take(groupby_order), self.halflife)
pandas/tests/window/test_groupby.py
Outdated
@@ -926,3 +926,71 @@ def test_pairwise_methods(self, method, expected_data): | |||
|
|||
expected = df.groupby("A").apply(lambda x: getattr(x.ewm(com=1.0), method)()) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_times(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you split this into 3 different tests? Some duplication is okay but nice to have each test case isolated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, thanks for catching these.
Could you add a whatsnew note for v1.3? Can have 3 separate entries all linking to you original issue
Thanks for your comments, I've incorporated them into the PR. Please have a look when you get a chance. From what I can tell the failing checks seem to be due to test_empty_multi in test_reductions.py. I saw another recent PR with the same issue (https://github.com/pandas-dev/pandas/actions/runs/751397894). Is there a way to re-run the checks without pushing a commit? |
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -792,6 +792,9 @@ Groupby/resample/rolling | |||
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`) | |||
- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`) | |||
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`) | |||
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the times argument was ignored in the numba implementation (:issue:`40951`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the numba implementation -> when engine='numba'
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -792,6 +792,9 @@ Groupby/resample/rolling | |||
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`) | |||
- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`) | |||
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`) | |||
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the times argument was ignored in the numba implementation (:issue:`40951`) | |||
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the wrong times were used in the cython implementation in case of multiple groups (:issue:`40951`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove in the cython implementation
A few small whatsnew comments otherwise LGTM (the follow up commit should hopefully clear the CI) |
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -792,6 +792,9 @@ Groupby/resample/rolling | |||
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`) | |||
- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`) | |||
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`) | |||
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the times argument was ignored when ``engine='numba'`` (:issue:`40951`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have another tweek for ewm.mean can you move these notes below that one
@@ -1511,6 +1511,7 @@ def ewma(const float64_t[:] vals, const int64_t[:] start, const int64_t[:] end, | |||
s = start[j] | |||
e = end[j] | |||
sub_vals = vals[s:e] | |||
sub_deltas = deltas[s:e - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this not s:e
? (like sub_vals)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deltas are the (scaled) np.diff of the times vector. so len(deltas) = len(times) - 1 = len(vals) - 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk can you add this comment on the shape of sub_vals vs sub_deltas for future readers
pandas/core/window/ewm.py
Outdated
times.view(np.int64), dtype=np.float64 # type: ignore[union-attr] | ||
) | ||
_halflife = float(Timedelta(halflife).value) | ||
self._deltas = np.diff(_times) / _halflife |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you return _deltas and assign in the caller (change the return type as well)
pandas/core/window/ewm.py
Outdated
times: str | np.ndarray | FrameOrSeries | None, | ||
halflife: float | TimedeltaConvertibleTypes | None, | ||
) -> None: | ||
# error: Item "str" of "Union[str, ndarray, FrameOrSeries, None]" has no |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give a doc-string, describe the params and what this is doing
@@ -141,7 +142,7 @@ def groupby_ewma( | |||
|
|||
if is_observation or not ignore_na: | |||
|
|||
old_wt *= old_wt_factor | |||
old_wt *= old_wt_factor ** deltas[start + j - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to use delta[i-1]
with values[i]
. cur = window[j] = values[start + j]
(and j
starts at 1).
I could introduce a new variable delta_window = deltas[start:stop-1]
as in the cython implementation if you think that would make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no its fine, if you'd just add some documentation to this effect for future readers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok looks good, 2 requests. ping on green.
@@ -1511,6 +1511,7 @@ def ewma(const float64_t[:] vals, const int64_t[:] start, const int64_t[:] end, | |||
s = start[j] | |||
e = end[j] | |||
sub_vals = vals[s:e] | |||
sub_deltas = deltas[s:e - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk can you add this comment on the shape of sub_vals vs sub_deltas for future readers
pandas/core/window/ewm.py
Outdated
@@ -303,6 +295,38 @@ def __init__( | |||
self.alpha, | |||
) | |||
|
|||
@staticmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just make this a module level function
@@ -141,7 +142,7 @@ def groupby_ewma( | |||
|
|||
if is_observation or not ignore_na: | |||
|
|||
old_wt *= old_wt_factor | |||
old_wt *= old_wt_factor ** deltas[start + j - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no its fine, if you'd just add some documentation to this effect for future readers
@jreback: I have incorporated your latest comments. CI is green. |
Thanks @stevenschaerer nice patch! |
* times in ewm groupby: sort times in according to grouping; add missing support for times in numba implementation; fix bug in cython implementation * add GH issue id to tests * fix typing validation error * PR comments * trying to fix int64 to int32 casting TypeError * PR comments * PR comments * PR comments
* times in ewm groupby: sort times in according to grouping; add missing support for times in numba implementation; fix bug in cython implementation * add GH issue id to tests * fix typing validation error * PR comments * trying to fix int64 to int32 casting TypeError * PR comments * PR comments * PR comments
* times in ewm groupby: sort times in according to grouping; add missing support for times in numba implementation; fix bug in cython implementation * add GH issue id to tests * fix typing validation error * PR comments * trying to fix int64 to int32 casting TypeError * PR comments * PR comments * PR comments