Skip to content

BUG: various groupby ewm times issues #40952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

stevenschaerer
Copy link
Contributor

@stevenschaerer stevenschaerer commented Apr 14, 2021

…g support for times in numba implementation; fix bug in cython implementation
@jreback
Copy link
Contributor

jreback commented Apr 14, 2021

cc @mroeschke

@@ -1511,6 +1511,7 @@ def ewma(const float64_t[:] vals, const int64_t[:] start, const int64_t[:] end,
s = start[j]
e = end[j]
sub_vals = vals[s:e]
sub_deltas = deltas[s:e-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: e - 1

)
# error: Item "str" of "Union[str, ndarray, FrameOrSeries]" has no
# attribute "take"
self.times = self.times.take(groupby_order)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we don't want to change this attribute too much since its still public to the user.

Instead could you define _calculate_deltas(self.times.take(groupby_order), self.halflife)

@@ -926,3 +926,71 @@ def test_pairwise_methods(self, method, expected_data):

expected = df.groupby("A").apply(lambda x: getattr(x.ewm(com=1.0), method)())
tm.assert_frame_equal(result, expected)

def test_times(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you split this into 3 different tests? Some duplication is okay but nice to have each test case isolated.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, thanks for catching these.

Could you add a whatsnew note for v1.3? Can have 3 separate entries all linking to you original issue

@mroeschke mroeschke added Bug Window rolling, ewma, expanding labels Apr 14, 2021
@stevenschaerer
Copy link
Contributor Author

Thanks for your comments, I've incorporated them into the PR. Please have a look when you get a chance.

From what I can tell the failing checks seem to be due to test_empty_multi in test_reductions.py. I saw another recent PR with the same issue (https://github.com/pandas-dev/pandas/actions/runs/751397894). Is there a way to re-run the checks without pushing a commit?

@@ -792,6 +792,9 @@ Groupby/resample/rolling
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)
- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`)
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the times argument was ignored in the numba implementation (:issue:`40951`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the numba implementation -> when engine='numba'

@@ -792,6 +792,9 @@ Groupby/resample/rolling
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)
- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`)
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the times argument was ignored in the numba implementation (:issue:`40951`)
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the wrong times were used in the cython implementation in case of multiple groups (:issue:`40951`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove in the cython implementation

@mroeschke
Copy link
Member

mroeschke commented Apr 15, 2021

A few small whatsnew comments otherwise LGTM (the follow up commit should hopefully clear the CI)

@mroeschke mroeschke added this to the 1.3 milestone Apr 16, 2021
@@ -792,6 +792,9 @@ Groupby/resample/rolling
- Bug in :class:`core.window.RollingGroupby` where ``as_index=False`` argument in ``groupby`` was ignored (:issue:`39433`)
- Bug in :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising ``ValueError`` when using with nullable type columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` incorrectly rounding integer values near the ``int64`` implementations bounds (:issue:`40767`)
- Bug in :meth:`core.window.ewm.ExponentialMovingWindowGroupby.mean` where the times argument was ignored when ``engine='numba'`` (:issue:`40951`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have another tweek for ewm.mean can you move these notes below that one

@@ -1511,6 +1511,7 @@ def ewma(const float64_t[:] vals, const int64_t[:] start, const int64_t[:] end,
s = start[j]
e = end[j]
sub_vals = vals[s:e]
sub_deltas = deltas[s:e - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not s:e? (like sub_vals)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deltas are the (scaled) np.diff of the times vector. so len(deltas) = len(times) - 1 = len(vals) - 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk can you add this comment on the shape of sub_vals vs sub_deltas for future readers

times.view(np.int64), dtype=np.float64 # type: ignore[union-attr]
)
_halflife = float(Timedelta(halflife).value)
self._deltas = np.diff(_times) / _halflife
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you return _deltas and assign in the caller (change the return type as well)

times: str | np.ndarray | FrameOrSeries | None,
halflife: float | TimedeltaConvertibleTypes | None,
) -> None:
# error: Item "str" of "Union[str, ndarray, FrameOrSeries, None]" has no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give a doc-string, describe the params and what this is doing

@@ -141,7 +142,7 @@ def groupby_ewma(

if is_observation or not ignore_na:

old_wt *= old_wt_factor
old_wt *= old_wt_factor ** deltas[start + j - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to use delta[i-1] with values[i]. cur = window[j] = values[start + j] (and j starts at 1).

I could introduce a new variable delta_window = deltas[start:stop-1] as in the cython implementation if you think that would make it clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its fine, if you'd just add some documentation to this effect for future readers

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok looks good, 2 requests. ping on green.

@@ -1511,6 +1511,7 @@ def ewma(const float64_t[:] vals, const int64_t[:] start, const int64_t[:] end,
s = start[j]
e = end[j]
sub_vals = vals[s:e]
sub_deltas = deltas[s:e - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk can you add this comment on the shape of sub_vals vs sub_deltas for future readers

@@ -303,6 +295,38 @@ def __init__(
self.alpha,
)

@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just make this a module level function

@@ -141,7 +142,7 @@ def groupby_ewma(

if is_observation or not ignore_na:

old_wt *= old_wt_factor
old_wt *= old_wt_factor ** deltas[start + j - 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its fine, if you'd just add some documentation to this effect for future readers

@stevenschaerer
Copy link
Contributor Author

@jreback: I have incorporated your latest comments. CI is green.

@mroeschke mroeschke merged commit ece1217 into pandas-dev:master Apr 16, 2021
@mroeschke
Copy link
Member

Thanks @stevenschaerer nice patch!

@stevenschaerer stevenschaerer deleted the gh40951-fix-groupby-ewm-times branch April 16, 2021 18:43
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request Apr 21, 2021
* times in ewm groupby: sort times in according to grouping; add missing support for times in numba implementation; fix bug in cython implementation

* add GH issue id to tests

* fix typing validation error

* PR comments

* trying to fix int64 to int32 casting TypeError

* PR comments

* PR comments

* PR comments
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
* times in ewm groupby: sort times in according to grouping; add missing support for times in numba implementation; fix bug in cython implementation

* add GH issue id to tests

* fix typing validation error

* PR comments

* trying to fix int64 to int32 casting TypeError

* PR comments

* PR comments

* PR comments
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
* times in ewm groupby: sort times in according to grouping; add missing support for times in numba implementation; fix bug in cython implementation

* add GH issue id to tests

* fix typing validation error

* PR comments

* trying to fix int64 to int32 casting TypeError

* PR comments

* PR comments

* PR comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Issues with groupby ewm and times
3 participants