-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DEPR: dropping nuisance columns in rolling aggregations #42834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
expected = df.ewm(halflife=1.0, min_periods=min_periods).mean() | ||
with tm.assert_produces_warning(FutureWarning, match="nuisance columns"): | ||
# GH#42738 | ||
result = df.ewm(halflife=halflife, min_periods=min_periods, times=times).mean() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should probably add a deprecation here too that allows times to be a string of a column label (which is already kinda flawed because a column label isn't always a string). A column of times will always be a nuisance column I think
pandas/pandas/core/window/ewm.py
Line 317 in a5f8c9a
if isinstance(self.times, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be done separately? this is less obvious to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes.
pandas/tests/window/test_numba.py
Outdated
result = ewm.mean(engine="numba", engine_kwargs=engine_kwargs) | ||
expected = ewm.mean(engine="cython") | ||
|
||
# TODO: why only in these cases? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because column "A" is string dtype and in the non-group case there's an attempt to agg over the column, while in the groupby case, the grouping column should be dropped internally before hand
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -159,6 +159,7 @@ Deprecations | |||
- Deprecated treating ``numpy.datetime64`` objects as UTC times when passed to the :class:`Timestamp` constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use ``Timestamp(dt64).tz_localize("UTC").tz_convert(tz)`` (:issue:`24559`) | |||
- Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (:issue:`42351`) | |||
- Creating an empty Series without a dtype will now raise a more visible ``FutureWarning`` instead of a ``DeprecationWarning`` (:issue:`30017`) | |||
- Deprecated dropping of nuisance columns in :class:`Rolling` aggregations (:issue:`42738`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to include Expanding and EWM as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on board. @mroeschke suggestions + rebase
comments addressed + rebased + green |
if 0 != len(new_mgr.items) != len(mgr.items): | ||
# GH#42738 ignore_failures dropped nuisance columns | ||
warnings.warn( | ||
"Dropping of nuisance columns in rolling operations " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you list the columns that are problematic in the error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated + green
we did this in 1.3 for DataFrame reductions and GroupBy reductions. AFAIK this is the last place we use this pattern.
cc @mroeschke