Skip to content

BUG: groupby.shift returns different columns when fill_value is specified #41858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jul 28, 2021
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.5.rst
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ Fixed regressions
Bug fixes
~~~~~~~~~

-
- Bug in :meth:`GroupBy.shift` that would return different columns if ``fill_value`` was not None (:issue:`41556`)
-

.. ---------------------------------------------------------------------------
Expand Down
8 changes: 6 additions & 2 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -2775,6 +2775,7 @@ def _get_cythonized_result(
result_is_index: bool = False,
pre_processing=None,
post_processing=None,
fill_value=None,
**kwargs,
):
"""
Expand Down Expand Up @@ -2825,6 +2826,8 @@ def _get_cythonized_result(
second argument, i.e. the signature should be
(ndarray, Type). If `needs_nullable=True`, a third argument should be
`nullable`, to allow for processing specific to nullable values.
fill_value : any, default None
The scalar value to use for newly introduced missing values.
**kwargs : dict
Extra arguments to be passed back to Cython funcs

Expand Down Expand Up @@ -2923,7 +2926,7 @@ def _get_cythonized_result(
result = result.reshape(-1)

if result_is_index:
result = algorithms.take_nd(values, result)
result = algorithms.take_nd(values, result, fill_value=fill_value)

if post_processing:
result = post_processing(result, inferences)
Expand Down Expand Up @@ -2970,7 +2973,7 @@ def shift(self, periods=1, freq=None, axis=0, fill_value=None):
tshift : Shift the time index, using the index’s frequency
if available.
"""
if freq is not None or axis != 0 or not isna(fill_value):
if freq is not None or axis != 0:
return self.apply(lambda x: x.shift(periods, freq, axis, fill_value))

return self._get_cythonized_result(
Expand All @@ -2980,6 +2983,7 @@ def shift(self, periods=1, freq=None, axis=0, fill_value=None):
needs_ngroups=True,
result_is_index=True,
periods=periods,
fill_value=fill_value,
)

@final
Expand Down
2 changes: 1 addition & 1 deletion pandas/tests/groupby/test_groupby_shift_diff.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ def test_group_shift_with_fill_value():
columns=["Z"],
index=None,
)
result = g.shift(-1, fill_value=0)[["Z"]]
result = g.shift(-1, fill_value=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have sufficient coverage of the tests from the OP?

Copy link
Member Author

@smithto1 smithto1 Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, @jreback. I have investigated it.

The original problem was that the grouping columns were improperly returned. The old form of this test would extract the value column ([["Z"]]), so the test did not detect that the grouping columns were returned. I have modified this test so it no longer extracts the value column; if the grouping columns are returned (the OP) this test will now fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk great

if u can resolve conflicts and merge master

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback conflicts resolved. I think it can be merged.


tm.assert_frame_equal(result, expected)

Expand Down