BUG: REF: Groupby refactor #46754

Merged: 5 commits, Apr 19, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -602,6 +602,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`)
- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`)
- Bug in :meth:`.Rolling.var` would segfault calculating weighted variance when window size was larger than data size (:issue:`46760`)
- Bug in :meth:`Grouper.__repr__` where ``dropna`` was not included. Now it is (:issue:`46754`)

Reshaping
^^^^^^^^^
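To make the whatsnew entry above concrete, here is a small illustrative example (not part of the PR itself); the expected output is taken from the repr test updated later in this diff:

```python
import pandas as pd

# With "dropna" added to Grouper._attributes, the constructor's dropna
# argument now shows up in the repr alongside sort (default dropna=True).
print(repr(pd.Grouper(key="A", level="B")))
# Grouper(key='A', level='B', axis=0, sort=False, dropna=True)
```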
7 changes: 3 additions & 4 deletions pandas/core/groupby/grouper.py
@@ -263,7 +263,7 @@ class Grouper:
_gpr_index: Index | None
_grouper: Index | None

- _attributes: tuple[str, ...] = ("key", "level", "freq", "axis", "sort")
+ _attributes: tuple[str, ...] = ("key", "level", "freq", "axis", "sort", "dropna")

def __new__(cls, *args, **kwargs):
if kwargs.get("freq") is not None:
@@ -287,6 +287,7 @@ def __init__(
self.freq = freq
self.axis = axis
self.sort = sort
+ self.dropna = dropna

self.grouper = None
self._gpr_index = None
@@ -295,7 +296,6 @@ def __init__(
self.binner = None
self._grouper = None
self._indexer = None
- self.dropna = dropna

@final
@property
@@ -339,7 +339,7 @@ def _get_grouper(
return self.binner, self.grouper, self.obj # type: ignore[return-value]

@final
- def _set_grouper(self, obj: NDFrame, sort: bool = False):
+ def _set_grouper(self, obj: NDFrame, sort: bool = False) -> None:
"""
given an object and the specifications, setup the internal grouper
for this particular specification
@@ -413,7 +413,6 @@ def _set_grouper(self, obj: NDFrame, sort: bool = False):
# "NDFrameT", variable has type "None")
self.obj = obj # type: ignore[assignment]
self._gpr_index = ax
- return self._gpr_index
Member:
This seems okay to me; the result of this method is indeed never used and it looks cleaner to not have a return. cc @jbrockmendel

Contributor Author:
Thanks for the review @rhshadrach! FYI, we talked about this a bit at #46258 (comment) if you want a bit more context.

Contributor Author:
Marking as resolved, since I'm worried the "changes requested" label might be keeping this out of your queue.

Member:
I don't think it being resolved or not changes its appearance in e.g. pandas/pulls or the GitHub notifications (not sure what else you might mean by queue here).

Contributor Author:
That is what I meant (maybe you had filters for what to review next), but good to know. Thanks!

@final
@property
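As a note on why the one-line `_attributes` change is enough: `Grouper.__repr__` is driven by that tuple. The sketch below is a simplified, hypothetical stand-in (`MiniGrouper` is not the pandas implementation) showing the pattern, where adding a name to the tuple is all that is needed for it to appear in the repr:

```python
# Hypothetical, simplified sketch of a repr driven by an _attributes tuple.
class MiniGrouper:
    _attributes = ("key", "level", "freq", "axis", "sort", "dropna")

    def __init__(self, key=None, level=None, freq=None, axis=0, sort=False, dropna=True):
        self.key, self.level, self.freq = key, level, freq
        self.axis, self.sort, self.dropna = axis, sort, dropna

    def __repr__(self):
        # Attributes that are None are skipped, which is why key/level/freq
        # can be absent while axis/sort/dropna always appear.
        attrs = ", ".join(
            f"{name}={getattr(self, name)!r}"
            for name in self._attributes
            if getattr(self, name) is not None
        )
        return f"{type(self).__name__}({attrs})"

print(MiniGrouper(key="A", level="B"))
# MiniGrouper(key='A', level='B', axis=0, sort=False, dropna=True)
```

Moving `self.dropna = dropna` next to the other repr-visible attributes in `__init__` is purely cosmetic; the repr only depends on the attribute existing by the time it is rendered.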
9 changes: 3 additions & 6 deletions pandas/core/resample.py
@@ -1470,14 +1470,14 @@ def __init__(
self,
freq="Min",
closed: Literal["left", "right"] | None = None,
- label: str | None = None,
+ label: Literal["left", "right"] | None = None,
how="mean",
axis=0,
fill_method=None,
limit=None,
loffset=None,
kind: str | None = None,
- convention: str | None = None,
+ convention: Literal["start", "end", "e", "s"] | None = None,
base: int | None = None,
origin: str | TimestampConvertibleTypes = "start_day",
offset: TimedeltaConvertibleTypes | None = None,
@@ -1523,10 +1523,7 @@ def __init__(
self.closed = closed
self.label = label
self.kind = kind

- self.convention = convention or "E"
- self.convention = self.convention.lower()
+ self.convention = convention if convention is not None else "e"
self.how = how
self.fill_method = fill_method
self.limit = limit
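For the `convention` default in `TimeGrouper.__init__`, a hedged before/after comparison (the two helpers below are illustrative stand-ins, not pandas source): the old code fell back to `"E"` and lowercased, while the new code simply substitutes `"e"` when `None` is passed and lets the tightened `Literal["start", "end", "e", "s"]` annotation document the accepted values:

```python
def old_convention_default(convention):
    # Previous behavior: fall back to "E", then lowercase.
    convention = convention or "E"
    return convention.lower()

def new_convention_default(convention):
    # New behavior: only substitute the default when None is passed.
    return convention if convention is not None else "e"

# For None and the annotated lowercase values the two agree; uppercase input
# such as "E" is no longer case-folded, which the annotation now rules out.
for value in (None, "e", "s", "start", "end"):
    assert old_convention_default(value) == new_convention_default(value)
```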
2 changes: 1 addition & 1 deletion pandas/tests/groupby/test_groupby.py
@@ -31,7 +31,7 @@
def test_repr():
# GH18203
result = repr(Grouper(key="A", level="B"))
- expected = "Grouper(key='A', level='B', axis=0, sort=False)"
+ expected = "Grouper(key='A', level='B', axis=0, sort=False, dropna=True)"
assert result == expected


4 changes: 2 additions & 2 deletions pandas/tests/resample/test_time_grouper.py
Expand Up @@ -266,15 +266,15 @@ def test_repr():
# GH18203
result = repr(Grouper(key="A", freq="H"))
expected = (
- "TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, "
+ "TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, dropna=True, "
"closed='left', label='left', how='mean', "
"convention='e', origin='start_day')"
)
assert result == expected

result = repr(Grouper(key="A", freq="H", origin="2000-01-01"))
expected = (
- "TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, "
+ "TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, dropna=True, "
"closed='left', label='left', how='mean', "
"convention='e', origin=Timestamp('2000-01-01 00:00:00'))"
)
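Building on the updated repr tests above, a hedged usage note (the expected output assumes, as in the tests, that attributes set to None are omitted): a non-default `dropna` is now visible as well, for both the plain `Grouper` and the `TimeGrouper` repr.

```python
import pandas as pd

# Hedged expectation, consistent with the updated tests in this PR:
print(repr(pd.Grouper(key="A", dropna=False)))
# Grouper(key='A', axis=0, sort=False, dropna=False)

print(repr(pd.Grouper(key="A", freq="H", dropna=False)))
# TimeGrouper(key='A', freq=<Hour>, axis=0, sort=True, dropna=False,
#             closed='left', label='left', how='mean', convention='e',
#             origin='start_day')
```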