Skip to content

CLN: remove unreachable code in pandas/core/groupby/generic.py::DataFrameGroupBy::_transform_general #32583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Mar 10, 2020

Is this part of the code

        else:
            # Handle cases like BinGrouper
            return self._concat_objects(keys, values, not_indexed_same=not_indexed_same)

necessary? The condition before it is

elif self.grouper.groupings is not None

. However, even in the case of BinGrouper, I don't believe BinGrouper.groupings can be None:

>>> from pandas.core.groupby.ops import BinGrouper
>>> BinGrouper([], []).groupings                                                                                                                                                                   
[Grouping(None)]

and so this code is unreachable. It's not covered by the tests, at least - https://codecov.io/gh/pandas-dev/pandas/src/master/pandas/core/groupby/generic.py#L1354

If it can be removed, I've taken it out and then shifted the code above it back by one tab (because we no longer need the guard)

@jbrockmendel
Copy link
Member

IIRC the coverage results dont include the slow tests. did you run those?

@MarcoGorelli
Copy link
Member Author

IIRC the coverage results dont include the slow tests. did you run those?

Thanks, wasn't aware of that.

Don't they run as part of CI? If so, looks like they still pass

@@ -1200,158 +1200,147 @@ def first_not_none(values):
# GH9684. If all values are None, then this will throw an error.
# We'd prefer it return an empty dataframe.
return DataFrame()
elif isinstance(v, DataFrame):
return self._concat_objects(keys, values, not_indexed_same=not_indexed_same)
elif self.grouper.groupings is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little hard to tell from the diff - is removing this branch the only change?

Copy link
Member Author

@MarcoGorelli MarcoGorelli Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also changed the elifs after the return statements to if. This was only on line 1206 (new version). However, I've reverted that change.

Then, I removed

elif self.grouper.groupings is not None:

so I could shift the block after it back by 4 spaces.

Shall I only remove the last branch if that makes it easier to review?

if len(keys) == ping.ngroups:
key_index = ping.group_index
key_index.name = key_names[0]
if isinstance(v, DataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can still be an elif of the above


key_lookup = Index(keys)
indexer = key_lookup.get_indexer(key_index)
if len(self.grouper.groupings) > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this possible to be None?

Copy link
Member Author

@MarcoGorelli MarcoGorelli Mar 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For BaseGrouper, .groupings needs to be a list, so in that case, no, as inside its __init__ method there is

        self._groupings: List[grouper.Grouping] = list(groupings)

and .groupings simply returns ._groupings.

BinGrouper inherits from BaseGrouper, but changes its .groupings definition to

    @property
    def groupings(self) -> "List[grouper.Grouping]":
        return [
            grouper.Grouping(lvl, lvl, in_axis=False, level=None, name=name)
            for lvl, name in zip(self.levels, self.names)
        ]

So, it still cannot be None.

I'm still making sense of all this group-by code though, so I'll still look to see if there's anything I've missed

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than changing this all at once. can you make a function out of some of the internal code in a prec-cursor PR, then this will be easier to interpret. Its very hard to tell what is changed.

@WillAyd
Copy link
Member

WillAyd commented Apr 2, 2020

@MarcoGorelli still active? Can you address latest comments?

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Apr 2, 2020

Sorry for the delay - yes, I'll get to this during the weekend

EDIT

am pretty time constrained right now, will get to this next weekend

@MarcoGorelli
Copy link
Member Author

Closing for now, as I'm busy with other things and have other PRs to respond to: I've opened this up as a good first issue: #33478 .

If nobody takes the issue, I'll come back to it

@MarcoGorelli MarcoGorelli deleted the clean-up-_transform_general branch October 10, 2020 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants