-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: Consistent and Annotated Return Type of _iterate_slices #28958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u perf check for groupby
pandas/core/groupby/generic.py
Outdated
slice_axis = self._selection_list | ||
slicer = lambda x: self.obj[x] | ||
def _iterate_slices(self) -> Iterator[Tuple[Hashable, Series]]: | ||
obj = self._selected_obj.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mistake on my end. Didn’t want to mutate passed frame in the axis=1 case but I can get rid of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off topic: can we make the axis=1 case unnecessary by just transposing when we see it? Not sure exactly how that would work, but it would let us get rid of some lambda-wrapping, which would be nice.
Might want to look at DataFrameGroupBy._transform_item_by_item. If I'm reading it correctly, that will screw up if we have non-unique columns. (I think that's relevant to what you're doing here) |
Ran entire GroupBy benchmark and showed no changes
Yea looks it could be consolidated to use this, though leaving to a follow up Also w.r.t. your comment on axis=1 case seems logical but might be some complications around managing selection state with that and how things are constructed. May take a look down the road, though if you see a way to make it work in the meantime makes sense to me |
thanks |
General pre-cursor to getting block management out of groupby. This is also a pre-cursor to fixing #21668 but needs to be coupled with a few more changes as a follow up
On master calls to _iterate_slices look up by label, potentially yielding a DataFrame if there were duplicated columns. This takes the surprise out of that and simply returns a Tuple of label / series for each item along the axis
@jbrockmendel