Skip to content

CLN: Consistent and Annotated Return Type of _iterate_slices #28958

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 16, 2019

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Oct 13, 2019

General pre-cursor to getting block management out of groupby. This is also a pre-cursor to fixing #21668 but needs to be coupled with a few more changes as a follow up

On master calls to _iterate_slices look up by label, potentially yielding a DataFrame if there were duplicated columns. This takes the surprise out of that and simply returns a Tuple of label / series for each item along the axis

@jbrockmendel

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u perf check for groupby

slice_axis = self._selection_list
slicer = lambda x: self.obj[x]
def _iterate_slices(self) -> Iterator[Tuple[Hashable, Series]]:
obj = self._selected_obj.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why copy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mistake on my end. Didn’t want to mutate passed frame in the axis=1 case but I can get rid of this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off topic: can we make the axis=1 case unnecessary by just transposing when we see it? Not sure exactly how that would work, but it would let us get rid of some lambda-wrapping, which would be nice.

@jbrockmendel
Copy link
Member

Might want to look at DataFrameGroupBy._transform_item_by_item. If I'm reading it correctly, that will screw up if we have non-unique columns. (I think that's relevant to what you're doing here)

@WillAyd
Copy link
Member Author

WillAyd commented Oct 13, 2019

can u perf check for groupby

Ran entire GroupBy benchmark and showed no changes

Might want to look at DataFrameGroupBy._transform_item_by_item

Yea looks it could be consolidated to use this, though leaving to a follow up

Also w.r.t. your comment on axis=1 case seems logical but might be some complications around managing selection state with that and how things are constructed. May take a look down the road, though if you see a way to make it work in the meantime makes sense to me

@jreback jreback added this to the 1.0 milestone Oct 16, 2019
@jreback jreback added the Typing type annotations, mypy/pyright type checking label Oct 16, 2019
@jreback jreback merged commit c903e5e into pandas-dev:master Oct 16, 2019
@jreback
Copy link
Contributor

jreback commented Oct 16, 2019

thanks

bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
@WillAyd WillAyd deleted the consistent-grp-iter branch January 16, 2020 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants