Skip to content

PERF: Groupby.idxmax #52339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

jbrockmendel
Copy link
Member

cc @phofl this is currently failing locally and could use a fresh pair of eyes.

@mroeschke mroeschke requested a review from phofl April 7, 2023 17:51
@phofl
Copy link
Member

phofl commented Apr 7, 2023

I'll try to take a look over the weekend

@jbrockmendel
Copy link
Member Author

@phofl im mothballing this to clear the queue, just a ping so this doesn't fall off your radar entirely

@jbrockmendel jbrockmendel added the Mothballed Temporarily-closed PR the author plans to return to label Apr 21, 2023
@phofl
Copy link
Member

phofl commented Apr 22, 2023

I think the main problem is that _wrap_aggregated_output isn't working for unordered categories that are empty. All other reductions that are tested are raising TypeErrors, while idxmax and idxmin return the wrong result. Any ideas what we could do there?

@jbrockmendel jbrockmendel removed the Mothballed Temporarily-closed PR the author plans to return to label Jul 23, 2023
Comment on lines +1143 to +1144
with com.temp_setattr(self, "observed", True):
argmin = self._cython_agg_general("argmin", alt=alt, skipna=skipna)
Copy link
Member

@rhshadrach rhshadrach Jul 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of an oddity with how missing observations are handled. When there are multiple groupings, we do not include the unobserved categories in e.g. grouper.result_index and fill in any unobserved ones in _wrap_aggregated_output. However, if there is just a single grouping we do include the unobserved categories in e.g. grouper.result_index and so we don't fill them in later on. This makes this approach not work for some cases of categoricals.

I plan to look into making it so we never include the unobserved categories until _wrap_aggreagted_output in the single grouping case. If we can make that work, this approach would work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants