PERF: Groupby.idxmax #52339

jbrockmendel · 2023-04-01T00:54:29Z

cc @phofl this is currently failing locally and could use a fresh pair of eyes.

phofl · 2023-04-07T19:43:42Z

I'll try to take a look over the weekend

jbrockmendel · 2023-04-21T20:10:51Z

@phofl I'm mothballing this to clear the queue; just a ping so this doesn't fall off your radar entirely.

phofl · 2023-04-22T16:26:54Z

I think the main problem is that _wrap_aggregated_output isn't working for unordered categories that are empty. All other reductions that are tested are raising TypeErrors, while idxmax and idxmin return the wrong result. Any ideas what we could do there?

rhshadrach · 2023-07-23T16:41:34Z

pandas/core/groupby/generic.py

+            with com.temp_setattr(self, "observed", True):
+                argmin = self._cython_agg_general("argmin", alt=alt, skipna=skipna)


There is a bit of an oddity with how missing observations are handled. When there are multiple groupings, we do not include the unobserved categories in e.g. grouper.result_index and fill in any unobserved ones in _wrap_aggregated_output. However, if there is just a single grouping we do include the unobserved categories in e.g. grouper.result_index and so we don't fill them in later on. This makes this approach not work for some cases of categoricals.
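As a rough illustration of the constraint described above, the sketch below shows the user-facing result that both internal paths have to produce when observed=False: unobserved categories appear in the output whether there is one grouping or several. It uses only public pandas API, so the internal difference (grouper.result_index versus filling in _wrap_aggregated_output) is not directly visible here. The frame and column names are made up for the example.

```python
import pandas as pd

# Made-up data: category "c" is declared but never observed.
cat = pd.Categorical(["a", "a", "b"], categories=["a", "b", "c"])
df = pd.DataFrame({"key": cat, "key2": [1, 1, 2], "val": [3.0, 1.0, 2.0]})

# Single grouping: the unobserved category "c" shows up in the result.
single = df.groupby("key", observed=False)["val"].sum()

# Multiple groupings: the result covers every (key, key2) combination,
# including the ones involving the unobserved category.
multi = df.groupby(["key", "key2"], observed=False)["val"].sum()

# With observed=True only the observed groups remain, which is the
# setting the diff forces before calling the Cython aggregation.
observed_only = df.groupby("key", observed=True)["val"].idxmax()

print(single)
print(multi)
print(observed_only)
```

Any fast path that computes on observed groups only has to end up with the same index as the observed=False results once the unobserved categories are filled back in.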

I plan to look into making it so we never include the unobserved categories until _wrap_aggregated_output in the single grouping case. If we can make that work, this approach would work.
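For readers unfamiliar with the pattern in the diff, com.temp_setattr temporarily overrides an attribute for the duration of a with block. The following is a minimal stand-in sketch of that idea, not the pandas source; FakeGroupBy and its aggregate method are hypothetical names used only to demonstrate the behavior.

```python
from contextlib import contextmanager


@contextmanager
def temp_setattr(obj, attr, value):
    """Temporarily set obj.attr to value, restoring the old value on exit."""
    old_value = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield obj
    finally:
        # Restore even if the body raises.
        setattr(obj, attr, old_value)


class FakeGroupBy:
    """Hypothetical stand-in for a groupby object; not the pandas class."""

    def __init__(self, observed):
        self.observed = observed

    def aggregate(self):
        return f"aggregating with observed={self.observed}"


gb = FakeGroupBy(observed=False)
with temp_setattr(gb, "observed", True):
    inside = gb.aggregate()  # runs as if observed=True
after = gb.aggregate()  # original setting is back

print(inside)
print(after)
```

The try/finally is what makes this safe to wrap around an aggregation that may raise: the caller's observed setting is restored either way.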

PERF: Groupby.idxmax

404d5b4

mroeschke requested a review from phofl April 7, 2023 17:51

jbrockmendel closed this Apr 21, 2023

jbrockmendel added the Mothballed label (Temporarily-closed PR the author plans to return to) Apr 21, 2023

jbrockmendel mentioned this pull request Jul 23, 2023

PERF: Implement groupby idxmax/idxmin in Cython #54234

Merged

5 tasks

jbrockmendel removed the Mothballed label (Temporarily-closed PR the author plans to return to) Jul 23, 2023

rhshadrach reviewed Jul 23, 2023
