Bug 29764 groupby loses index name sometimes #33111

phofl · 2020-03-29T01:19:03Z

closes GroupBy Doesn't Always Maintain Column Index Name #29764
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Some groupby functions lost the column index name. For the functions that go through _get_cythonized_result in groupby.py, the index name was simply ignored when the column names were defined. As a result, the subsequent calls to _wrap_aggregated_output and _wrap_transformed_output in the class DataFrameGroupBy had no access to this information, because it was already lost by then. I now collect the information beforehand and set the index name accordingly.

We could refactor both methods a bit at a later stage, because as it is right now (and was before) the first few lines are duplicated.
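For illustration, a minimal sketch of how the symptom can be checked. The helper name `columns_name_after`, the sample frame, and the list of method names tried are assumptions made for this example and are not taken from the patch; whether the name survives depends on the pandas version in use.

```python
import pandas as pd


def columns_name_after(func_name):
    """Return the column index name left after calling a groupby method by name."""
    # Illustrative frame: its column index carries the name "idx".
    columns = pd.Index(["key", "value"], name="idx")
    df = pd.DataFrame([[1, 10], [1, 20], [2, 30]], columns=columns)
    result = getattr(df.groupby("key"), func_name)()
    return result.columns.name


if __name__ == "__main__":
    for func_name in ["sum", "any", "shift"]:
        # Shows "idx" where the name is kept and None where it is lost.
        print(func_name, columns_name_after(func_name))
```

Comparing the returned name with the name of the input columns makes it easy to see which code paths drop it.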

doc/source/whatsnew/v1.1.0.rst

pandas/core/groupby/generic.py

pandas/core/groupby/groupby.py

mroeschke

Were any other aggregation functions affected by the loss of index names?

doc/source/whatsnew/v1.1.0.rst

pandas/core/groupby/groupby.py

pandas/tests/groupby/test_groupby.py

jreback · 2020-04-07T00:24:59Z

pandas/core/groupby/groupby.py

        base_func = getattr(libgroupby, how)
+        obj = self._selected_obj
+        if isinstance(obj, DataFrame):


what are you doing here? this is very odd

The function call self._wrap_aggregated_output or self._wrap_transformed_output(output) converts the array containing the results into a DataFrame (or a Series, but that is not important in our case). This happens below the marked section. The name of the index is already lost at this point, so I added the name to the output dictionary, which is passed as input to these functions. This is not relevant if the result is a Series, so I check whether we have a DataFrame.

My first idea was to check whether the original object was of type DataFrameGroupBy, but I could not perform this check without importing DataFrameGroupBy at runtime. To avoid this, I used the approach you see above.

Does this answer your question?

pandas/tests/groupby/test_groupby.py

Conflicts: doc/source/whatsnew/v1.1.0.rst

jreback · 2020-06-14T22:36:33Z

can you merge master and will have a look

Conflicts: doc/source/whatsnew/v1.1.0.rst

jreback · 2020-06-15T22:56:32Z

pandas/core/groupby/groupby.py

@@ -2431,8 +2431,11 @@ def _get_cythonized_result(
        grouper = self.grouper

        labels, _, ngroups = grouper.group_info
-        output: Dict[base.OutputKey, np.ndarray] = {}
+        output: Dict[Union[base.OutputKey, str], Union[np.ndarray, str]] = {}


I don't like this at all, output is very well defined here and we are now using it for multiple different things.

can we just pass the index name to _wrap_aggregated_output and _wrap_transformed_output ?

I'll try to implement this today or tomorrow.

We have to change the Series code too in this case, if I remember correctly. I can tell you more after implementing this. Thanks for the feedback.
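A rough, pandas-free sketch of the shape of that suggestion: the result mapping keeps a single meaning and the column index name travels as a separate argument. All names here (`Wrapped`, `wrap_output`, `compute`, `columns_name`) are hypothetical stand-ins for illustration and do not reflect the actual signatures in groupby.py.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a wrapped result:
# (column labels, column index name, one list of values per column).
Wrapped = Tuple[List[str], Optional[str], List[List[float]]]


def wrap_output(
    output: Dict[str, List[float]], columns_name: Optional[str] = None
) -> Wrapped:
    """Build a frame-like result; the name arrives as its own argument."""
    labels = list(output)
    data = [output[label] for label in labels]
    return labels, columns_name, data


def compute(values: Dict[str, List[float]], columns_name: Optional[str]) -> Wrapped:
    """Aggregate each column, then wrap the result together with the name."""
    # `output` stays a plain label -> values mapping with one meaning only.
    output = {label: [sum(column)] for label, column in values.items()}
    return wrap_output(output, columns_name=columns_name)
```

With this shape the type of `output` does not need to be widened, at the cost of touching every caller of the wrapping function.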

jreback · 2020-07-17T11:19:46Z

if u can merge master and update will look

simonjayhawkins · 2020-08-01T14:00:02Z

@phofl can you move release note to 1.2 and merge upstream/master to resolve conflict

Conflicts: doc/source/whatsnew/v1.1.0.rst, pandas/core/groupby/generic.py, pandas/tests/groupby/test_groupby.py

phofl · 2020-08-04T17:43:11Z

Merged master and moved the whatsnew entry. I hope to be able to change the functions as requested by jreback in the next few days. I do not have much time at the moment.

phofl · 2020-09-04T16:35:05Z

The issue was fixed for sum and any with c9144ca. I am closing this PR and will open a new one for the rest.

phofl added 2 commits March 29, 2020 03:10

BUG: GroupBy Doesn't Always Maintain Column Index Name pandas-dev#29764

94b3bdb

BUG: Add whats new entry pandas-dev#29764

4b20125