-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Bug 29764 groupby loses index name sometimes #33111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 29764 groupby loses index name sometimes #33111
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were any other aggregation functions affected by the loss of index names?
base_func = getattr(libgroupby, how) | ||
obj = self._selected_obj | ||
if isinstance(obj, DataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are you doing here? this is very odd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function call self._wrap_aggregated_output
or self._wrap_transformed_output(output)
convert the array containing the results in a DataFrame (or Series, but not important in our case). This is below the marked section. The name of the index is already lost at this point, so I added the name to the output dictionary, which is given as input for this functions. This is not relevant, if the result is a Series, so I check, if we have a DataFrame.
My first idea was to check if the original object was from the type DataFrameGroupBy
, but I could not perform this check without importing DataFrameGroupBy
during runtime. To avoid this, I used the method you see above.
Does this answer your question?
� Conflicts: � doc/source/whatsnew/v1.1.0.rst
� Conflicts: � doc/source/whatsnew/v1.1.0.rst
can you merge master and will have a look |
� Conflicts: � doc/source/whatsnew/v1.1.0.rst
@@ -2431,8 +2431,11 @@ def _get_cythonized_result( | |||
grouper = self.grouper | |||
|
|||
labels, _, ngroups = grouper.group_info | |||
output: Dict[base.OutputKey, np.ndarray] = {} | |||
output: Dict[Union[base.OutputKey, str], Union[np.ndarray, str]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this at all, output is very well defined here and we are now using it for multiple different things.
can we just pass the index name to _wrap_aggregated_output and _wrap_transformed_output ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to implement this today or tomorrow.
We have to change the Series code too in this case, if I remember correctly. But I can tell you more after implementing this. Thanks for the feedback,
if u can merge master and update will look |
@phofl can you move release note to 1.2 and merge upstream/master to resolve conflict |
� Conflicts: � doc/source/whatsnew/v1.1.0.rst � pandas/core/groupby/generic.py � pandas/tests/groupby/test_groupby.py
Merged master and moved whats new entry. I hope, that I will be able to change the functions as requested by jereback in the next few days. I have not that much time currently |
Issue was fixed for sum and any with c9144ca. I close this PR and open an new one for the rest |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Some group by functions lost the column index name. For the functions running into
_get_cythonized_result
in groupby.py, the index name was just ignored when defining the column names. So the following functions_wrap_aggregated_output
and_wrap_transformed_output
in the class DataFrameGroupBy had not acces to this information, because it was already lost there. I collected the information beforehand and defined the Index Name accordingly.We could refactor both methods a bit at a later stage, because at it is right now (and was before) the first few lines are duplicates.