-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix error for boxplot
when using a pre-grouped DataFrame
with more than one grouping
#57985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix error for boxplot
when using a pre-grouped DataFrame
with more than one grouping
#57985
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looking good.
ret.loc[key] = d | ||
# GH 14701 'key' needs to be converted to text as 'key' is a tuple when | ||
# there is more than one group | ||
ret.loc[pprint_thing(key)] = d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of ret = pd.Series(dtype=object)
above, can you do:
data = {}
for (key, group), ax in zip(grouped, axes):
...
data[key] = d
ret = pd.Series(data)
In general one should not construct a Series row-by-row (this is very slow), but rather provide all the data upfront. I think this will also allow tuples for key.
) | ||
df["X"] = Series(np.repeat(["A", "B"], int(rows / 2))) | ||
df["Y"] = Series(np.tile(["C", "D"], int(rows / 2))) | ||
pregrouped = df.groupby(group) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Just to adhere to existing conventions in tests, could you either call this grouped
or gb
. These are the names used in other tests here.
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -375,7 +375,7 @@ Period | |||
|
|||
Plotting | |||
^^^^^^^^ | |||
- | |||
- Bug in :meth:`DataFrameGroupBy.boxplot` passes a ``tuple`` instead of a ``string`` when input ``DataFrame`` is pre-grouped using more than one ``column`` (:issue:`14701`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link to the API docs won't resolve unless you include a .
before DataFrameGroupBy
; so it should be :meth:`.DataFrameGroupBy.boxplot`
Thanks for the feedback. I will make the necessary adjustments and update the PR accordingly. |
# GH 14701 refactored to allow the 'key' to be passed as a tuple, | ||
# which occurs when there is more than one group | ||
data[key] = d | ||
ret = pd.Series(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to be creating ret
on every loop, only after the for loop is done. This line can be dedented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem.
# GH 14701 refactored to allow the 'key' to be passed as a tuple, | ||
# which occurs when there is more than one group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment should be removed.
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -409,7 +409,7 @@ Period | |||
|
|||
Plotting | |||
^^^^^^^^ | |||
- | |||
- Bug in :meth:`.DataFrameGroupBy.boxplot` passes a ``tuple`` instead of a ``string`` when input ``DataFrame`` is pre-grouped using more than one ``column`` (:issue:`14701`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer accurate. I'd recommend something more generic, such as
Bug in
:meth:`.DataFrameGroupBy.boxplot`
failed when there were multiple groupings (:issue:14701
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @thetestspecimen - very nice! |
No Problem @rhshadrach. Thanks for the detailed input and guidance. |
…h more than one grouping (pandas-dev#57985)
doc/source/whatsnew/v3.0.0.rst
file if fixing a bug or adding a new feature.The original bug shows an indexing error when using
boxplot
on a pre-groupedDataFrame
. For example:df.groupby(['X', 'Y']).boxplot()
This error occurs as multiple groupings are passed as a tuple into the index of a
Series
, which is not allowed. The fix converts thetuple
to astring
, which is acceptable as an index to aSeries
.There are other comments on the bug thread that state "Groupby boxplot is not working even when a single 'by' is used", which is not the case.
I have included in the tests both a single
by
and multipleby
covering three different methods of plotting aboxplot
from aDataFrame
:df.boxplot(by='X')
df.boxplot(by=['X','Y'])
df.plot.box(by='X')
df.plot.box(by=['X','Y'])
df.groupby('X').boxplot()
df.groupby(['X','Y']).boxplot()
Of all of the above, only example 6 required the fix.