-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API: Reformat output of groupby.describe (#4792) #15260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/groupby/test_groupby.py
Outdated
'VOLUME': volumes}) | ||
result = df.groupby('PRICE').describe() | ||
expected_index = pd.MultiIndex(levels=[[24990, 25499], | ||
['count', 'mean', 'std', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of constructing this way
use concat with keys on the subframes
Actually I think the bug is still present. The issue that you want this
rather than this
we do a similar unstack already with
So each group gets a single row, while multi-columns are present for multiple aggregations. |
Ah that makes sense. Thanks for clarifying the expected output! I've been poking into the code, and since each group goes through
I could add some logic saying if the indexes for each group are the same, concat on the index and transpose to the columns, but I think that'd be a pretty big change since it will probably affect all |
so you do just need to define Right now we automagically just do any whitelisted function (including .describe) and they go thru the standard reshaping things. If you can figure out a generic way to do this great, but otherwise defining a method is fine. |
Cool, thanks for the insight. I defined a new method for |
doc/source/whatsnew/v0.20.0.txt
Outdated
@@ -366,6 +366,7 @@ Other API Changes | |||
- ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`) | |||
- ``pandas.api.types.is_datetime64_ns_dtype`` will now report ``True`` on a tz-aware dtype, similar to ``pandas.api.types.is_datetime64_any_dtype`` | |||
- ``DataFrame.asof()`` will return a null filled ``Series`` instead the scalar ``NaN`` if a match is not found (:issue:`15118`) | |||
- ``groupby.describe()`` now labels the `describe()` metrics in the column instead of the index (:issue:`4792`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think move this to a sub-section and show previous and current behavior.
pandas/core/groupby.py
Outdated
@@ -1140,6 +1139,17 @@ def ohlc(self): | |||
|
|||
@Substitution(name='groupby') | |||
@Appender(_doc_template) | |||
def describe(self, **kwargs): | |||
""" | |||
Provide summary statistics for each group, excluding NaN values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a full doc-string, with named arguments. You might be able to simply add Series.describe and DataFrame.describe in the Notes section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or as you do below just use the DataFrame doc-string
4b5d367
to
4375710
Compare
Added previous/current behavior in whatsnew, documentation to describe with |
doc/source/whatsnew/v0.20.0.txt
Outdated
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The output formatting of ``groupby.describe()`` now labels the ``describe()`` metrics in the columns instead of the index. | ||
This format is consistent with ``groupby.ohlc()`` (:issue:`4792`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more to the point its consistent with how .agg()
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most people prob don't know about .ohlc()
:>
doc/source/whatsnew/v0.20.0.txt
Outdated
|
||
New Behavior: | ||
|
||
.. code-block:: ipython |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use an ipython-block here (so the code executes)
pandas/tests/groupby/test_groupby.py
Outdated
expected.index.names = ['A', None] | ||
expected = pd.concat([(df[df.A == 1].B | ||
.describe() | ||
.to_frame() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like lots of reshapings (this is current master)
In [6]: df
Out[6]:
A B C
0 1 2.0 foo
1 1 NaN bar
2 3 NaN baz
In [7]: df.groupby('A').describe()
Out[7]:
B
A
1 count 1.0
mean 2.0
std NaN
min 2.0
25% 2.0
50% 2.0
75% 2.0
max 2.0
3 count 0.0
mean NaN
std NaN
min NaN
25% NaN
50% NaN
75% NaN
max NaN
In [8]: df.groupby('A').describe().unstack()
Out[8]:
B
count mean std min 25% 50% 75% max
A
1 1.0 2.0 NaN 2.0 2.0 2.0 2.0 2.0
3 0.0 NaN NaN NaN NaN NaN NaN NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this test, the result = df.groupby('A').describe().unstack()
after unstack()
was added to groupby.describe()
. Shouldn't expected
follow an independent path to the result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but you can simply directly construct this result (as its 'simple' enough), just pd.DataFrame(.....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see thanks! Agreed that my it my first edit probably included too much reshaping.
Replaced API example using |
hmm, something is wrong here. Its including the grouper.
|
It looks like the grouper is included when using
Can be seen with other functions as well:
Is this a known issue? |
@mroeschke so by-definition this is what apply does. you can use |
Ah thanks for clarifying that @jreback. Will edit to use |
Restructure describe def Fix another test Refactoring tests linting & patch groupby tests add whatsnew fix docstring fix more tests Added api example and documentation to describe fix potential pep8 complaint adjust doc description renamed original test and add agg example in doc simplify example Eliminate grouper from result simplify example in the whatsnew
Codecov Report
@@ Coverage Diff @@
## master #15260 +/- ##
=========================================
Coverage ? 86.32%
=========================================
Files ? 141
Lines ? 51177
Branches ? 0
=========================================
Hits ? 44180
Misses ? 6997
Partials ? 0
Continue to review full report at Codecov.
|
Added |
thanks! keep em coming! |
closes pandas-dev#4792 Author: Matt Roeschke <[email protected]> Author: Matthew Roeschke <[email protected]> Closes pandas-dev#15260 from mroeschke/fix_4792 and squashes the following commits: 618bc46 [Matthew Roeschke] Merge branch 'master' into fix_4792 184378d [Matt Roeschke] TST: groupby.describe levels don't appear as column (pandas-dev#4792)
git diff upstream/master | flake8 --diff
Doesn't look like this was address in a PR in 0.20, but the original issue works on master.