ENH: DataFrame.describe allows UDFs and/or selectable metrics #45737
Comments
What about an
What's the difference between this and agg?
That is an excellent point. I'm going to blame the advertising :). I would recommend a link in the docs for
Agreed on the docs. My use case for `describe` was always some quick statistics on a data set, and if I needed to do more I turned to `agg`. In other words, to me at least, the purpose of `describe` is to be quick, not flexible.
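For illustration only (not from the original thread), here is a minimal sketch of the `agg` route. `DataFrame.agg` accepts a list mixing built-in function names and callables, and each entry becomes one row of the result. The `iqr` helper and the toy data are hypothetical, made up for the example.

```python
import pandas as pd


def iqr(s: pd.Series) -> float:
    """Hypothetical user-defined metric: interquartile range of a column."""
    return s.quantile(0.75) - s.quantile(0.25)


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})

# String names and callables can be mixed; the callable's __name__ ("iqr")
# is used as its row label in the result.
stats = df.agg(["count", "mean", "min", "max", iqr])
print(stats)
```

Unlike `describe`, nothing here is hard-coded: the caller chooses every row, which is exactly the flexibility the `describe` proposal asks for.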
This was actually the original use case of
Really, it's a good example of just how large the API is. I can understand the core team's hesitation about adding anything more.
`DataFrame.describe()` generates descriptive statistics for the columns of a DataFrame. The "descriptive statistics" have been specifically chosen and hard-coded, and they are also somewhat dtype-dependent.

I find it quite odd that the function offers a lot of customisation over which columns to `include` or `exclude` based on dtype, or which `percentiles` to sample, but doesn't offer the ability to specify the set of functions the user wants to see (or not see, such as percentiles).

A rough idea is to add a new argument, e.g. `metrics`, which overrides the default statistics with a specific set of functions, each defined as a `str` or callable:

Note that this came up in the context of trying to add sub-total or other additional rows to a `Styler`, based on the underlying data (#43894), and the following issues:
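As a rough sketch (not from the original issue), the proposed `metrics` argument can be approximated with existing pandas calls. `describe_with_metrics` and `missing_frac` are hypothetical names, not pandas API. The helper assumes that `DataFrame.select_dtypes` is an adequate stand-in for `describe`'s `include`/`exclude` filtering, and then passes the requested metrics to `DataFrame.agg`.

```python
import pandas as pd


def missing_frac(s: pd.Series) -> float:
    """Hypothetical user-defined metric: share of missing values in a column."""
    return s.isna().mean()


def describe_with_metrics(df, metrics=None, include=None, exclude=None):
    """Hypothetical approximation of the proposed ``metrics`` argument.

    ``metrics`` is a list of str names and/or callables. When it is None,
    this falls back to the built-in ``DataFrame.describe``.
    """
    if metrics is None:
        return df.describe(include=include, exclude=exclude)
    if include == "all":
        subset = df
    elif include is None and exclude is None:
        # Simplification: default to numeric columns, roughly like describe().
        subset = df.select_dtypes(include="number")
    else:
        subset = df.select_dtypes(include=include, exclude=exclude)
    # Each metric becomes one row of the result, labelled by its name.
    return subset.agg(metrics)


df = pd.DataFrame({"x": [1.0, 2.0, None, 4.0], "label": ["a", "b", "c", "d"]})

numeric_summary = describe_with_metrics(df, metrics=["count", "mean", missing_frac])
text_summary = describe_with_metrics(df, metrics=["count", "nunique"], exclude="number")
default_summary = describe_with_metrics(df)  # same as df.describe()
print(numeric_summary)
print(text_summary)
```

Routing through `agg` keeps the sketch small. A real implementation inside `describe` would also have to decide how per-dtype defaults interact with user-supplied metrics, and that interaction is not addressed here.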