DOC: improve groupby reference docs #6944
👍 nice list
Question: Are all of the items on the
I ask because I'm proposing to define the whitelisted methods at class definition time, instead of relying on the … But that often overrides the explicit method definitions listed above, and sometimes they don't even do the same thing, which makes me suspect that these dozen or so whitelisted methods no longer really should be handled this way (i.e. they already have explicit class definitions that take priority). So, can anyone think of a reason why we shouldn't just remove these names from the …
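The priority question comes down to Python's attribute lookup: `__getattr__` is only called after normal lookup (instance, class, base classes) has failed, so a name that is both defined explicitly on the class and listed in a `__getattr__` whitelist never reaches the whitelist path. A minimal sketch, using a hypothetical `ToyGroupBy` class and a hypothetical whitelist (not pandas' actual implementation):

```python
class ToyGroupBy:
    """Hypothetical stand-in for a groupby object, not pandas' class."""

    # Hypothetical whitelist of names forwarded through __getattr__.
    _apply_whitelist = frozenset({"sum", "count"})

    def __init__(self, groups):
        # groups: dict mapping group key -> list of numbers
        self.groups = groups

    def sum(self):
        # Explicit method: found by normal attribute lookup.
        return {k: sum(v) for k, v in self.groups.items()}

    def __getattr__(self, name):
        # Only reached when normal lookup fails.
        if name in self._apply_whitelist:
            def dispatched():
                return f"dispatched {name}"
            return dispatched
        raise AttributeError(name)


g = ToyGroupBy({"a": [1, 2], "b": [3]})
print(g.sum())    # {'a': 3, 'b': 3}  explicit method; __getattr__ never runs
print(g.count())  # dispatched count   falls through to __getattr__
```

Here `"sum"` in the whitelist is dead weight: the explicit method always wins, which is the argument for dropping such names from the list.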
By definition, the cythonized functions (and a few others which can sometimes be cythonized) are defined inline on the groupby objects (e.g. min, mean, sum, etc.). You can define the whitelisted methods in the classes (and just have `__getattr__` raise an error, I suppose). `prod` not being defined is probably an error.
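One way to read this suggestion: generate the whitelisted wrappers once, when the class is defined; skip any name that already has an explicit (e.g. cythonized) definition so it is never overridden; copy a docstring onto each wrapper so that `g.count?` has something to show; and let `__getattr__` raise for everything else. The sketch below assumes a toy `ToyGroupBy` over a dict of lists and a hypothetical `_WHITELIST` mapping; it is not how pandas actually does it.

```python
import statistics


class ToyGroupBy:
    """Hypothetical toy groupby over a dict of key -> list of numbers."""

    def __init__(self, groups):
        self.groups = groups

    # Cythonized-style reducers are defined inline, explicitly.
    def sum(self):
        """Sum of each group."""
        return {k: sum(v) for k, v in self.groups.items()}

    def __getattr__(self, name):
        # Nothing is dispatched dynamically: unknown names are an error.
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )


# Hypothetical whitelist: method name -> function applied to each group.
_WHITELIST = {"sum": sum, "median": statistics.median, "max": max}


def _make_method(name, func):
    def method(self):
        return {k: func(v) for k, v in self.groups.items()}
    method.__name__ = name
    # A real docstring, so help()/IPython's `?` show something useful.
    method.__doc__ = f"Apply {func.__name__} to each group.\n\n{func.__doc__ or ''}"
    return method


for _name, _func in _WHITELIST.items():
    # Skip names with an explicit definition so they keep priority.
    if _name not in vars(ToyGroupBy):
        setattr(ToyGroupBy, _name, _make_method(_name, _func))


g = ToyGroupBy({"a": [1, 5, 2], "b": [4]})
print(g.sum())     # {'a': 8, 'b': 4}  explicit method kept
print(g.median())  # {'a': 2, 'b': 4}  generated wrapper
```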
I think this will be easier with Python 3, since we can rewrite function signatures without much hassle (something like pandas/pandas/util/_decorators.py line 128 in 08a66e1 … `**kwargs`).
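In Python 3, a generic `*args, **kwargs` wrapper can advertise an explicit signature by setting its `__signature__` attribute (PEP 362), which `inspect.signature` and `help()` honor. A minimal sketch; `with_signature` and `_median_template` are hypothetical names, not the pandas helper linked above:

```python
import inspect


def with_signature(template):
    """Hypothetical decorator: give a generic wrapper the signature of `template`."""
    def decorator(func):
        # inspect.signature() checks __signature__ before inspecting the code object.
        func.__signature__ = inspect.signature(template)
        return func
    return decorator


def _median_template(self, numeric_only=False):
    """Hypothetical template: only its signature is used."""


class ToyGroupBy:
    @with_signature(_median_template)
    def median(self, *args, **kwargs):
        # Generic dispatcher body: just echoes what it received.
        return args, kwargs


print(inspect.signature(ToyGroupBy.median))    # (self, numeric_only=False)
print(inspect.signature(ToyGroupBy().median))  # (numeric_only=False)
```

Only introspection changes: the wrapper still accepts any arguments at call time, so validation would need something like `inspect.signature(template).bind(...)`.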
Can I help? My name is Lucas. I have a PhD in development economics and am a "greener" Python enthusiast. Unfortunately, I don't have time right now to look at @jorisvandenbossche's comments with the care they deserve, but I did want to point out something I noticed that probably needs a small adjustment (I'm also sorry for not having had time to read the other posts, which I intend to do later). The pandas website page "pandas.core.groupby.GroupBy.agg", which seems to me to be of great importance to people starting to wrap their minds around Python's way of referencing and slicing different ranges, does not explain how the method works. By glancing over the source code for the function, I noticed that 'agg' is short for 'aggregate'. But since people on the web seem to use the short version of the method, wouldn't it be nice to have a redirecting link for new students? Moreover, I noticed that the following page, titled "pandas.core.groupby.SeriesGroupBy.aggregate", contains information that seems relevant to the previous one. This seems connected to the third issue mentioned by , which is: "put all relevant docstrings (...) eg now the aggregate (...) docstrings of GroupBy are empty, but are more elaborate in the subclasses)". I'm still not very good at coding, but as a former college professor in Brazil, I can say with relative confidence that I am good at reading what most people consider "boring" stuff, and at finding small things others don't pay attention to. That being said, how can I help with this issue? Thank you. Have a nice day!
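As the comment notes, `agg` is the short spelling of `aggregate`: both accept the same arguments and return the same result. A small sketch on a made-up frame (named aggregation with `name=(column, func)` tuples assumes pandas 0.25 or later):

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
g = df.groupby("key")

# `agg` and `aggregate` are interchangeable spellings.
short = g.agg("sum")
long = g.aggregate("sum")
print(short.equals(long))  # True

# A few common argument forms: a list of functions on one column ...
print(g["val"].agg(["sum", "mean"]))

# ... and named aggregation, one output column per keyword.
named = g.agg(total=("val", "sum"), biggest=("val", "max"))
print(named)
```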
It appears most of the issues in the original post have been addressed, so closing. If there are specific groupby doc issues, it would be better to open separate issues for them.
An overview of the reference doc on groupby is given here: http://pandas.pydata.org/pandas-docs/dev/api.html#groupby (apart from the extensive user guide: http://pandas.pydata.org/pandas-docs/dev/groupby.html)
There are some things that could use some improvement:
- `first`/`last`/`nth`, `count`, `cumcount`, ...
- `name`: not sure what the purpose of this is
- add the `GroupBy` object itself to the API docs (and so automatically all its methods) (DOC: SeriesGroupby/DataFrameGroupBy is missing class documentaion from doc index #19302)
- put all relevant docstrings (...) (e.g. now the `aggregate` and `transform` docstrings of `GroupBy` are empty, but are more elaborate in the subclasses) (Groupbydocs #8231)
- the `apply` docstring is not very clear to me
- the `by` arg (and provide some short examples in the 'Examples' section)
- `_apply_whitelist`: `g = df.groupby(...); g.count?` is returning `<no docstring>` (see Docs for lurking groupby methods #4500 (comment) for an explanation of how)
- `head`/`tail`/`nth` are basically `filter`-type functions, `fillna`/`shift` are transformers, while almost everything else is a reducer (e.g. `sum`/`mean`/`describe`), and `apply`/`agg` can be any of the above. Hmm, maybe this needs a separate section. (And of course `as_index` just makes this crazy.)

If someone wants to tackle this (or parts of this), go ahead!
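The filter / transform / reduce distinction in the last bullet is easiest to see from the shape of each result. A sketch on a made-up frame; `nth` is left out because its output shape changed between pandas versions (pandas 2.0 made it behave as a filter):

```python
import pandas as pd

# Made-up data: two groups, one missing value.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                   "val": [1.0, None, 3.0, 4.0, 5.0]})
g = df.groupby("key")

# Filter-like: a subset of the original rows, original index kept.
first_rows = g.head(1)        # rows 0 and 2

# Transform-like: same length and index as the input.
shifted = g["val"].shift(1)   # NaN at the start of each group
position = g.cumcount()       # 0, 1, 0, 1, 2

# Reducer: one row per group, group keys become the index.
totals = g["val"].sum()       # a -> 1.0 (NaN skipped), b -> 12.0

# as_index=False keeps the keys as a column instead of the index.
flat = df.groupby("key", as_index=False)["val"].sum()

print(first_rows, shifted, position, totals, flat, sep="\n\n")
```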