-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: update the pandas.DataFrame.plot.area docstring #20170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the pandas.DataFrame.plot.area docstring #20170
Conversation
pandas/plotting/_core.py
Outdated
Area plot | ||
Draw an area plot of the input series using matplotlib. | ||
|
||
An area plot displays graphically quantitative data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
displays quantitative data visually
pandas/plotting/_core.py
Outdated
Coordinates for each point. | ||
`**kwds` : optional | ||
x : label or position, optional | ||
Coordinates for X point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X axis
pandas/plotting/_core.py
Outdated
x : label or position, optional | ||
Coordinates for X point. | ||
y : label or position, optional | ||
Coordinates for Y point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Y axis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! added some comments
pandas/plotting/_core.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.DataFrame.plot : Draw a plot using matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The explanation here is a bit strange, as it would seem to say that this one is not plotting with matplotlib
pandas/plotting/_core.py
Outdated
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
axes : matplotlib.AxesSubplot or np.array of them | ||
matplotlib.AxesSubplot or numpy.ndarray | ||
Area plot generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain something about when ndarray is returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. In this case the existing explanation Area plot generated
doesn't tell a lot. How is a numpy.ndarray
the area plot generated?
pandas/plotting/_core.py
Outdated
@@ -2818,18 +2818,42 @@ def kde(self, bw_method=None, ind=None, **kwds): | |||
|
|||
def area(self, x=None, y=None, **kwds): | |||
""" | |||
Area plot | |||
Draw an area plot of the input series using matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"input series" -> input is a bit strange wording as it is the calling object (not passed as input) + it should be DataFrame instead of Series ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to mention the underlying library this is using at this level. Does the user really care so much? In the extended description is more OK IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as @jorisvandenbossche says, the description doesn't make sense for dataframes. Probably what the function does is to display a stacked area plot?
Can you post a screenshot of the rendered HTML? It will help reviewing without having to check out your branch and compiling it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this needs some better explanations of the extended description, arguments and more examples that help to understand how this graph works.
pandas/plotting/_core.py
Outdated
@@ -2818,18 +2818,42 @@ def kde(self, bw_method=None, ind=None, **kwds): | |||
|
|||
def area(self, x=None, y=None, **kwds): | |||
""" | |||
Area plot | |||
Draw an area plot of the input series using matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want to mention the underlying library this is using at this level. Does the user really care so much? In the extended description is more OK IMO.
pandas/plotting/_core.py
Outdated
@@ -2818,18 +2818,42 @@ def kde(self, bw_method=None, ind=None, **kwds): | |||
|
|||
def area(self, x=None, y=None, **kwds): | |||
""" | |||
Area plot | |||
Draw an area plot of the input series using matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, as @jorisvandenbossche says, the description doesn't make sense for dataframes. Probably what the function does is to display a stacked area plot?
pandas/plotting/_core.py
Outdated
Area plot | ||
Draw an area plot of the input series using matplotlib. | ||
|
||
An area plot displays graphically quantitative data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to add some info about why this function is useful or some basic use cases. Displays quantitative data visually
doesn't really give me very interesting information about what is the shape or usage of the plot.
x, y : label or position, optional | ||
Coordinates for each point. | ||
`**kwds` : optional | ||
x : label or position, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should follow the guide: https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-3-parameters
So it should indicate the type of the parameter in the first line (plus optional), and later the meaning of the parameter.
pandas/plotting/_core.py
Outdated
Coordinates for each point. | ||
`**kwds` : optional | ||
x : label or position, optional | ||
Coordinates for X point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this description of the arguments x
and y
I really don't understand what they change in the plot. Can we make the parameter description a bit longer explaining what do they mean?
pandas/plotting/_core.py
Outdated
Coordinates for X point. | ||
y : label or position, optional | ||
Coordinates for Y point. | ||
kwds : optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in other PRs, you should use **kwds
instead. Sorry for the great big confusion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using **kwds
raises errors in the current version of the validator:
Errors in parameters section
Parameters {'kwds'} not documented
Unknown parameters {'**kwds'}
I think it's important to be consistent between the validator and the code.
Of course, once the validator is updated, I agree this must be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can ignore that error in the validation script for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, updated!
pandas/plotting/_core.py
Outdated
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
axes : matplotlib.AxesSubplot or np.array of them | ||
matplotlib.AxesSubplot or numpy.ndarray | ||
Area plot generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. In this case the existing explanation Area plot generated
doesn't tell a lot. How is a numpy.ndarray
the area plot generated?
pandas/plotting/_core.py
Outdated
.. plot:: | ||
:context: close-figs | ||
|
||
This example draws an area plot based on the length and width of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think is better to put the text before the start of the
plot
block, and leave the block only for the code that constructs the plot. - The example this sentence talks about doesn't correspond to the dataframe below.
- Don't write in 3rd person
This example does this
. Better write directly in imperative or present form, likeDraw an area plot based on
orWe can draw an area plot based on...
, but IMO mentioningthis example
is somewhat redundant when we are already in the examples section. :)
pandas/plotting/_core.py
Outdated
:context: close-figs | ||
|
||
This example draws an area plot based on the length and width of | ||
some animals, displayed in three bins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In three bins? Where does the three come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's wrong, copied it from another example. Removed it.
pandas/plotting/_core.py
Outdated
... 'sales': [3, 2, 3, 9, 10, 6], | ||
... 'visits': [20, 42, 28, 62, 81, 50], | ||
... }) | ||
>>> area = df.plot.area() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, but if the result of this method is an axis object, perhaps it's better to assign it to a variable named ax
like in other examples.
Can you add other examples where the usage of the x
and y
parameters is shown? Also an example where a numpy.ndarray
is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about how to define a "usable" example, a little help would be appreciated!
Is there anything we can do to help finishing the work on this PR? |
I'll update the PR later this week, just need some time 👍 |
6a4c50c
to
9223d2f
Compare
Codecov Report
@@ Coverage Diff @@
## master #20170 +/- ##
=======================================
Coverage 92.07% 92.07%
=======================================
Files 169 169
Lines 50683 50683
=======================================
Hits 46666 46666
Misses 4017 4017
Continue to review full report at Codecov.
|
9223d2f
to
b59f433
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Added some more comments
pandas/plotting/_core.py
Outdated
@@ -3115,19 +3115,42 @@ def kde(self, bw_method=None, ind=None, **kwds): | |||
|
|||
def area(self, x=None, y=None, **kwds): | |||
""" | |||
Area plot | |||
Draw an stacked area plot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an -> a
pandas/plotting/_core.py
Outdated
x, y : label or position, optional | ||
Coordinates for each point. | ||
`**kwds` : optional | ||
x : label or position, optional, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can leave out the ", default None" part (saying it is "optional" should be informative enough").
But we should specify here what it used for x
by default. I suppose this is the index?
(and the same comment for y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I believe it's the index by default.
pandas/plotting/_core.py
Outdated
Coordinates for X axis. | ||
y : label or position, optional, default None | ||
Coordinates for Y axis. | ||
**kwds : optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also document the stacked
keyword? I know it is not in the signature, but it is a relevant one for this type of plot, so I would add it.
See http://pandas.pydata.org/pandas-docs/stable/visualization.html#area-plot
(the validation script will complain about it, but you can ignore that again)
pandas/plotting/_core.py
Outdated
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them | ||
matplotlib.axes.Axes or numpy.ndarray | ||
Area plot, or array of area plots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add when an array is returned, which is when subplots=True
is specified
pandas/plotting/_core.py
Outdated
... 'sales': [3, 2, 3, 9, 10, 6], | ||
... 'visits': [20, 42, 28, 62, 81, 50], | ||
... }) | ||
>>> ax = df.plot.area() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some ideas for the examples:
- I would also show one where you use the
x
andy
keywords - I would show one with
stacked=False
- I think it is typical to use area plots with time series data. So maybe it would be nice to add a DatetimeIndex to the
df
.
You can ignore this, |
http://pandas.pydata.org/pandas-docs/stable/visualization.html#area-plot has an example. The same thing with a datetimeidnex from |
b59f433
to
5077a12
Compare
Ping @kynan @jorisvandenbossche @TomAugspurger @dukebody Feel free to keep reviewing, or do the merge :) |
@DZPM Thanks for the reminder! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few more comments
BTW, for future reference, if you add new commits to the branch with additional changes instead of squashing everything into a single commit, it is a bit easier to see what has been updated since the last review
pandas/plotting/_core.py
Outdated
Coordinates for each point. | ||
`**kwds` : optional | ||
x : label or position, optional | ||
Index, coordinates for X axis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "Index, " part is a bit confusing I think. I would reword this as : Coordinates for the Y axis. By default uses the index.
pandas/plotting/_core.py
Outdated
x : label or position, optional | ||
Index, coordinates for X axis. | ||
y : label or position, optional | ||
Index, coordinates for Y axis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the default is not the index, but all columns. So maybe something like "Column to plot. By default uses all columns." ?
pandas/plotting/_core.py
Outdated
... 'visits': [20, 42, 28, 62, 81, 50], | ||
... }) | ||
>>> ax = df.plot.area(y='sales') | ||
>>> ax = df.plot.area(y='signups') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think showing only the first is fine to illustrate the y
keyword (to save some vertical screen estate)
(if you want to have all of them on a separate figure, one could also use subplots=True
)
@DZPM can you please update this PR based on the previous feedback |
Sure, will do. (Sorry, I've been busy and the PR slipped my mind...) |
5077a12
to
86db52e
Compare
Done. Added the changes in a new commit, and rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just few minor things. Thanks for the changes!
... 'signups': [5, 5, 6, 12, 14, 13], | ||
... 'visits': [20, 42, 28, 62, 81, 50], | ||
... }, index=pd.date_range(start='2018/01/01', end='2018/07/01', | ||
... freq='M')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the .. plot::
directive resetting the namespace? Otherwise, It'd be good to reuse the DataFrame instead of repeating it in every example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had some issues with the namespace, so I kept it in different places.
Don't have the full environment right now, can't check if it also works the other way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recreating the dataframe each time should not be needed
pandas/plotting/_core.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.DataFrame.plot : Draw plots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of pandas.
and just leave DataFrame.plot
. Something a bit more descriptive than "Draw plots." could be useful.
Not sure if in similar docstrings we also added the matplotlib equivalent method. Can you check the docstring for DataFrame.kde
for example, and be consistent with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the DataFrame.plot
description.
`**kwds` : optional | ||
x : label or position, optional | ||
Coordinates for the X axis. By default uses the index. | ||
y : label or position, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @dukebody says, for both x
and y
, in the first line we should have the type, more than a short description. It's a bit tricky, as label
can be anything, but I think we're using something like object or int, optional
. Then, we can sure clarify that it's the label
or the position
in the description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it consistent with DataFrame.plot
, which is the more complete docstring example I could find.
pandas/plotting/_core.py
Outdated
Coordinates for the X axis. By default uses the index. | ||
y : label or position, optional | ||
Column to plot. By default uses all columns. | ||
stacked : boolean, default True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to leave like this now, but as stacked
is not in the signature, may be we should add it there, or document it under kwds
. But if we leave it like this, can you replace boolean
by the Python type bool
please?
86db52e
to
3f02fe7
Compare
Changes done, and rebased. Hope it's ok! |
LGTM, thanks! If you want, check the doc build when it finishes (probably a few hours). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some follow-up in #22307
... 'signups': [5, 5, 6, 12, 14, 13], | ||
... 'visits': [20, 42, 28, 62, 81, 50], | ||
... }, index=pd.date_range(start='2018/01/01', end='2018/07/01', | ||
... freq='M')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recreating the dataframe each time should not be needed
... 'visits': [20, 42, 28], | ||
... 'day': ['Monday', 'Tuesday', 'Wednesday'], | ||
... }, index=pd.date_range(start='2018/01/01', end='2018/07/01', | ||
... freq='M')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is not running (the index is not correct). And also the labels to not show on the actual plot once this is fixed.
scripts/validate_docstrings.py pandas.DataFrame.plot.area
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single pandas.DataFrame.plot.area