Skip to content

DOC: update the pandas.DataFrame.plot.area docstring #20170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 10, 2018

Conversation

DZPM
Copy link
Contributor

@DZPM DZPM commented Mar 10, 2018

  • PR title is "DOC: update the pandas.DataFrame.plot.area docstring"
  • The validation script passes: scripts/validate_docstrings.py pandas.DataFrame.plot.area
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single pandas.DataFrame.plot.area
  • It has been proofread on language by another sprint participant
################################################################################
#################### Docstring (pandas.DataFrame.plot.area) ####################
################################################################################

Draw an area plot of the input series using matplotlib.

An area plot displays graphically quantitative data.
This function wraps the matplotlib area function.

Parameters
----------
x : label or position, optional
    Coordinates for X point.
y : label or position, optional
    Coordinates for Y point.
kwds : optional
    Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
matplotlib.AxesSubplot or numpy.ndarray
    Area plot generated.

See Also
--------
pandas.DataFrame.plot : Draw a plot using matplotlib.

Examples
--------
.. plot::
    :context: close-figs

    This example draws an area plot based on the length and width of
    some animals, displayed in three bins

    >>> df = pd.DataFrame({
    ...     'sales': [3, 2, 3, 9, 10, 6],
    ...     'visits': [20, 42, 28, 62, 81, 50],
    ... })
    >>> area = df.plot.area()

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.plot.area" correct. :)

Area plot
Draw an area plot of the input series using matplotlib.

An area plot displays graphically quantitative data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

displays quantitative data visually

Coordinates for each point.
`**kwds` : optional
x : label or position, optional
Coordinates for X point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X axis

x : label or position, optional
Coordinates for X point.
y : label or position, optional
Coordinates for Y point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Y axis

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! added some comments


See Also
--------
pandas.DataFrame.plot : Draw a plot using matplotlib.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation here is a bit strange, as it would seem to say that this one is not plotting with matplotlib

Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
matplotlib.AxesSubplot or numpy.ndarray
Area plot generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain something about when ndarray is returned?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. In this case the existing explanation Area plot generated doesn't tell a lot. How is a numpy.ndarray the area plot generated?

@@ -2818,18 +2818,42 @@ def kde(self, bw_method=None, ind=None, **kwds):

def area(self, x=None, y=None, **kwds):
"""
Area plot
Draw an area plot of the input series using matplotlib.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"input series" -> input is a bit strange wording as it is the calling object (not passed as input) + it should be DataFrame instead of Series ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want to mention the underlying library this is using at this level. Does the user really care so much? In the extended description is more OK IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as @jorisvandenbossche says, the description doesn't make sense for dataframes. Probably what the function does is to display a stacked area plot?

@dukebody
Copy link
Contributor

Can you post a screenshot of the rendered HTML? It will help reviewing without having to check out your branch and compiling it.

Copy link
Contributor

@dukebody dukebody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this needs some better explanations of the extended description, arguments and more examples that help to understand how this graph works.

@@ -2818,18 +2818,42 @@ def kde(self, bw_method=None, ind=None, **kwds):

def area(self, x=None, y=None, **kwds):
"""
Area plot
Draw an area plot of the input series using matplotlib.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we want to mention the underlying library this is using at this level. Does the user really care so much? In the extended description is more OK IMO.

@@ -2818,18 +2818,42 @@ def kde(self, bw_method=None, ind=None, **kwds):

def area(self, x=None, y=None, **kwds):
"""
Area plot
Draw an area plot of the input series using matplotlib.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as @jorisvandenbossche says, the description doesn't make sense for dataframes. Probably what the function does is to display a stacked area plot?

Area plot
Draw an area plot of the input series using matplotlib.

An area plot displays graphically quantitative data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to add some info about why this function is useful or some basic use cases. Displays quantitative data visually doesn't really give me very interesting information about what is the shape or usage of the plot.

x, y : label or position, optional
Coordinates for each point.
`**kwds` : optional
x : label or position, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should follow the guide: https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-3-parameters

So it should indicate the type of the parameter in the first line (plus optional), and later the meaning of the parameter.

Coordinates for each point.
`**kwds` : optional
x : label or position, optional
Coordinates for X point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From this description of the arguments x and y I really don't understand what they change in the plot. Can we make the parameter description a bit longer explaining what do they mean?

Coordinates for X point.
y : label or position, optional
Coordinates for Y point.
kwds : optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in other PRs, you should use **kwds instead. Sorry for the great big confusion here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using **kwds raises errors in the current version of the validator:

Errors in parameters section
	Parameters {'kwds'} not documented
	Unknown parameters {'**kwds'}

I think it's important to be consistent between the validator and the code.

Of course, once the validator is updated, I agree this must be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can ignore that error in the validation script for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, updated!

Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
matplotlib.AxesSubplot or numpy.ndarray
Area plot generated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. In this case the existing explanation Area plot generated doesn't tell a lot. How is a numpy.ndarray the area plot generated?

.. plot::
:context: close-figs

This example draws an area plot based on the length and width of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think is better to put the text before the start of the plot block, and leave the block only for the code that constructs the plot.
  2. The example this sentence talks about doesn't correspond to the dataframe below.
  3. Don't write in 3rd person This example does this. Better write directly in imperative or present form, like Draw an area plot based on or We can draw an area plot based on..., but IMO mentioning this example is somewhat redundant when we are already in the examples section. :)

:context: close-figs

This example draws an area plot based on the length and width of
some animals, displayed in three bins
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In three bins? Where does the three come from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's wrong, copied it from another example. Removed it.

... 'sales': [3, 2, 3, 9, 10, 6],
... 'visits': [20, 42, 28, 62, 81, 50],
... })
>>> area = df.plot.area()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, but if the result of this method is an axis object, perhaps it's better to assign it to a variable named ax like in other examples.

Can you add other examples where the usage of the x and y parameters is shown? Also an example where a numpy.ndarray is returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about how to define a "usable" example, a little help would be appreciated!

@dukebody
Copy link
Contributor

Is there anything we can do to help finishing the work on this PR?

@DZPM
Copy link
Contributor Author

DZPM commented Mar 13, 2018

I'll update the PR later this week, just need some time 👍

@DZPM DZPM force-pushed the docstring_dataframe_plot_area branch from 6a4c50c to 9223d2f Compare March 15, 2018 11:22
@codecov
Copy link

codecov bot commented Mar 15, 2018

Codecov Report

Merging #20170 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #20170   +/-   ##
=======================================
  Coverage   92.07%   92.07%           
=======================================
  Files         169      169           
  Lines       50683    50683           
=======================================
  Hits        46666    46666           
  Misses       4017     4017
Flag Coverage Δ
#multiple 90.48% <ø> (ø) ⬆️
#single 42.34% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_core.py 83.48% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7b408e...3f02fe7. Read the comment docs.

@DZPM
Copy link
Contributor Author

DZPM commented Mar 15, 2018

Fixed most of the suggestions, except for:

  • writing **kwds instead of kwds. That would break the validation.
  • more/better examples: I ask for inspiration :)

Also, a screenshot of the example:
image

@DZPM DZPM force-pushed the docstring_dataframe_plot_area branch from 9223d2f to b59f433 Compare March 15, 2018 12:24
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Added some more comments

@@ -3115,19 +3115,42 @@ def kde(self, bw_method=None, ind=None, **kwds):

def area(self, x=None, y=None, **kwds):
"""
Area plot
Draw an stacked area plot.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an -> a

x, y : label or position, optional
Coordinates for each point.
`**kwds` : optional
x : label or position, optional, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can leave out the ", default None" part (saying it is "optional" should be informative enough").

But we should specify here what it used for x by default. I suppose this is the index?

(and the same comment for y

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I believe it's the index by default.

Coordinates for X axis.
y : label or position, optional, default None
Coordinates for Y axis.
**kwds : optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also document the stacked keyword? I know it is not in the signature, but it is a relevant one for this type of plot, so I would add it.
See http://pandas.pydata.org/pandas-docs/stable/visualization.html#area-plot

(the validation script will complain about it, but you can ignore that again)

Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
matplotlib.axes.Axes or numpy.ndarray
Area plot, or array of area plots.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add when an array is returned, which is when subplots=True is specified

... 'sales': [3, 2, 3, 9, 10, 6],
... 'visits': [20, 42, 28, 62, 81, 50],
... })
>>> ax = df.plot.area()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas for the examples:

  • I would also show one where you use the x and y keywords
  • I would show one with stacked=False
  • I think it is typical to use area plots with time series data. So maybe it would be nice to add a DatetimeIndex to the df.

@TomAugspurger
Copy link
Contributor

writing **kwds instead of kwds. That would break the validation.

You can ignore this, **kwargs is correct, the script is incorrect :)

@TomAugspurger
Copy link
Contributor

http://pandas.pydata.org/pandas-docs/stable/visualization.html#area-plot has an example. The same thing with a datetimeidnex from pd.date_range would be nice (maybe a bit wider of an aspect ratio).

@DZPM DZPM force-pushed the docstring_dataframe_plot_area branch from b59f433 to 5077a12 Compare March 26, 2018 20:16
@DZPM
Copy link
Contributor Author

DZPM commented Mar 26, 2018

Hi,

I've updated with all the proposals, and rebased.

Screenshots of the rendered example section:
image
image
image
image
image

It looks much better :) Thanks for the feedback!

@DZPM
Copy link
Contributor Author

DZPM commented Apr 6, 2018

Ping @kynan @jorisvandenbossche @TomAugspurger @dukebody

Feel free to keep reviewing, or do the merge :)

@jorisvandenbossche
Copy link
Member

@DZPM Thanks for the reminder!
Can you add a datetime index to the example frame? See #20170 (comment)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more comments

BTW, for future reference, if you add new commits to the branch with additional changes instead of squashing everything into a single commit, it is a bit easier to see what has been updated since the last review

Coordinates for each point.
`**kwds` : optional
x : label or position, optional
Index, coordinates for X axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Index, " part is a bit confusing I think. I would reword this as : Coordinates for the Y axis. By default uses the index.

x : label or position, optional
Index, coordinates for X axis.
y : label or position, optional
Index, coordinates for Y axis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the default is not the index, but all columns. So maybe something like "Column to plot. By default uses all columns." ?

... 'visits': [20, 42, 28, 62, 81, 50],
... })
>>> ax = df.plot.area(y='sales')
>>> ax = df.plot.area(y='signups')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think showing only the first is fine to illustrate the y keyword (to save some vertical screen estate)

(if you want to have all of them on a separate figure, one could also use subplots=True)

@datapythonista
Copy link
Member

@DZPM can you please update this PR based on the previous feedback

@datapythonista datapythonista self-assigned this Jul 22, 2018
@DZPM
Copy link
Contributor Author

DZPM commented Jul 23, 2018

Sure, will do. (Sorry, I've been busy and the PR slipped my mind...)

@DZPM DZPM force-pushed the docstring_dataframe_plot_area branch from 5077a12 to 86db52e Compare August 1, 2018 13:40
@DZPM
Copy link
Contributor Author

DZPM commented Aug 1, 2018

Done. Added the changes in a new commit, and rebased.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just few minor things. Thanks for the changes!

... 'signups': [5, 5, 6, 12, 14, 13],
... 'visits': [20, 42, 28, 62, 81, 50],
... }, index=pd.date_range(start='2018/01/01', end='2018/07/01',
... freq='M'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the .. plot:: directive resetting the namespace? Otherwise, It'd be good to reuse the DataFrame instead of repeating it in every example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some issues with the namespace, so I kept it in different places.

Don't have the full environment right now, can't check if it also works the other way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recreating the dataframe each time should not be needed


See Also
--------
pandas.DataFrame.plot : Draw plots.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of pandas. and just leave DataFrame.plot. Something a bit more descriptive than "Draw plots." could be useful.

Not sure if in similar docstrings we also added the matplotlib equivalent method. Can you check the docstring for DataFrame.kde for example, and be consistent with it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the DataFrame.plot description.

`**kwds` : optional
x : label or position, optional
Coordinates for the X axis. By default uses the index.
y : label or position, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @dukebody says, for both x and y, in the first line we should have the type, more than a short description. It's a bit tricky, as label can be anything, but I think we're using something like object or int, optional. Then, we can sure clarify that it's the label or the position in the description.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it consistent with DataFrame.plot, which is the more complete docstring example I could find.

Coordinates for the X axis. By default uses the index.
y : label or position, optional
Column to plot. By default uses all columns.
stacked : boolean, default True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to leave like this now, but as stacked is not in the signature, may be we should add it there, or document it under kwds. But if we leave it like this, can you replace boolean by the Python type bool please?

@DZPM DZPM force-pushed the docstring_dataframe_plot_area branch from 86db52e to 3f02fe7 Compare August 8, 2018 16:06
@DZPM
Copy link
Contributor Author

DZPM commented Aug 8, 2018

Changes done, and rebased. Hope it's ok!

@jreback jreback added this to the 0.24.0 milestone Aug 9, 2018
@TomAugspurger
Copy link
Contributor

LGTM, thanks!

If you want, check the doc build when it finishes (probably a few hours).

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some follow-up in #22307

... 'signups': [5, 5, 6, 12, 14, 13],
... 'visits': [20, 42, 28, 62, 81, 50],
... }, index=pd.date_range(start='2018/01/01', end='2018/07/01',
... freq='M'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recreating the dataframe each time should not be needed

... 'visits': [20, 42, 28],
... 'day': ['Monday', 'Tuesday', 'Wednesday'],
... }, index=pd.date_range(start='2018/01/01', end='2018/07/01',
... freq='M'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is not running (the index is not correct). And also the labels to not show on the actual plot once this is fixed.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants