Skip to content

DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 14, 2018

Conversation

BielStela
Copy link
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
################### Docstring (pandas.DataFrame.plot.hexbin) ###################
################################################################################

Make hexagonal binning plots.

Make an hexagonal binning plot of *x* versus *y*, where *x*,
*y* are 1-D sequences of the same length, *N*. If *C* is *None*
(the default), this is an histogram of the number of occurrences
of the observations at (x[i],y[i]).

If *C* is specified, specifies values at given coordinates
(x[i],y[i]). These values are accumulated for each hexagonal
bin and then reduced according to *reduce_C_function*, having as default
the numpy's mean function (np.mean). (If *C* is
specified, it must also be a 1-D sequence of the same length
as *x* and *y*.)

Parameters
----------
x : label or position, optional
    Coordinates for x point.
y : label or position, optional
    Coordinates for y point.
C : label or position, optional
    The value at each `(x, y)` point.
reduce_C_function : callable, optional, default `mean`
    Function of one argument that reduces all the values in a bin to
    a single number (e.g. `mean`, `max`, `sum`, `std`).
gridsize : int, optional, default 100
    The number of hexagons in the x-direction.
    The corresponding number of hexagons in the y-direction is chosen
    in a way that the hexagons are approximately regular. Alternatively,
    gridsize can be a tuple with two elements specifying the number of
    hexagons in the x-direction and the y-direction.
kwds : optional
    Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them.

See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib.

Examples
--------

.. plot::
    :context: close-figs

    >>> from  sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> df = pd.DataFrame(iris.data, columns=iris.feature_names)
    >>> hexbin = df.plot.hexbin(x='sepal length (cm)', y='sepal width (cm)',
    ...                         gridsize=10, cmap='viridis')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.plot.hexbin" correct. :)


thanks to @Renton2017

@pep8speaks
Copy link

pep8speaks commented Mar 10, 2018

Hello @BielStela! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 14, 2018 at 12:39 Hours UTC

Hexbin plot
Make hexagonal binning plots.

Make an hexagonal binning plot of *x* versus *y*, where *x*,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an -> a.

Parameter names should be in backticks, so

`x` versus `y`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the topic related to h depends. You say An herb garden

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually depends whether the h is silent or not. In case of hexagon I ask if it sound like herbal or like history. In case of history you say always "a history"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for your reply.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward for your reply Tom, since I might be wrong.

@BielStela BielStela force-pushed the docstring_hexbin branch 3 times, most recently from 979b16d to db52245 Compare March 10, 2018 13:15

Make an hexagonal binning plot of `x` versus `y`, where `x`,
`y` are 1-D sequences of the same length, `N`. If `C` is `None`
(the default), this is an histogram of the number of occurrences
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. if you can find a referene on wikipedia might be nice to link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid there is no reference in wikipedia to hexagonal binning. The closed topic I found in wiki is the data binning article. Maybe it's a little bit to generic to include it in the hexbin docstring since it is also suitable for histogram and histogram2d

@jreback jreback added this to the 0.23.0 milestone Mar 10, 2018
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Added some comments

Make hexagonal binning plots.

Make an hexagonal binning plot of `x` versus `y`, where `x`,
`y` are 1-D sequences of the same length, `N`. If `C` is `None`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove " where x, y are 1-D sequences of the same length" since they are references to columns

having as default
the numpy's mean function (np.mean). (If *C* is
specified, it must also be a 1-D sequence of the same length
as `x` and `y`.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reflow the text a bit so that each line is just less than 79 chars ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
axes : matplotlib.AxesSubplot or np.array of them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indicate here when an array is returned?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it ever is an ndarray.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the pandas wrapper can only get matplotlib.AxesSubplot ( I'm struggling to find a way to get np.array as return) , so if it's ok I'll omit this since it looks like it is a behavior of the original matplotlib function.


See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like "the matplotlib function that is used under the hood" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

:context: close-figs

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not fully sure if we want to rely on sklearn for the example data. In this case, I think it is actually fine to generate some random data with np.random.randn(..), as that will change the generated plot

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes pls don't use sklearn imports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if there is no problem with using np.random.randn() I'll change the example to one similar to matplotlib's documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you can use random data in this case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the docstrings guide to explain when random data might be OK in examples, and also explain how to set a random seed to avoid changing examples. :)

@codecov
Copy link

codecov bot commented Mar 11, 2018

Codecov Report

Merging #20121 into master will decrease coverage by 0.07%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20121      +/-   ##
==========================================
- Coverage   91.77%    91.7%   -0.08%     
==========================================
  Files         152      150       -2     
  Lines       49185    49152      -33     
==========================================
- Hits        45140    45074      -66     
- Misses       4045     4078      +33
Flag Coverage Δ
#multiple 90.08% <ø> (-0.08%) ⬇️
#single 41.84% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/plotting/_core.py 82.23% <ø> (-0.04%) ⬇️
pandas/plotting/_compat.py 62% <0%> (-28.91%) ⬇️
pandas/core/arrays/base.py 74.35% <0%> (-2.39%) ⬇️
pandas/io/html.py 86.6% <0%> (-2.19%) ⬇️
pandas/io/formats/format.py 96.26% <0%> (-1.99%) ⬇️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️
pandas/core/reshape/melt.py 97.19% <0%> (-0.15%) ⬇️
pandas/compat/__init__.py 57.62% <0%> (-0.12%) ⬇️
pandas/core/indexes/datetimelike.py 96.7% <0%> (-0.02%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cad6dc7...347a012. Read the comment docs.

@@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds):
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
**kwds):
"""
Hexbin plot
Make hexagonal binning plot.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd write here also Make an hexagonal binning plot. so it's correct English, but a bit nit-pick.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, something else, in the other PRs I think they used "Generate" instead of "Make" ? I don't really care which ones, but maybe best to be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, Done


Parameters
----------
x, y : label or position, optional
Coordinates for each point.
x : label or position
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the guide we ask people to use types like, int, str here for the parameter types. Should we accept "label or position" as well for types for column selection? This is something very common in many docstrings and we should to an agreement.

cc/ @datapythonista , @jorisvandenbossche

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion I'd use int or str, and use the description to say it's the label or the position. I don't see a reason to make an exception here.I think it's useful for the user to know that it's int or str. And who knows if in the future having the types can be useful in a way similar to mypy. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we might make an exception is because 'str' is too restrictive for labels, as a column labels can in principle be any hashable object ... (but ok, in most cases str will be what people use)

But I am fine here with using int or str

Copy link
Contributor Author

@BielStela BielStela Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot about the type. I have a doubt with reduce_C_function parameter: it only accepts numpy functions in the form np.mean or np.std , not strings like 'mean'. Is function (or func) the correct type reference?

Alternatively,
gridsize can be a tuple with two elements specifying the number
of hexagons in the x-direction and the y-direction.
kwds : optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be **kwds instead of kwds according to our agreement, even if the validation script reports this as a mistake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please, we'll fix the script one of these days. :)

Examples
--------

.. plot::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some plain English text before the example to explain what the example is about?


Make an hexagonal binning plot of `x` versus `y`. If `C` is `None`
(the default), this is an histogram of the number of occurrences
of the observations at (x[i],y[i]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should quote this using double backticks since it's code and not just a variable name:

of the observations at ``(x[i],y[i])``.

But I cannot find this in the guide. @datapythonista?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the backticks really last minute, and the guide wasn't very clear, but I think that's the standard in general, yes.

As it's code, we can also respect PEP-8 and have the space after the comma. :)

If `C` is specified, specifies values at given coordinates
(x[i],y[i]). These values are accumulated for each hexagonal
bin and then reduced according to `reduce_C_function`,
having as default the numpy's mean function (np.mean).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link to the numpy function? I believe:

:meth:`numpy.mean`

should do the job.

The corresponding number of hexagons in the y-direction is
chosen in a way that the hexagons are approximately regular.
Alternatively,
gridsize can be a tuple with two elements specifying the number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should write: gridsize : int or tuple of (int, int), optional ...

See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
the matplotlib function that is used under the hood.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the indentation level of this line OK? I thought it should be at the level of the previous one +4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sphinx doesn't care, but let's indeed use '+4' to be consistent

:context: close-figs

>>> n = 100000
>>> df = pd.DataFrame({'x':np.random.randn(n),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For examples using random data I think we should add a seed. Otherwise the example will change everytime we build the documentation, which is a bit weird IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been told that it's OK-ish to use random data in plots without seed if the exact plot result is not important, so nevermind.

>>> n = 100000
>>> df = pd.DataFrame({'x':np.random.randn(n),
... 'y':np.random.randn(n)})
>>> hexbin = df.plot.hexbin(x='x', y='y', cmap='viridis')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an example using C?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm working on one that is self explanatory

@@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds):
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
**kwds):
"""
Hexbin plot
Make hexagonal binning plot.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, something else, in the other PRs I think they used "Generate" instead of "Make" ? I don't really care which ones, but maybe best to be consistent


Parameters
----------
x, y : label or position, optional
Coordinates for each point.
x : label or position
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we might make an exception is because 'str' is too restrictive for labels, as a column labels can in principle be any hashable object ... (but ok, in most cases str will be what people use)

But I am fine here with using int or str

C : label or position, optional
The value at each `(x, y)` point.
reduce_C_function : callable, optional
reduce_C_function : callable, optional, default `mean`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the 'optional' part here

(and the same below)

gridsize can be a tuple with two elements specifying the number
of hexagons in the x-direction and the y-direction.
kwds : optional
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this explanation into "Additional keyword arguments are documented in DataFrame.plot" ?

also, the :py part is not needed

See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
the matplotlib function that is used under the hood.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sphinx doesn't care, but let's indeed use '+4' to be consistent

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 12, 2018 via email

@jorisvandenbossche
Copy link
Member

Fixed the PEP8 issue and conflict with master

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 14, 2018

1 sec. Fixing some grammar.

It's a hexagonal :)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just minor comments

:context: close-figs

>>> n = 100000
>>> # Make a dataframe with normal distributed data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor thing, but I haven't seen a comment in the interpreter in the rest of the documentation. I'd simply add to the previous explanation that the data is normal, and this would be a bit more compact and standard

Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
axes : matplotlib.AxesSubplot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using in most places just the type, and in the next line a short summary of what is returned.

See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
the matplotlib function that is used under the hood.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be we could add DataFrame.plot here, as it's discussed in the parameters? Not sure what the other methods had in See Also.

@jorisvandenbossche
Copy link
Member

@BielStela I made a small edit in the actual plot. The default gridsize seems rather useless, so I added one to the first plot to make it look a bit better. I also removed the cmap there (because the "inferno" has dark low values, it is not really fitting for a hexbin IMO, but added a custom cmap to the second example)

@TomAugspurger
Copy link
Contributor

@jorisvandenbossche you want to fixup @datapythonista's points?

@jorisvandenbossche
Copy link
Member

OK, editing via github at the same time is not a good idea :-)
@TomAugspurger I will add your changes back, and do the ones of @datapythonista

@jorisvandenbossche jorisvandenbossche merged commit e5975fc into pandas-dev:master Mar 14, 2018
@jorisvandenbossche
Copy link
Member

@BielStela Thanks a lot for the PR!

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 14, 2018 via email

@jorisvandenbossche
Copy link
Member

Also, welcome to take a look in a few hours how the pages is looking to check everything is OK: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.plot.hexbin.html#pandas.DataFrame.plot.hexbin

@jorisvandenbossche
Copy link
Member

@TomAugspurger I have my alias git pr <number> which basically does the same I suppose (and normally I do that to test PRs, but for the doc prs now I was mainly using the github edit interface for really minor things)

@BielStela
Copy link
Contributor Author

Thanks a lot for the help and for the hard work you do!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants