DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121

Merged
merged 10 commits into from
Mar 14, 2018
53 changes: 45 additions & 8 deletions pandas/plotting/_core.py
@@ -2874,25 +2874,62 @@ def scatter(self, x, y, s=None, c=None, **kwds):
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
**kwds):
"""
Hexbin plot
Make hexagonal binning plots.

Make an hexagonal binning plot of `x` versus `y`, where `x`,
`y` are 1-D sequences of the same length, `N`. If `C` is `None`
Member

I would remove " where x, y are 1-D sequences of the same length" since they are references to columns

(the default), this is an histogram of the number of occurrences
Contributor

LGTM. If you can find a reference on Wikipedia, it might be nice to link it.

Contributor Author

I'm afraid there is no Wikipedia article on hexagonal binning. The closest topic I found is the data binning article. It may be a little too generic to include in the hexbin docstring, since it applies equally to histogram and histogram2d.

of the observations at (x[i],y[i]).
Contributor

I believe we should quote this using double backticks since it's code and not just a variable name:

of the observations at ``(x[i],y[i])``.

But I cannot find this in the guide. @datapythonista?

Member

We discussed the backticks really last minute, and the guide wasn't very clear, but I think that's the standard in general, yes.

As it's code, we can also respect PEP-8 and have the space after the comma. :)


If `C` is specified, specifies values at given coordinates
(x[i],y[i]). These values are accumulated for each hexagonal
bin and then reduced according to `reduce_C_function`,
having as default
the numpy's mean function (np.mean). (If *C* is
specified, it must also be a 1-D sequence of the same length
as `x` and `y`.)
Member

Can you reflow the text a bit so that each line is just under 79 characters?

Contributor Author

Done!
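
For readers following the thread, the behavior described in the docstring paragraph above (counts when `C` is omitted, a reduction of `C` per hexagon otherwise) can be sketched as follows. This is a minimal illustration, not part of the PR: the column names, the sample size, the seed, and the choice of `np.sum` as the reducer are all assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Illustrative data only: column names and sizes are assumptions.
rng = np.random.RandomState(0)
n = 500
df = pd.DataFrame({
    "x": rng.randn(n),
    "y": rng.randn(n),
    "value": rng.uniform(0, 10, n),
})

# Without C: each hexagon holds the number of points that fall inside it.
ax_counts = df.plot.hexbin(x="x", y="y", gridsize=15)

# With C: the "value" entries in each hexagon are reduced by the callable
# passed as reduce_C_function (np.sum here instead of the default mean).
ax_sum = df.plot.hexbin(x="x", y="y", C="value",
                        reduce_C_function=np.sum, gridsize=15)

# The per-hexagon results are stored on the plotted collection.
counts = np.asarray(ax_counts.collections[0].get_array())
sums = np.asarray(ax_sum.collections[0].get_array())
```

Summing the per-hexagon counts should recover the number of rows, and summing the per-hexagon sums should recover the column total, which is a quick way to check that no observation was dropped.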


Parameters
----------
x, y : label or position, optional
Coordinates for each point.
x : label or position, optional
Coordinates for x point.
y : label or position, optional
Coordinates for y point.
C : label or position, optional
The value at each `(x, y)` point.
reduce_C_function : callable, optional
reduce_C_function : callable, optional, default `mean`
Member

you can remove the 'optional' part here

(and the same below)

Function of one argument that reduces all the values in a bin to
a single number (e.g. `mean`, `max`, `sum`, `std`).
gridsize : int, optional
Number of bins.
`**kwds` : optional
gridsize : int, optional, default 100
The number of hexagons in the x-direction.
The corresponding number of hexagons in the y-direction is
chosen in a way that the hexagons are approximately regular.
Alternatively,
gridsize can be a tuple with two elements specifying the number of
hexagons in the x-direction and the y-direction.
kwds : optional
Contributor

Should be **kwds instead of kwds according to our agreement, even if the validation script reports this as a mistake.

Member

yes please, we'll fix the script one of these days. :)

Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Member

Can you make this explanation into "Additional keyword arguments are documented in DataFrame.plot" ?

also, the :py part is not needed
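
As a rough illustration of the `gridsize` and `**kwds` entries discussed above, the sketch below passes `gridsize` first as an int and then as a two-element tuple, and forwards `cmap` as an extra keyword argument. The data, column names, and the specific grid sizes are assumptions chosen for the example, not values from the PR.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Illustrative data only: column names and sizes are assumptions.
rng = np.random.RandomState(1)
df = pd.DataFrame({"x": rng.randn(1000), "y": rng.randn(1000)})

# Integer gridsize: number of hexagons in the x-direction; the y-direction
# count is chosen so that the hexagons are approximately regular.
ax_int = df.plot.hexbin(x="x", y="y", gridsize=10)

# Tuple gridsize: hexagon counts for the x- and y-directions given explicitly.
# cmap is not a hexbin parameter in pandas; it is forwarded via **kwds.
ax_tuple = df.plot.hexbin(x="x", y="y", gridsize=(10, 20), cmap="viridis")

# Each entry of the collection's array corresponds to one drawn hexagon.
n_int = len(ax_int.collections[0].get_array())
n_tuple = len(ax_tuple.collections[0].get_array())
cmap_name = ax_tuple.collections[0].get_cmap().name
```

With the tuple form asking for more rows than the automatic choice, the second plot ends up with more hexagons than the first.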


Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
axes : matplotlib.AxesSubplot or np.array of them.
Member

Can you indicate here when an array is returned?

Contributor

I'm not sure if it ever is an ndarray.

Contributor Author

It looks like the pandas wrapper can only return a matplotlib.AxesSubplot (I'm struggling to find a way to get an np.array back), so if it's OK I'll omit this, since it looks like a behavior of the original matplotlib function.
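
One way to check the return type discussed in this thread is sketched below. It only covers a single plain call on made-up data (the column names and seed are assumptions), so it supports the observation above without proving that an array can never be returned.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.axes
import numpy as np
import pandas as pd

# Illustrative data only: column names and sizes are assumptions.
rng = np.random.RandomState(2)
df = pd.DataFrame({"x": rng.randn(200), "y": rng.randn(200)})

ax = df.plot.hexbin(x="x", y="y", gridsize=10)

# For this call the result is a single matplotlib Axes, not an ndarray.
is_axes = isinstance(ax, matplotlib.axes.Axes)
is_array = isinstance(ax, np.ndarray)
```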


See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib.
Member

I would say something like "the matplotlib function that is used under the hood" ?

Contributor Author

done


Examples
--------

.. plot::
Contributor

Can we add some plain English text before the example to explain what the example is about?

:context: close-figs

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
Member

I am not fully sure if we want to rely on sklearn for the example data. In this case, I think it is actually fine to generate some random data with np.random.randn(..), as that will change the generated plot

Contributor

yes pls don't use sklearn imports

Contributor Author

Ok, if there is no problem with using np.random.randn() I'll change the example to one similar to matplotlib's documentation.

Member

yes, you can use random data in this case

Contributor

I will update the docstrings guide to explain when random data might be OK in examples, and also explain how to set a random seed to avoid changing examples. :)
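
A sklearn-free version of the example along the lines suggested in this thread might look like the sketch below. It is not the example that was eventually merged: the helper name `make_df`, the seed, the sample size, and the column names are hypothetical, chosen only to show that fixing the seed keeps the generated data (and therefore the plot) stable between runs.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd


def make_df(seed, n=10000):
    """Hypothetical helper: build random x/y columns from a fixed seed."""
    np.random.seed(seed)
    return pd.DataFrame({"x": np.random.randn(n),
                         "y": np.random.randn(n)})


# Same seed, same data, so the rendered example does not change per build.
df = make_df(42)
df_again = make_df(42)

ax = df.plot.hexbin(x="x", y="y", gridsize=20)
```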

>>> df = pd.DataFrame(iris.data, columns=iris.feature_names)
>>> hexbin = df.plot.hexbin(x='sepal length (cm)',
... y='sepal width (cm)',
... gridsize=10, cmap='viridis')
"""
if reduce_C_function is not None:
kwds['reduce_C_function'] = reduce_C_function