Skip to content

DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 14, 2018
78 changes: 67 additions & 11 deletions pandas/plotting/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -3174,26 +3174,82 @@ def scatter(self, x, y, s=None, c=None, **kwds):
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
**kwds):
"""
Hexbin plot
Generate an hexagonal binning plot.

Generate an hexagonal binning plot of `x` versus `y`. If `C` is `None`
(the default), this is an histogram of the number of occurrences
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. if you can find a referene on wikipedia might be nice to link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid there is no reference in wikipedia to hexagonal binning. The closed topic I found in wiki is the data binning article. Maybe it's a little bit to generic to include it in the hexbin docstring since it is also suitable for histogram and histogram2d

of the observations at ``(x[i], y[i])``.

If `C` is specified, specifies values at given coordinates
``(x[i], y[i])``. These values are accumulated for each hexagonal
bin and then reduced according to `reduce_C_function`,
having as default the numpy's mean function (:meth:`numpy.mean`).
(If `C` is specified, it must also be a 1-D sequence
of the same length as `x` and `y`.)

Parameters
----------
x, y : label or position, optional
Coordinates for each point.
C : label or position, optional
The value at each `(x, y)` point.
reduce_C_function : callable, optional
x : int or str
The column label or position for x points.
y : int or str
The column label or position for y points.
C : int or str, optional
The column label or position for the value of `(x, y)` point.
reduce_C_function : callable, default `np.mean`
Function of one argument that reduces all the values in a bin to
a single number (e.g. `mean`, `max`, `sum`, `std`).
gridsize : int, optional
Number of bins.
`**kwds` : optional
a single number (e.g. `np.mean`, `np.max`, `np.sum`, `np.std`).
gridsize : int or tuple of (int, int), default 100
The number of hexagons in the x-direction.
The corresponding number of hexagons in the y-direction is
chosen in a way that the hexagons are approximately regular.
Alternatively, gridsize can be a tuple with two elements
specifying the number of hexagons in the x-direction and the
y-direction.
**kwds
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
axes : matplotlib.AxesSubplot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using in most places just the type, and in the next line a short summary of what is returned.


See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
the matplotlib function that is used under the hood.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be we could add DataFrame.plot here, as it's discussed in the parameters? Not sure what the other methods had in See Also.


Examples
--------
The following examples are generated with random data.

.. plot::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some plain English text before the example to explain what the example is about?

:context: close-figs

>>> n = 100000
>>> # Make a dataframe with normal distributed data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor thing, but I haven't seen a comment in the interpreter in the rest of the documentation. I'd simply add to the previous explanation that the data is normal, and this would be a bit more compact and standard

>>> df = pd.DataFrame({'x': np.random.randn(n),
... 'y': np.random.randn(n)})
>>> ax = df.plot.hexbin(x='x', y='y', cmap='inferno')

The next example uses `C` and `np.sum` as `reduce_C_function`.
Note that `'observations'` values ranges from 1 to 5 but the result
plot shows values up to more than 25. This is because of the
`reduce_C_function`.

.. plot::
:context: close-figs

>>> n = 500
>>> df = pd.DataFrame({
... 'coord_x': np.random.uniform(-3, 3, size=n),
... 'coord_y': np.random.uniform(30, 50, size=n),
... 'observations': np.random.randint(1,5, size=n)
... })
>>> ax = df.plot.hexbin(x='coord_x',
... y='coord_y',
... C='observations',
... reduce_C_function=np.sum,
... gridsize=10)
"""
if reduce_C_function is not None:
kwds['reduce_C_function'] = reduce_C_function
Expand Down