Skip to content

DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 14, 2018
53 changes: 43 additions & 10 deletions pandas/plotting/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds):
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
**kwds):
"""
Hexbin plot
Make hexagonal binning plot.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd write here also Make an hexagonal binning plot. so it's correct English, but a bit nit-pick.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, something else, in the other PRs I think they used "Generate" instead of "Make" ? I don't really care which ones, but maybe best to be consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, Done


Make an hexagonal binning plot of `x` versus `y`. If `C` is `None`
(the default), this is an histogram of the number of occurrences
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. if you can find a referene on wikipedia might be nice to link.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid there is no reference in wikipedia to hexagonal binning. The closed topic I found in wiki is the data binning article. Maybe it's a little bit to generic to include it in the hexbin docstring since it is also suitable for histogram and histogram2d

of the observations at (x[i],y[i]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should quote this using double backticks since it's code and not just a variable name:

of the observations at ``(x[i],y[i])``.

But I cannot find this in the guide. @datapythonista?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the backticks really last minute, and the guide wasn't very clear, but I think that's the standard in general, yes.

As it's code, we can also respect PEP-8 and have the space after the comma. :)


If `C` is specified, specifies values at given coordinates
(x[i],y[i]). These values are accumulated for each hexagonal
bin and then reduced according to `reduce_C_function`,
having as default the numpy's mean function (np.mean).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link to the numpy function? I believe:

:meth:`numpy.mean`

should do the job.

(If `C` is specified, it must also be a 1-D sequence
of the same length as `x` and `y`.)

Parameters
----------
x, y : label or position, optional
Coordinates for each point.
x : label or position
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the guide we ask people to use types like, int, str here for the parameter types. Should we accept "label or position" as well for types for column selection? This is something very common in many docstrings and we should to an agreement.

cc/ @datapythonista , @jorisvandenbossche

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion I'd use int or str, and use the description to say it's the label or the position. I don't see a reason to make an exception here.I think it's useful for the user to know that it's int or str. And who knows if in the future having the types can be useful in a way similar to mypy. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we might make an exception is because 'str' is too restrictive for labels, as a column labels can in principle be any hashable object ... (but ok, in most cases str will be what people use)

But I am fine here with using int or str

Copy link
Contributor Author

@BielStela BielStela Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot about the type. I have a doubt with reduce_C_function parameter: it only accepts numpy functions in the form np.mean or np.std , not strings like 'mean'. Is function (or func) the correct type reference?

Coordinates for x points.
y : label or position
Coordinates for y points.
C : label or position, optional
The value at each `(x, y)` point.
reduce_C_function : callable, optional
reduce_C_function : callable, optional, default `mean`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the 'optional' part here

(and the same below)

Function of one argument that reduces all the values in a bin to
a single number (e.g. `mean`, `max`, `sum`, `std`).
gridsize : int, optional
Number of bins.
`**kwds` : optional
Additional keyword arguments are documented in
:meth:`pandas.DataFrame.plot`.
gridsize : int, optional, default 100
The number of hexagons in the x-direction.
The corresponding number of hexagons in the y-direction is
chosen in a way that the hexagons are approximately regular.
Alternatively,
gridsize can be a tuple with two elements specifying the number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we should write: gridsize : int or tuple of (int, int), optional ...

of hexagons in the x-direction and the y-direction.
kwds : optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be **kwds instead of kwds according to our agreement, even if the validation script reports this as a mistake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes please, we'll fix the script one of these days. :)

Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this explanation into "Additional keyword arguments are documented in DataFrame.plot" ?

also, the :py part is not needed


Returns
-------
axes : matplotlib.AxesSubplot or np.array of them
axes : matplotlib.AxesSubplot.

See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
the matplotlib function that is used under the hood.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the indentation level of this line OK? I thought it should be at the level of the previous one +4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sphinx doesn't care, but let's indeed use '+4' to be consistent


Examples
--------

.. plot::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some plain English text before the example to explain what the example is about?

:context: close-figs

>>> n = 100000
>>> df = pd.DataFrame({'x':np.random.randn(n),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For examples using random data I think we should add a seed. Otherwise the example will change everytime we build the documentation, which is a bit weird IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been told that it's OK-ish to use random data in plots without seed if the exact plot result is not important, so nevermind.

... 'y':np.random.randn(n)})
>>> hexbin = df.plot.hexbin(x='x', y='y', cmap='viridis')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an example using C?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm working on one that is self explanatory

"""
if reduce_C_function is not None:
kwds['reduce_C_function'] = reduce_C_function
Expand Down