DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121

BielStela · 2018-03-10T12:50:44Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:


################################################################################
################### Docstring (pandas.DataFrame.plot.hexbin) ###################
################################################################################

Make hexagonal binning plots.

Make an hexagonal binning plot of *x* versus *y*, where *x*,
*y* are 1-D sequences of the same length, *N*. If *C* is *None*
(the default), this is an histogram of the number of occurrences
of the observations at (x[i],y[i]).

If *C* is specified, specifies values at given coordinates
(x[i],y[i]). These values are accumulated for each hexagonal
bin and then reduced according to *reduce_C_function*, having as default
the numpy's mean function (np.mean). (If *C* is
specified, it must also be a 1-D sequence of the same length
as *x* and *y*.)

Parameters
----------
x : label or position, optional
    Coordinates for x point.
y : label or position, optional
    Coordinates for y point.
C : label or position, optional
    The value at each `(x, y)` point.
reduce_C_function : callable, optional, default `mean`
    Function of one argument that reduces all the values in a bin to
    a single number (e.g. `mean`, `max`, `sum`, `std`).
gridsize : int, optional, default 100
    The number of hexagons in the x-direction.
    The corresponding number of hexagons in the y-direction is chosen
    in a way that the hexagons are approximately regular. Alternatively,
    gridsize can be a tuple with two elements specifying the number of
    hexagons in the x-direction and the y-direction.
kwds : optional
    Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

Returns
-------
axes : matplotlib.AxesSubplot or np.array of them.

See Also
--------
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib.

Examples
--------

.. plot::
    :context: close-figs

    >>> from  sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> df = pd.DataFrame(iris.data, columns=iris.feature_names)
    >>> hexbin = df.plot.hexbin(x='sepal length (cm)', y='sepal width (cm)',
    ...                         gridsize=10, cmap='viridis')

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.plot.hexbin" correct. :)

thanks to @Renton2017

pep8speaks · 2018-03-10T12:50:46Z

Hello @BielStela! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 14, 2018 at 12:39 Hours UTC

TomAugspurger · 2018-03-10T12:51:38Z

pandas/plotting/_core.py

-        Hexbin plot
+        Make hexagonal binning plots.
+
+        Make an hexagonal binning plot of *x* versus *y*, where *x*,


an -> a.

Parameter names should be in backticks, so

`x` versus `y`.

Well, the rule for words starting with h depends: you say "an herb garden".

It actually depends on whether the h is silent or not. For hexagon, I ask whether it sounds like herbal or like history. For history you always say "a history".

See this link: http://editingandwritingservices.com/a-or-an-before-words-beginning-with-h/

Looking forward to your reply, Tom, since I might be wrong.

jreback · 2018-03-10T15:08:33Z

pandas/plotting/_core.py

+
+        Make an hexagonal binning plot of `x` versus `y`, where `x`,
+        `y` are 1-D sequences of the same length, `N`. If `C` is `None`
+        (the default), this is an histogram of the number of occurrences


lgtm. If you can find a reference on Wikipedia, it might be nice to link.

I'm afraid there is no reference on Wikipedia to hexagonal binning. The closest topic I found is the data binning article. Maybe it's a little too generic to include in the hexbin docstring, since it also applies to histogram and histogram2d.

jorisvandenbossche

Thanks for the PR! Added some comments

jorisvandenbossche · 2018-03-10T15:59:24Z

pandas/plotting/_core.py

+        Make hexagonal binning plots.
+
+        Make an hexagonal binning plot of `x` versus `y`, where `x`,
+        `y` are 1-D sequences of the same length, `N`. If `C` is `None`


I would remove " where x, y are 1-D sequences of the same length" since they are references to columns

jorisvandenbossche · 2018-03-10T16:00:01Z

pandas/plotting/_core.py

+        having as default
+        the numpy's mean function (np.mean). (If *C* is
+        specified, it must also be a 1-D sequence of the same length
+        as `x` and `y`.)


Can you reflow the text a bit so that each line is just less than 79 chars ?

jorisvandenbossche · 2018-03-10T16:01:08Z

pandas/plotting/_core.py

            Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.

        Returns
        -------
-        axes : matplotlib.AxesSubplot or np.array of them
+        axes : matplotlib.AxesSubplot or np.array of them.


Can you indicate here when an array is returned?

I'm not sure if it ever is an ndarray.

It looks like the pandas wrapper can only return matplotlib.AxesSubplot (I'm struggling to find a way to get np.array as a return value), so if it's OK I'll omit this, since it looks like it is a behavior of the original matplotlib function.

jorisvandenbossche · 2018-03-10T16:01:44Z

pandas/plotting/_core.py

+
+        See Also
+        --------
+        matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib.


I would say something like "the matplotlib function that is used under the hood" ?

jorisvandenbossche · 2018-03-10T16:18:27Z

pandas/plotting/_core.py

+            :context: close-figs
+
+            >>> from  sklearn.datasets import load_iris
+            >>> iris = load_iris()


I am not fully sure if we want to rely on sklearn for the example data. In this case, I think it is actually fine to generate some random data with np.random.randn(..), even though that will change the generated plot

yes pls don't use sklearn imports

Ok, if there is no problem with using np.random.randn() I'll change the example to one similar to matplotlib's documentation.

yes, you can use random data in this case

I will update the docstrings guide to explain when random data might be OK in examples, and also explain how to set a random seed to avoid changing examples. :)
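A minimal sketch of the seeding idea mentioned above, assuming NumPy's legacy global `np.random.seed` is acceptable in docstring examples. The seed value `0`, the sample size, and the helper name `make_example_frame` are illustrative assumptions, not taken from the docstring guide.

```python
import numpy as np
import pandas as pd


def make_example_frame(n=1000, seed=0):
    """Build normally distributed example data that is identical on every run."""
    # Resetting the global seed makes the following randn calls repeatable.
    np.random.seed(seed)
    return pd.DataFrame({'x': np.random.randn(n),
                         'y': np.random.randn(n)})


first = make_example_frame()
second = make_example_frame()

# Both frames hold exactly the same values because the seed was reset.
print(first.equals(second))
```

With a fixed seed the rendered plot stays the same between documentation builds; without it, only the overall shape of the distribution is stable.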

codecov · 2018-03-11T13:42:04Z

Codecov Report

Merging #20121 into master will decrease coverage by 0.07%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20121      +/-   ##
==========================================
- Coverage   91.77%    91.7%   -0.08%     
==========================================
  Files         152      150       -2     
  Lines       49185    49152      -33     
==========================================
- Hits        45140    45074      -66     
- Misses       4045     4078      +33

Flag	Coverage Δ
#multiple	`90.08% <ø> (-0.08%)`	⬇️
#single	`41.84% <ø> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/plotting/_core.py	`82.23% <ø> (-0.04%)`	⬇️
pandas/plotting/_compat.py	`62% <0%> (-28.91%)`	⬇️
pandas/core/arrays/base.py	`74.35% <0%> (-2.39%)`	⬇️
pandas/io/html.py	`86.6% <0%> (-2.19%)`	⬇️
pandas/io/formats/format.py	`96.26% <0%> (-1.99%)`	⬇️
pandas/plotting/_converter.py	`65.07% <0%> (-1.74%)`	⬇️
pandas/core/reshape/melt.py	`97.19% <0%> (-0.15%)`	⬇️
pandas/compat/__init__.py	`57.62% <0%> (-0.12%)`	⬇️
pandas/core/indexes/datetimelike.py	`96.7% <0%> (-0.02%)`	⬇️
... and 22 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cad6dc7...347a012.

dukebody · 2018-03-11T20:58:59Z

pandas/plotting/_core.py

@@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds):
    def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
               **kwds):
        """
-        Hexbin plot
+        Make hexagonal binning plot.


I'd also write "Make an hexagonal binning plot." here so it's correct English, but that's a bit of a nitpick.

dukebody · 2018-03-11T21:03:05Z

pandas/plotting/_core.py


        Parameters
        ----------
-        x, y : label or position, optional
-            Coordinates for each point.
+        x : label or position


In the guide we ask people to use types like int and str here for the parameter types. Should we accept "label or position" as well for column selection? This is very common in many docstrings and we should come to an agreement.

cc/ @datapythonista , @jorisvandenbossche

In my opinion I'd use int or str, and use the description to say it's the label or the position. I don't see a reason to make an exception here. I think it's useful for the user to know that it's int or str. And who knows if in the future having the types can be useful in a way similar to mypy. :)

Yes, I forgot about the type. I have a doubt with reduce_C_function parameter: it only accepts numpy functions in the form np.mean or np.std , not strings like 'mean'. Is function (or func) the correct type reference?

dukebody · 2018-03-11T21:04:08Z

pandas/plotting/_core.py

+            Alternatively,
+            gridsize can be a tuple with two elements specifying the number
+            of hexagons in the x-direction and the y-direction.
+        kwds : optional


Should be **kwds instead of kwds according to our agreement, even if the validation script reports this as a mistake.

yes please, we'll fix the script one of these days. :)

dukebody · 2018-03-11T21:06:52Z

pandas/plotting/_core.py

+        Examples
+        --------
+
+        .. plot::


Can we add some plain English text before the example to explain what the example is about?

dukebody · 2018-03-11T21:13:00Z

pandas/plotting/_core.py

+
+        Make an hexagonal binning plot of `x` versus `y`. If `C` is `None`
+        (the default), this is an histogram of the number of occurrences
+        of the observations at (x[i],y[i]).


I believe we should quote this using double backticks since it's code and not just a variable name:

of the observations at ``(x[i],y[i])``.

But I cannot find this in the guide. @datapythonista?

We discussed the backticks really last minute, and the guide wasn't very clear, but I think that's the standard in general, yes.

As it's code, we can also respect PEP-8 and have the space after the comma. :)

dukebody · 2018-03-11T21:14:26Z

pandas/plotting/_core.py

+        If `C` is specified, specifies values at given coordinates
+        (x[i],y[i]). These values are accumulated for each hexagonal
+        bin and then reduced according to `reduce_C_function`,
+        having as default the numpy's mean function (np.mean).


Can we link to the numpy function? I believe:

:meth:`numpy.mean`

should do the job.

dukebody · 2018-03-11T21:16:44Z

pandas/plotting/_core.py

+            The corresponding number of hexagons in the y-direction is
+            chosen in a way that the hexagons are approximately regular.
+            Alternatively,
+            gridsize can be a tuple with two elements specifying the number


Then we should write: gridsize : int or tuple of (int, int), optional ...
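As a hedged illustration of the two accepted forms of `gridsize`, the sketch below plots the same data once with an int and once with a tuple. The data, seed, and grid values are made up for the example, and the non-interactive `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen only for this sketch

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'x': np.random.randn(2000),
                   'y': np.random.randn(2000)})

# A single int sets the number of hexagons along x; y is derived from it.
ax_int = df.plot.hexbin(x='x', y='y', gridsize=5)

# A tuple sets the number of hexagons along x and y explicitly.
ax_tuple = df.plot.hexbin(x='x', y='y', gridsize=(20, 10))

# Each plot holds one collection of hexagons; a finer grid draws more of them.
n_coarse = len(ax_int.collections[0].get_offsets())
n_fine = len(ax_tuple.collections[0].get_offsets())
print(n_coarse, n_fine)
```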

dukebody · 2018-03-11T21:17:57Z

pandas/plotting/_core.py

+        See Also
+        --------
+        matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
+                    the matplotlib function that is used under the hood.


Is the indentation level of this line OK? I thought it should be at the level of the previous one +4.

dukebody · 2018-03-11T21:19:04Z

pandas/plotting/_core.py

+            :context: close-figs
+
+            >>> n = 100000
+            >>> df = pd.DataFrame({'x':np.random.randn(n),


For examples using random data I think we should add a seed. Otherwise the example will change every time we build the documentation, which is a bit weird IMO.

I've been told that it's OK-ish to use random data in plots without a seed if the exact plot result is not important, so never mind.

dukebody · 2018-03-11T21:19:32Z

pandas/plotting/_core.py

+            >>> n = 100000
+            >>> df = pd.DataFrame({'x':np.random.randn(n),
+            ...                    'y':np.random.randn(n)})
+            >>> hexbin = df.plot.hexbin(x='x', y='y', cmap='viridis')


Can we add an example using C?

Yes, I'm working on one that is self explanatory
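One possible shape for such an example, offered as a sketch rather than the text that ended up in the docstring: the column names `coord_x`, `coord_y` and `observations`, the value ranges, and the seed are all assumptions made for illustration. It passes a column as `C` and uses `np.sum` as `reduce_C_function`, so each hexagon shows the total of the observations that fall into it instead of their mean.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, chosen only for this sketch

import numpy as np
import pandas as pd

np.random.seed(0)
n = 500
df = pd.DataFrame({
    'coord_x': np.random.uniform(-3, 3, size=n),
    'coord_y': np.random.uniform(30, 50, size=n),
    'observations': np.random.randint(1, 5, size=n),
})

# Sum the observations in each hexagon instead of averaging them.
ax = df.plot.hexbin(x='coord_x', y='coord_y', C='observations',
                    reduce_C_function=np.sum, gridsize=10, cmap='viridis')

# The call returns the matplotlib axes that the hexagons were drawn on.
print(type(ax))
```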

jorisvandenbossche · 2018-03-12T08:19:01Z

pandas/plotting/_core.py

@@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds):
    def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None,
               **kwds):
        """
-        Hexbin plot
+        Make hexagonal binning plot.


And, something else, in the other PRs I think they used "Generate" instead of "Make" ? I don't really care which ones, but maybe best to be consistent

jorisvandenbossche · 2018-03-12T08:21:34Z

pandas/plotting/_core.py


        Parameters
        ----------
-        x, y : label or position, optional
-            Coordinates for each point.
+        x : label or position


The reason we might make an exception is because 'str' is too restrictive for labels, as a column labels can in principle be any hashable object ... (but ok, in most cases str will be what people use)

But I am fine here with using int or str

jorisvandenbossche · 2018-03-12T08:22:22Z

pandas/plotting/_core.py

        C : label or position, optional
            The value at each `(x, y)` point.
-        reduce_C_function : callable, optional
+        reduce_C_function : callable, optional, default `mean`


you can remove the 'optional' part here

(and the same below)

jorisvandenbossche · 2018-03-12T08:24:08Z

pandas/plotting/_core.py

+            gridsize can be a tuple with two elements specifying the number
+            of hexagons in the x-direction and the y-direction.
+        kwds : optional
+            Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`.


Can you make this explanation into "Additional keyword arguments are documented in DataFrame.plot" ?

also, the :py part is not needed

jorisvandenbossche · 2018-03-12T08:24:46Z

pandas/plotting/_core.py

+        See Also
+        --------
+        matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
+                    the matplotlib function that is used under the hood.


I think sphinx doesn't care, but let's indeed use '+4' to be consistent

TomAugspurger · 2018-03-12T23:52:22Z

Probably callable
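To illustrate why `callable` fits as the type, here is a pure-Python sketch of the accumulate-then-reduce step the docstring describes. It uses square bins instead of hexagons for brevity, and `bin_and_reduce` is a hypothetical helper written for this illustration, not pandas or matplotlib code.

```python
from collections import defaultdict
from statistics import mean


def bin_and_reduce(xs, ys, cs, reduce_func=mean, bin_width=1.0):
    """Group the values in `cs` by the square bin of (x, y), then reduce each group."""
    if not callable(reduce_func):
        # A string such as 'mean' cannot be applied to the accumulated values.
        raise TypeError("reduce_func must be callable, got %r" % (reduce_func,))
    bins = defaultdict(list)
    for x, y, c in zip(xs, ys, cs):
        # Integer bin coordinates; floor division also handles negative values.
        key = (int(x // bin_width), int(y // bin_width))
        bins[key].append(c)
    # One number per bin, produced by the one-argument reducer.
    return {key: reduce_func(values) for key, values in bins.items()}


xs = [0.2, 0.7, 1.5, 1.6]
ys = [0.1, 0.9, 0.3, 0.4]
cs = [10, 20, 1, 3]

print(bin_and_reduce(xs, ys, cs))                   # mean per bin
print(bin_and_reduce(xs, ys, cs, reduce_func=max))  # max per bin
```

Any function of one argument that returns a single number can be swapped in, which is the behaviour the `reduce_C_function` description asks for.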

jorisvandenbossche · 2018-03-14T12:27:07Z

Fixed the PEP8 issue and conflict with master

TomAugspurger · 2018-03-14T12:27:43Z

1 sec. Fixing some grammar.

It's a hexagonal :)

datapythonista

Looks good, just minor comments

datapythonista · 2018-03-14T12:27:19Z

pandas/plotting/_core.py

+            :context: close-figs
+
+            >>> n = 100000
+            >>> # Make a dataframe with normal distributed data


minor thing, but I haven't seen a comment in the interpreter in the rest of the documentation. I'd simply add to the previous explanation that the data is normal, and this would be a bit more compact and standard

datapythonista · 2018-03-14T12:28:06Z

pandas/plotting/_core.py

            Additional keyword arguments are documented in
            :meth:`pandas.DataFrame.plot`.

        Returns
        -------
-        axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them
+        axes : matplotlib.AxesSubplot


We're using in most places just the type, and in the next line a short summary of what is returned.

datapythonista · 2018-03-14T12:28:57Z

pandas/plotting/_core.py

+        See Also
+        --------
+        matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib,
+            the matplotlib function that is used under the hood.


Maybe we could add DataFrame.plot here, as it's discussed in the parameters? Not sure what the other methods had in See Also.

jorisvandenbossche · 2018-03-14T12:31:47Z

@BielStela I made a small edit in the actual plot. The default gridsize seems rather useless, so I added one to the first plot to make it look a bit better. I also removed the cmap there (because the "inferno" has dark low values, it is not really fitting for a hexbin IMO, but added a custom cmap to the second example)

TomAugspurger · 2018-03-14T12:32:24Z

@jorisvandenbossche you want to fixup @datapythonista's points?

jorisvandenbossche · 2018-03-14T12:32:48Z

OK, editing via github at the same time is not a good idea :-)
@TomAugspurger I will add your changes back, and do the ones of @datapythonista

jorisvandenbossche · 2018-03-14T12:41:02Z

@BielStela Thanks a lot for the PR!

TomAugspurger · 2018-03-14T12:42:25Z

:) FYI Joris, https://github.com/github/hub is nice for checking out PRs. Just `hub checkout <url>` and all the remotes are set up locally.

jorisvandenbossche · 2018-03-14T12:43:02Z

Also, you're welcome to take a look in a few hours at how the page looks, to check everything is OK: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.plot.hexbin.html#pandas.DataFrame.plot.hexbin

jorisvandenbossche · 2018-03-14T12:45:36Z

@TomAugspurger I have my alias git pr <number> which basically does the same I suppose (and normally I do that to test PRs, but for the doc prs now I was mainly using the github edit interface for really minor things)

BielStela · 2018-03-14T14:17:01Z

Thanks a lot for the help and for the hard work you do!

TomAugspurger reviewed Mar 10, 2018

BielStela force-pushed the docstring_hexbin branch 3 times, most recently from 979b16d to db52245 (March 10, 2018 13:15)

DOC: improved hexbin plot docstring

dc9aabf

BielStela force-pushed the docstring_hexbin branch from db52245 to dc9aabf (March 10, 2018 13:15)

jreback added Docs Visualization plotting labels Mar 10, 2018

jreback approved these changes Mar 10, 2018

jreback added this to the 0.23.0 milestone Mar 10, 2018

jorisvandenbossche reviewed Mar 10, 2018

BielStela force-pushed the docstring_hexbin branch from 9c4dbe8 to 53bf39d (March 11, 2018 13:52)

minor fixes from comments and new example using np.random.randn()

17d653b

BielStela force-pushed the docstring_hexbin branch from 8dca0eb to 17d653b (March 11, 2018 14:04)

Merge branch 'master' into docstring_hexbin

9f94841

dukebody reviewed Mar 11, 2018

jorisvandenbossche reviewed Mar 12, 2018

BielStela and others added 3 commits March 14, 2018 12:56

Minor fixes to hexbin docstring and new example using C parameter

c686467

Merge branch 'docstring_hexbin' of https://github.com/BielStela/pandas into docstring_hexbin

4977860

Update _core.py

697e9fa

jorisvandenbossche approved these changes Mar 14, 2018

Merge branch 'master' into docstring_hexbin

439721e

Grammar

b31b687

datapythonista reviewed Mar 14, 2018

TomAugspurger approved these changes Mar 14, 2018

edit in plot (lower gridsize)

4d7b73c

add back changes + changes for feedback

347a012

jorisvandenbossche merged commit e5975fc into pandas-dev:master Mar 14, 2018
