-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: updated the pandas.DataFrame.plot.hexbin docstring #20121
Conversation
Hello @BielStela! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 14, 2018 at 12:39 Hours UTC |
pandas/plotting/_core.py
Outdated
Hexbin plot | ||
Make hexagonal binning plots. | ||
|
||
Make an hexagonal binning plot of *x* versus *y*, where *x*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an -> a.
Parameter names should be in backticks, so
`x` versus `y`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the topic related to h depends. You say An herb garden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually depends whether the h is silent or not. In case of hexagon I ask if it sound like herbal or like history. In case of history you say always "a history"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking for your reply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking forward for your reply Tom, since I might be wrong.
979b16d
to
db52245
Compare
db52245
to
dc9aabf
Compare
pandas/plotting/_core.py
Outdated
|
||
Make an hexagonal binning plot of `x` versus `y`, where `x`, | ||
`y` are 1-D sequences of the same length, `N`. If `C` is `None` | ||
(the default), this is an histogram of the number of occurrences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. if you can find a referene on wikipedia might be nice to link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid there is no reference in wikipedia to hexagonal binning. The closed topic I found in wiki is the data binning article. Maybe it's a little bit to generic to include it in the hexbin docstring since it is also suitable for histogram and histogram2d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Added some comments
pandas/plotting/_core.py
Outdated
Make hexagonal binning plots. | ||
|
||
Make an hexagonal binning plot of `x` versus `y`, where `x`, | ||
`y` are 1-D sequences of the same length, `N`. If `C` is `None` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove " where x
, y
are 1-D sequences of the same length" since they are references to columns
pandas/plotting/_core.py
Outdated
having as default | ||
the numpy's mean function (np.mean). (If *C* is | ||
specified, it must also be a 1-D sequence of the same length | ||
as `x` and `y`.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you reflow the text a bit so that each line is just less than 79 chars ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
pandas/plotting/_core.py
Outdated
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
axes : matplotlib.AxesSubplot or np.array of them | ||
axes : matplotlib.AxesSubplot or np.array of them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you indicate here when an array is returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it ever is an ndarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the pandas wrapper can only get matplotlib.AxesSubplot
( I'm struggling to find a way to get np.array
as return) , so if it's ok I'll omit this since it looks like it is a behavior of the original matplotlib function.
pandas/plotting/_core.py
Outdated
|
||
See Also | ||
-------- | ||
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say something like "the matplotlib function that is used under the hood" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/plotting/_core.py
Outdated
:context: close-figs | ||
|
||
>>> from sklearn.datasets import load_iris | ||
>>> iris = load_iris() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fully sure if we want to rely on sklearn for the example data. In this case, I think it is actually fine to generate some random data with np.random.randn(..)
, as that will change the generated plot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes pls don't use sklearn imports
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if there is no problem with using np.random.randn()
I'll change the example to one similar to matplotlib's documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you can use random data in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the docstrings guide to explain when random data might be OK in examples, and also explain how to set a random seed to avoid changing examples. :)
Codecov Report
@@ Coverage Diff @@
## master #20121 +/- ##
==========================================
- Coverage 91.77% 91.7% -0.08%
==========================================
Files 152 150 -2
Lines 49185 49152 -33
==========================================
- Hits 45140 45074 -66
- Misses 4045 4078 +33
Continue to review full report at Codecov.
|
9c4dbe8
to
53bf39d
Compare
8dca0eb
to
17d653b
Compare
pandas/plotting/_core.py
Outdated
@@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds): | |||
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, | |||
**kwds): | |||
""" | |||
Hexbin plot | |||
Make hexagonal binning plot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd write here also Make an hexagonal binning plot.
so it's correct English, but a bit nit-pick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, something else, in the other PRs I think they used "Generate" instead of "Make" ? I don't really care which ones, but maybe best to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, Done
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
x, y : label or position, optional | ||
Coordinates for each point. | ||
x : label or position |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the guide we ask people to use types like, int
, str
here for the parameter types. Should we accept "label or position" as well for types for column selection? This is something very common in many docstrings and we should to an agreement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion I'd use int or str
, and use the description to say it's the label or the position. I don't see a reason to make an exception here.I think it's useful for the user to know that it's int
or str
. And who knows if in the future having the types can be useful in a way similar to mypy. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we might make an exception is because 'str' is too restrictive for labels, as a column labels can in principle be any hashable object ... (but ok, in most cases str will be what people use)
But I am fine here with using int or str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I forgot about the type. I have a doubt with reduce_C_function
parameter: it only accepts numpy functions in the form np.mean
or np.std
, not strings like 'mean'
. Is function
(or func
) the correct type reference?
pandas/plotting/_core.py
Outdated
Alternatively, | ||
gridsize can be a tuple with two elements specifying the number | ||
of hexagons in the x-direction and the y-direction. | ||
kwds : optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be **kwds
instead of kwds
according to our agreement, even if the validation script reports this as a mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please, we'll fix the script one of these days. :)
pandas/plotting/_core.py
Outdated
Examples | ||
-------- | ||
|
||
.. plot:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some plain English text before the example to explain what the example is about?
pandas/plotting/_core.py
Outdated
|
||
Make an hexagonal binning plot of `x` versus `y`. If `C` is `None` | ||
(the default), this is an histogram of the number of occurrences | ||
of the observations at (x[i],y[i]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we should quote this using double backticks since it's code and not just a variable name:
of the observations at ``(x[i],y[i])``.
But I cannot find this in the guide. @datapythonista?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed the backticks really last minute, and the guide wasn't very clear, but I think that's the standard in general, yes.
As it's code, we can also respect PEP-8 and have the space after the comma. :)
pandas/plotting/_core.py
Outdated
If `C` is specified, specifies values at given coordinates | ||
(x[i],y[i]). These values are accumulated for each hexagonal | ||
bin and then reduced according to `reduce_C_function`, | ||
having as default the numpy's mean function (np.mean). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link to the numpy function? I believe:
:meth:`numpy.mean`
should do the job.
pandas/plotting/_core.py
Outdated
The corresponding number of hexagons in the y-direction is | ||
chosen in a way that the hexagons are approximately regular. | ||
Alternatively, | ||
gridsize can be a tuple with two elements specifying the number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we should write: gridsize : int or tuple of (int, int), optional ...
pandas/plotting/_core.py
Outdated
See Also | ||
-------- | ||
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib, | ||
the matplotlib function that is used under the hood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the indentation level of this line OK? I thought it should be at the level of the previous one +4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sphinx doesn't care, but let's indeed use '+4' to be consistent
pandas/plotting/_core.py
Outdated
:context: close-figs | ||
|
||
>>> n = 100000 | ||
>>> df = pd.DataFrame({'x':np.random.randn(n), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For examples using random data I think we should add a seed. Otherwise the example will change everytime we build the documentation, which is a bit weird IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been told that it's OK-ish to use random data in plots without seed if the exact plot result is not important, so nevermind.
pandas/plotting/_core.py
Outdated
>>> n = 100000 | ||
>>> df = pd.DataFrame({'x':np.random.randn(n), | ||
... 'y':np.random.randn(n)}) | ||
>>> hexbin = df.plot.hexbin(x='x', y='y', cmap='viridis') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an example using C
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm working on one that is self explanatory
pandas/plotting/_core.py
Outdated
@@ -2938,26 +2938,59 @@ def scatter(self, x, y, s=None, c=None, **kwds): | |||
def hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, | |||
**kwds): | |||
""" | |||
Hexbin plot | |||
Make hexagonal binning plot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And, something else, in the other PRs I think they used "Generate" instead of "Make" ? I don't really care which ones, but maybe best to be consistent
pandas/plotting/_core.py
Outdated
|
||
Parameters | ||
---------- | ||
x, y : label or position, optional | ||
Coordinates for each point. | ||
x : label or position |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we might make an exception is because 'str' is too restrictive for labels, as a column labels can in principle be any hashable object ... (but ok, in most cases str will be what people use)
But I am fine here with using int or str
pandas/plotting/_core.py
Outdated
C : label or position, optional | ||
The value at each `(x, y)` point. | ||
reduce_C_function : callable, optional | ||
reduce_C_function : callable, optional, default `mean` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the 'optional' part here
(and the same below)
pandas/plotting/_core.py
Outdated
gridsize can be a tuple with two elements specifying the number | ||
of hexagons in the x-direction and the y-direction. | ||
kwds : optional | ||
Keyword arguments to pass on to :py:meth:`pandas.DataFrame.plot`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make this explanation into "Additional keyword arguments are documented in DataFrame.plot" ?
also, the :py
part is not needed
pandas/plotting/_core.py
Outdated
See Also | ||
-------- | ||
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib, | ||
the matplotlib function that is used under the hood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think sphinx doesn't care, but let's indeed use '+4' to be consistent
Probably callable
________________________________
From: BielStela <[email protected]>
Sent: Monday, March 12, 2018 5:24:47 PM
To: pandas-dev/pandas
Cc: Tom Augspurger; Comment
Subject: Re: [pandas-dev/pandas] DOC: updated the pandas.DataFrame.plot.hexbin docstring (#20121)
@BielStela commented on this pull request.
________________________________
In pandas/plotting/_core.py<#20121 (comment)>:
Parameters
----------
- x, y : label or position, optional
- Coordinates for each point.
+ x : label or position
Yes, I forgot about the type. I have a doubt with reduce_C_function parameter: it only accepts numpy functions in the form np.mean or np.std not strings like 'mean'. Is function (or func) the correct type reference?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub<#20121 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABQHIlrmDmK0dzTrM7Rx4i92_bWqx6GOks5tdvWvgaJpZM4SlQ_G>.
|
Fixed the PEP8 issue and conflict with master |
1 sec. Fixing some grammar. It's a hexagonal :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just minor comments
pandas/plotting/_core.py
Outdated
:context: close-figs | ||
|
||
>>> n = 100000 | ||
>>> # Make a dataframe with normal distributed data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor thing, but I haven't seen a comment in the interpreter in the rest of the documentation. I'd simply add to the previous explanation that the data is normal, and this would be a bit more compact and standard
pandas/plotting/_core.py
Outdated
Additional keyword arguments are documented in | ||
:meth:`pandas.DataFrame.plot`. | ||
|
||
Returns | ||
------- | ||
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them | ||
axes : matplotlib.AxesSubplot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using in most places just the type, and in the next line a short summary of what is returned.
pandas/plotting/_core.py
Outdated
See Also | ||
-------- | ||
matplotlib.pyplot.hexbin : hexagonal binning plot using matplotlib, | ||
the matplotlib function that is used under the hood. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be we could add DataFrame.plot
here, as it's discussed in the parameters? Not sure what the other methods had in See Also
.
@BielStela I made a small edit in the actual plot. The default gridsize seems rather useless, so I added one to the first plot to make it look a bit better. I also removed the cmap there (because the "inferno" has dark low values, it is not really fitting for a hexbin IMO, but added a custom cmap to the second example) |
@jorisvandenbossche you want to fixup @datapythonista's points? |
OK, editing via github at the same time is not a good idea :-) |
@BielStela Thanks a lot for the PR! |
:)
FYI Joris, https://github.com/github/hub is nice for checking out PRs. Just
`hub checkout <url>` and all the remotes are setup locally.
…On Wed, Mar 14, 2018 at 7:41 AM, Joris Van den Bossche < ***@***.***> wrote:
@BielStela <https://github.com/bielstela> Thanks a lot for the PR!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20121 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHImg9KQ2qzkpCkVzHn4Rpgm8AdP2Iks5teQ_ngaJpZM4SlQ_G>
.
|
Also, welcome to take a look in a few hours how the pages is looking to check everything is OK: http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.DataFrame.plot.hexbin.html#pandas.DataFrame.plot.hexbin |
@TomAugspurger I have my alias |
Thanks a lot for the help and for the hard work you do! |
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
thanks to @Renton2017