-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Set value for undefined variables in examples #51389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Set value for undefined variables in examples #51389
Conversation
There is 1 undefined error I have yet to fix, which is in line 262 of
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kkangs0226 for working on this!
For unbox_scalar
, I think instantiating some example of a class on which it works would be fine - you could add a note saying that it's just an example of an object it can work on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, getting there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me pending comments from @datapythonista , thanks @kkangs0226 !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice. I added couple of last suggestions, but great changes!
Fantastic work, merging on green. |
@kkangs0226 thanks a lot for your work. I wonder if you could please make the resulting html file of the docstring, and share it here as pdf? I think that we have to follow the documenation conventions (please see the topics regarding short names and numbers): https://pandas.pydata.org/docs/development/contributing_docstring.html#conventions-for-the-examples We would also like to avoid extra parentheses. cc: @MarcoGorelli |
@DeaMariaLeon I have also attached the resulting html file for your viewing. |
@kkangs0226 Thank you very much. Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn’t be using parentheses in that case. I opened a PR #51634 after a PR you did (it hasn't finished the ci tests). It passes flake8 locally and it doesn't have the extra parentheses. As for the names of the df, that's good. What I meant was also that the name of the functions are very long IMO. But if everybody is happy, that's good. Again, thank you for all this. |
@DeaMariaLeon Thanks for the clarifications, I have removed the extra parentheses! As for the long function names, since they were suggested by @datapythonista , perhaps we can ask for his thoughts on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is close, thanks for sticking with it 🙌
There's just a CI error in the docstrings job (the other one is unrelated, don't worry about it)
pandas/core/apply.py
Outdated
>>> _relabel_result(result, func, columns, order) # doctest: +SKIP | ||
>>> relabel_result(result, funcs, columns, order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one is throwing an error in CI:
=================================== FAILURES ===================================
__________________ [doctest] pandas.core.apply.relabel_result __________________
1324 order: New order for relabelling
1325
1326 Examples:
1327 ---------
1328 >>> result = DataFrame({"A": [np.nan, 2, np.nan],
1329 ... "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]}) # doctest: +SKIP
1330 >>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
1331 >>> columns = ("foo", "aab", "bar", "dat")
1332 >>> order = [0, 1, 2, 3]
1333 >>> relabel_result(result, funcs, columns, order)
UNEXPECTED EXCEPTION: NameError("name 'result' is not defined")
Traceback (most recent call last):
File "/home/runner/micromamba/envs/test/lib/python3.8/doctest.py", line 1336, in __run
exec(compile(example.source, filename, "single",
File "<doctest pandas.core.apply.relabel_result[4]>", line 1, in <module>
NameError: name 'result' is not defined
I think you need to remove # doctest: +SKIP
from a few lines above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the error!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kkangs0226! ... Is there any chance that you could run this command? :
./scripts/validate_docstrings.py pandas.core.apply.relabel_result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the output, it still flags as failed example but I think @MarcoGorelli has mentioned that this error is unrelated. It seems to be one of F821 undefined errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I meant that the Ubuntu / Numpy Dev (pull_request)
CI job's failure is unrelated
this one looks related, you may need to put something like from pandas.core.apply import relabel_result
at the top of the doctest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for trying - sure, I'll take a look tomorrow, hopefully there's a simple fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring of relabel_result
has many errors. I see 16:
The summary should be in one line, the parameters and even the subtitles are incorrect. (They shouldn't have ":", it should be Examples
, not Examples:
) etc.
But at the beginning of the function it's written "Internal function...", and this is not on the documentation website.
Should we test internal functions docstrings?! @MarcoGorelli
Thank you @kkangs0226... We'll get there. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should fix the whole docstring here. Let's merge this when all the problems to the original PR scope are fixed, and open follow up PRs for anything else we want to fix.
@kkangs0226 can you provide the whole traceback for the error please? With only the error and the line you shared I don't know what the problem can be, and more context would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the entire traceback, hope it is able to provide more context.
Line 21, in pandas.core.apply.relabel_result
Failed example:
relabel_result(result, funcs, columns, order)
Exception raised:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/doctest.py", line 1336, in __run
exec(compile(example.source, filename, "single",
File "<doctest pandas.core.apply.relabel_result[6]>", line 1, in <module>
relabel_result(result, funcs, columns, order)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/apply.py", line 1377, in relabel_result
s = s[col_idx_order]
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 968, in __getitem__
return self._get_with(key)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 1003, in _get_with
return self.loc[key]
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1100, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1329, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1269, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1459, in _get_listlike_indexer
keyarr, indexer = ax._get_indexer_strict(key, axis_name)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5851, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5910, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([-1], dtype='int64')] are in the [index]"```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, got it - I'll post a new review comment showing how I'd go about fixing this (as it's a bit long and easy to get lost in a thread)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to merge this as is if the original issue being addressed is complete and the CI is green, and continue with other fixes in follow up PRs.
pandas/core/apply.py
Outdated
>>> _relabel_result(result, func, columns, order) # doctest: +SKIP | ||
>>> relabel_result(result, funcs, columns, order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should fix the whole docstring here. Let's merge this when all the problems to the original PR scope are fixed, and open follow up PRs for anything else we want to fix.
@kkangs0226 can you provide the whole traceback for the error please? With only the error and the line you shared I don't know what the problem can be, and more context would be helpful.
pandas/core/generic.py
Outdated
>>> df.pipe( | ||
... subtract_federal_tax | ||
... ).pipe( | ||
... subtract_state_tax, rate=0.12 | ||
... ).pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn’t be using parentheses in that case.
Does this comment refer to the parenthesis wrapping this whole block? If that's the case, I don't think this is true. The original syntax is much more common and readable than this in my opinion. This is what we've been doing in the docs that I know, it's how code was chained in the original post about method chaining, it's also the syntax that Matt Harrison uses in the Effective pandas book, and it's what I would write.
To be clear, I'm referring to using
(df.pipe(foo)
.pipe(bar)
.pipe(foobar)
)
instead of
df.pipe(
foo
).pipe(
bar
).pipe(
foobar
)
I'm fine with whatever (even if I personally found the former much more readable), but I wouldn't ask people to change the the latter because the former is not what someone would write, because I think many people would actually write that. :)
The To get it running, I'd first look for an example of it being used in practice: for that, I went to https://pandas-coverage.herokuapp.com/ , looked for the file
I then ran that test, with a breakpoint on line 1980, and saw that the aggregations need to be part of So, here's one way of writing this example such that it passes: >>> from pandas.core.apply import relabel_result
>>> result = pd.DataFrame(
... {"A": [np.nan, 2, np.nan], "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]},
... index=["max", "mean", "min"]
... )
>>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
>>> columns = ("foo", "aab", "bar", "dat")
>>> order = [0, 1, 2, 3]
>>> result_in_dict = relabel_result(result, funcs, columns, order)
>>> pd.DataFrame(result_in_dict, index=columns)
A C B
foo 2.0 NaN NaN
aab NaN 6.0 NaN
bar NaN NaN 4.0
dat NaN NaN 2.5 |
Also, for me, the Lines 1326 to 1327 in 8a40960
to
|
I went ahead and fixed up the remaining tidbits, hope that's OK - thanks @kkangs0226 for your PR, and Dea for reviewing! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.