Skip to content

DOC: Set value for undefined variables in examples #51389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 28, 2023

Conversation

kkangs0226
Copy link
Contributor

@kkangs0226
Copy link
Contributor Author

kkangs0226 commented Feb 14, 2023

There is 1 undefined error I have yet to fix, which is in line 262 of pandas/core/arrays/datetimelike.py.
image.
This method is under DatetimeLikeArrayMixin class.

self is undefined. I would like to confirm whether I should instantiate this class and replace self with the object. The class takes in in other objects, which in turn take in other objects as well, so it would involve many classes that might only make the example more confusing.
@datapythonista Please let me know what you think!

@mroeschke mroeschke added the Docs label Feb 14, 2023
@MarcoGorelli MarcoGorelli self-requested a review February 14, 2023 21:23
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkangs0226 for working on this!

For unbox_scalar, I think instantiating some example of a class on which it works would be fine - you could add a note saying that it's just an example of an object it can work on

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, getting there

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me pending comments from @datapythonista , thanks @kkangs0226 !

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice. I added couple of last suggestions, but great changes!

@datapythonista
Copy link
Member

Fantastic work, merging on green.

@DeaMariaLeon
Copy link
Member

@kkangs0226 thanks a lot for your work. I wonder if you could please make the resulting html file of the docstring, and share it here as pdf?

I think that we have to follow the documenation conventions (please see the topics regarding short names and numbers): https://pandas.pydata.org/docs/development/contributing_docstring.html#conventions-for-the-examples

We would also like to avoid extra parentheses.

cc: @MarcoGorelli

@kkangs0226
Copy link
Contributor Author

@DeaMariaLeon
I have updated the variable name of the dataframe to df as suggested in the documenation conventions.
As for the extra parentheses for multi-line doctests, they are to avoid the F721 syntax error. More details can be found here: #51341

I have also attached the resulting html file for your viewing.
pandas_DataFrame_pipe.pdf

@DeaMariaLeon
Copy link
Member

@kkangs0226 Thank you very much.

Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn’t be using parentheses in that case. I opened a PR #51634 after a PR you did (it hasn't finished the ci tests). It passes flake8 locally and it doesn't have the extra parentheses.

As for the names of the df, that's good. What I meant was also that the name of the functions are very long IMO. But if everybody is happy, that's good.

Again, thank you for all this.

@kkangs0226
Copy link
Contributor Author

@DeaMariaLeon Thanks for the clarifications, I have removed the extra parentheses!

As for the long function names, since they were suggested by @datapythonista , perhaps we can ask for his thoughts on it.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close, thanks for sticking with it 🙌

There's just a CI error in the docstrings job (the other one is unrelated, don't worry about it)

>>> _relabel_result(result, func, columns, order) # doctest: +SKIP
>>> relabel_result(result, funcs, columns, order)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is throwing an error in CI:

=================================== FAILURES ===================================
__________________ [doctest] pandas.core.apply.relabel_result __________________
1324     order: New order for relabelling
1325 
1326     Examples:
1327     ---------
1328     >>> result = DataFrame({"A": [np.nan, 2, np.nan],
1329     ...       "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]})  # doctest: +SKIP
1330     >>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
1331     >>> columns = ("foo", "aab", "bar", "dat")
1332     >>> order = [0, 1, 2, 3]
1333     >>> relabel_result(result, funcs, columns, order)
UNEXPECTED EXCEPTION: NameError("name 'result' is not defined")
Traceback (most recent call last):
  File "/home/runner/micromamba/envs/test/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pandas.core.apply.relabel_result[4]>", line 1, in <module>
NameError: name 'result' is not defined

I think you need to remove # doctest: +SKIP from a few lines above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the error!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkangs0226! ... Is there any chance that you could run this command? :

./scripts/validate_docstrings.py pandas.core.apply.relabel_result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the output, it still flags as failed example but I think @MarcoGorelli has mentioned that this error is unrelated. It seems to be one of F821 undefined errors.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I meant that the Ubuntu / Numpy Dev (pull_request) CI job's failure is unrelated

this one looks related, you may need to put something like from pandas.core.apply import relabel_result at the top of the doctest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for trying - sure, I'll take a look tomorrow, hopefully there's a simple fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring of relabel_result has many errors. I see 16:
The summary should be in one line, the parameters and even the subtitles are incorrect. (They shouldn't have ":", it should be Examples, not Examples:) etc.

But at the beginning of the function it's written "Internal function...", and this is not on the documentation website.
Should we test internal functions docstrings?! @MarcoGorelli

Thank you @kkangs0226... We'll get there. :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should fix the whole docstring here. Let's merge this when all the problems to the original PR scope are fixed, and open follow up PRs for anything else we want to fix.

@kkangs0226 can you provide the whole traceback for the error please? With only the error and the line you shared I don't know what the problem can be, and more context would be helpful.

Copy link
Contributor Author

@kkangs0226 kkangs0226 Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the entire traceback, hope it is able to provide more context.

Line 21, in pandas.core.apply.relabel_result
Failed example:
    relabel_result(result, funcs, columns, order)
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest pandas.core.apply.relabel_result[6]>", line 1, in <module>
        relabel_result(result, funcs, columns, order)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/apply.py", line 1377, in relabel_result
        s = s[col_idx_order]
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 968, in __getitem__
        return self._get_with(key)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 1003, in _get_with
        return self.loc[key]
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1100, in __getitem__
        return self._getitem_axis(maybe_callable, axis=axis)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1329, in _getitem_axis
        return self._getitem_iterable(key, axis=axis)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1269, in _getitem_iterable
        keyarr, indexer = self._get_listlike_indexer(key, axis)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1459, in _get_listlike_indexer
        keyarr, indexer = ax._get_indexer_strict(key, axis_name)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5851, in _get_indexer_strict
        self._raise_if_missing(keyarr, indexer, axis_name)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5910, in _raise_if_missing
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")
    KeyError: "None of [Index([-1], dtype='int64')] are in the [index]"```

Copy link
Member

@MarcoGorelli MarcoGorelli Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, got it - I'll post a new review comment showing how I'd go about fixing this (as it's a bit long and easy to get lost in a thread)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge this as is if the original issue being addressed is complete and the CI is green, and continue with other fixes in follow up PRs.

>>> _relabel_result(result, func, columns, order) # doctest: +SKIP
>>> relabel_result(result, funcs, columns, order)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should fix the whole docstring here. Let's merge this when all the problems to the original PR scope are fixed, and open follow up PRs for anything else we want to fix.

@kkangs0226 can you provide the whole traceback for the error please? With only the error and the line you shared I don't know what the problem can be, and more context would be helpful.

Comment on lines 5919 to 5923
>>> df.pipe(
... subtract_federal_tax
... ).pipe(
... subtract_state_tax, rate=0.12
... ).pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn’t be using parentheses in that case.

Does this comment refer to the parenthesis wrapping this whole block? If that's the case, I don't think this is true. The original syntax is much more common and readable than this in my opinion. This is what we've been doing in the docs that I know, it's how code was chained in the original post about method chaining, it's also the syntax that Matt Harrison uses in the Effective pandas book, and it's what I would write.

To be clear, I'm referring to using

(df.pipe(foo)
   .pipe(bar)
   .pipe(foobar)
)

instead of

df.pipe(
    foo
).pipe(
    bar
).pipe(
    foobar
)

I'm fine with whatever (even if I personally found the former much more readable), but I wouldn't ask people to change the the latter because the former is not what someone would write, because I think many people would actually write that. :)

@MarcoGorelli
Copy link
Member

The relabel_result doctest doesn't run. It's only an internal function, but still, having an example is useful for helping devs understand what it does. So perhaps we don't need to fix all linting errors in it, but having a running example would be good.

To get it running, I'd first look for an example of it being used in practice: for that, I went to https://pandas-coverage.herokuapp.com/ , looked for the file pandas/core/frame.py, and line 1980 (where relabel_result is used) and found that the following test hits it:

pandas/tests/apply/test_frame_apply_relabeling.py::test_agg_relabel_multi_columns_multi_methods

I then ran that test, with a breakpoint on line 1980, and saw that the aggregations need to be part of result's index

So, here's one way of writing this example such that it passes:

    >>> from pandas.core.apply import relabel_result
    >>> result = pd.DataFrame(
    ...     {"A": [np.nan, 2, np.nan], "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]},
    ...     index=["max", "mean", "min"]
    ... )
    >>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
    >>> columns = ("foo", "aab", "bar", "dat")
    >>> order = [0, 1, 2, 3]
    >>> result_in_dict = relabel_result(result, funcs, columns, order)
    >>> pd.DataFrame(result_in_dict, index=columns)
           A    C    B
    foo  2.0  NaN  NaN
    aab  NaN  6.0  NaN
    bar  NaN  NaN  4.0
    dat  NaN  NaN  2.5

@MarcoGorelli
Copy link
Member

Also, for me, the validate_docstrings doesn't check the doctest unless I change

pandas/pandas/core/apply.py

Lines 1326 to 1327 in 8a40960

Examples:
---------

to

    Examples
    --------

@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Feb 28, 2023
@MarcoGorelli
Copy link
Member

I went ahead and fixed up the remaining tidbits, hope that's OK - thanks @kkangs0226 for your PR, and Dea for reviewing!

@MarcoGorelli MarcoGorelli merged commit b02728c into pandas-dev:main Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Set value for undefined variables in examples
5 participants