DOC: Set value for undefined variables in examples #51389

kkangs0226 · 2023-02-14T18:25:10Z

closes DOC: Set value for undefined variables in examples #51342 (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

kkangs0226 · 2023-02-14T18:30:59Z

There is 1 undefined error I have yet to fix, which is in line 262 of pandas/core/arrays/datetimelike.py.
.
This method is under DatetimeLikeArrayMixin class.

self is undefined. I would like to confirm whether I should instantiate this class and replace self with the object. The class takes in in other objects, which in turn take in other objects as well, so it would involve many classes that might only make the example more confusing.
@datapythonista Please let me know what you think!

MarcoGorelli

Thanks @kkangs0226 for working on this!

For unbox_scalar, I think instantiating some example of a class on which it works would be fine - you could add a note saying that it's just an example of an object it can work on

pandas/io/formats/style.py

pandas/core/generic.py

…value-undefined

MarcoGorelli

nice, getting there

pandas/core/generic.py

pandas/core/arrays/datetimelike.py

MarcoGorelli

Looks good to me pending comments from @datapythonista , thanks @kkangs0226 !

datapythonista

Really nice. I added couple of last suggestions, but great changes!

pandas/core/apply.py

pandas/core/generic.py

datapythonista · 2023-02-22T13:20:47Z

Fantastic work, merging on green.

DeaMariaLeon · 2023-02-25T09:59:11Z

@kkangs0226 thanks a lot for your work. I wonder if you could please make the resulting html file of the docstring, and share it here as pdf?

I think that we have to follow the documenation conventions (please see the topics regarding short names and numbers): https://pandas.pydata.org/docs/development/contributing_docstring.html#conventions-for-the-examples

We would also like to avoid extra parentheses.

cc: @MarcoGorelli

…value-undefined

…to set-value-undefined

kkangs0226 · 2023-02-25T16:11:45Z

@DeaMariaLeon
I have updated the variable name of the dataframe to df as suggested in the documenation conventions.
As for the extra parentheses for multi-line doctests, they are to avoid the F721 syntax error. More details can be found here: #51341

I have also attached the resulting html file for your viewing.
pandas_DataFrame_pipe.pdf

DeaMariaLeon · 2023-02-25T16:44:36Z

@kkangs0226 Thank you very much.

Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn’t be using parentheses in that case. I opened a PR #51634 after a PR you did (it hasn't finished the ci tests). It passes flake8 locally and it doesn't have the extra parentheses.

As for the names of the df, that's good. What I meant was also that the name of the functions are very long IMO. But if everybody is happy, that's good.

Again, thank you for all this.

kkangs0226 · 2023-02-26T03:51:15Z

@DeaMariaLeon Thanks for the clarifications, I have removed the extra parentheses!

As for the long function names, since they were suggested by @datapythonista , perhaps we can ask for his thoughts on it.

MarcoGorelli

This is close, thanks for sticking with it 🙌

There's just a CI error in the docstrings job (the other one is unrelated, don't worry about it)

MarcoGorelli · 2023-02-26T08:35:36Z

pandas/core/apply.py

-    >>> _relabel_result(result, func, columns, order)  # doctest: +SKIP
+    >>> relabel_result(result, funcs, columns, order)


this one is throwing an error in CI:

=================================== FAILURES =================================== __________________ [doctest] pandas.core.apply.relabel_result __________________ 1324 order: New order for relabelling 1325 1326 Examples: 1327 --------- 1328 >>> result = DataFrame({"A": [np.nan, 2, np.nan], 1329 ... "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]}) # doctest: +SKIP 1330 >>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]} 1331 >>> columns = ("foo", "aab", "bar", "dat") 1332 >>> order = [0, 1, 2, 3] 1333 >>> relabel_result(result, funcs, columns, order) UNEXPECTED EXCEPTION: NameError("name 'result' is not defined") Traceback (most recent call last): File "/home/runner/micromamba/envs/test/lib/python3.8/doctest.py", line 1336, in __run exec(compile(example.source, filename, "single", File "<doctest pandas.core.apply.relabel_result[4]>", line 1, in <module> NameError: name 'result' is not defined

I think you need to remove # doctest: +SKIP from a few lines above

Fixed the error!

Thanks @kkangs0226! ... Is there any chance that you could run this command? :

./scripts/validate_docstrings.py pandas.core.apply.relabel_result

Here is the output, it still flags as failed example but I think @MarcoGorelli has mentioned that this error is unrelated. It seems to be one of F821 undefined errors.

sorry I meant that the Ubuntu / Numpy Dev (pull_request) CI job's failure is unrelated

this one looks related, you may need to put something like from pandas.core.apply import relabel_result at the top of the doctest

thanks for trying - sure, I'll take a look tomorrow, hopefully there's a simple fix

The docstring of relabel_result has many errors. I see 16:
The summary should be in one line, the parameters and even the subtitles are incorrect. (They shouldn't have ":", it should be Examples, not Examples:) etc.

But at the beginning of the function it's written "Internal function...", and this is not on the documentation website.
Should we test internal functions docstrings?! @MarcoGorelli

Thank you @kkangs0226... We'll get there. :-)

I don't think we should fix the whole docstring here. Let's merge this when all the problems to the original PR scope are fixed, and open follow up PRs for anything else we want to fix.

@kkangs0226 can you provide the whole traceback for the error please? With only the error and the line you shared I don't know what the problem can be, and more context would be helpful.

Here is the entire traceback, hope it is able to provide more context.

Line 21, in pandas.core.apply.relabel_result Failed example: relabel_result(result, funcs, columns, order) Exception raised: Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/doctest.py", line 1336, in __run exec(compile(example.source, filename, "single", File "<doctest pandas.core.apply.relabel_result[6]>", line 1, in <module> relabel_result(result, funcs, columns, order) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/apply.py", line 1377, in relabel_result s = s[col_idx_order] File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 968, in __getitem__ return self._get_with(key) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 1003, in _get_with return self.loc[key] File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1100, in __getitem__ return self._getitem_axis(maybe_callable, axis=axis) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1329, in _getitem_axis return self._getitem_iterable(key, axis=axis) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1269, in _getitem_iterable keyarr, indexer = self._get_listlike_indexer(key, axis) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1459, in _get_listlike_indexer keyarr, indexer = ax._get_indexer_strict(key, axis_name) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5851, in _get_indexer_strict self._raise_if_missing(keyarr, indexer, axis_name) File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5910, in _raise_if_missing raise KeyError(f"None of [{key}] are in the [{axis_name}]") KeyError: "None of [Index([-1], dtype='int64')] are in the [index]"```

ok, got it - I'll post a new review comment showing how I'd go about fixing this (as it's a bit long and easy to get lost in a thread)

datapythonista

Happy to merge this as is if the original issue being addressed is complete and the CI is green, and continue with other fixes in follow up PRs.

datapythonista · 2023-02-26T17:18:54Z

pandas/core/apply.py

-    >>> _relabel_result(result, func, columns, order)  # doctest: +SKIP
+    >>> relabel_result(result, funcs, columns, order)


I don't think we should fix the whole docstring here. Let's merge this when all the problems to the original PR scope are fixed, and open follow up PRs for anything else we want to fix.

@kkangs0226 can you provide the whole traceback for the error please? With only the error and the line you shared I don't know what the problem can be, and more context would be helpful.

datapythonista · 2023-02-26T17:28:40Z

pandas/core/generic.py

+        >>> df.pipe(
+        ...     subtract_federal_tax
+        ... ).pipe(
+        ...     subtract_state_tax, rate=0.12
+        ... ).pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)


Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn’t be using parentheses in that case.

Does this comment refer to the parenthesis wrapping this whole block? If that's the case, I don't think this is true. The original syntax is much more common and readable than this in my opinion. This is what we've been doing in the docs that I know, it's how code was chained in the original post about method chaining, it's also the syntax that Matt Harrison uses in the Effective pandas book, and it's what I would write.

To be clear, I'm referring to using

(df.pipe(foo) .pipe(bar) .pipe(foobar) )

instead of

df.pipe( foo ).pipe( bar ).pipe( foobar )

I'm fine with whatever (even if I personally found the former much more readable), but I wouldn't ask people to change the the latter because the former is not what someone would write, because I think many people would actually write that. :)

MarcoGorelli · 2023-02-27T08:09:51Z

The relabel_result doctest doesn't run. It's only an internal function, but still, having an example is useful for helping devs understand what it does. So perhaps we don't need to fix all linting errors in it, but having a running example would be good.

To get it running, I'd first look for an example of it being used in practice: for that, I went to https://pandas-coverage.herokuapp.com/ , looked for the file pandas/core/frame.py, and line 1980 (where relabel_result is used) and found that the following test hits it:

pandas/tests/apply/test_frame_apply_relabeling.py::test_agg_relabel_multi_columns_multi_methods

I then ran that test, with a breakpoint on line 1980, and saw that the aggregations need to be part of result's index

So, here's one way of writing this example such that it passes:

    >>> from pandas.core.apply import relabel_result
    >>> result = pd.DataFrame(
    ...     {"A": [np.nan, 2, np.nan], "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]},
    ...     index=["max", "mean", "min"]
    ... )
    >>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
    >>> columns = ("foo", "aab", "bar", "dat")
    >>> order = [0, 1, 2, 3]
    >>> result_in_dict = relabel_result(result, funcs, columns, order)
    >>> pd.DataFrame(result_in_dict, index=columns)
           A    C    B
    foo  2.0  NaN  NaN
    aab  NaN  6.0  NaN
    bar  NaN  NaN  4.0
    dat  NaN  NaN  2.5

MarcoGorelli · 2023-02-27T08:14:33Z

Also, for me, the validate_docstrings doesn't check the doctest unless I change

pandas/pandas/core/apply.py

Lines 1326 to 1327 in 8a40960

    
               Examples: 
        
               ---------

to

    Examples
    --------

…alue-undefined

MarcoGorelli · 2023-02-28T13:32:01Z

I went ahead and fixed up the remaining tidbits, hope that's OK - thanks @kkangs0226 for your PR, and Dea for reviewing!

Set value for undefined variables

3d5acdc

mroeschke added the Docs label Feb 14, 2023

MarcoGorelli self-requested a review February 14, 2023 21:23

MarcoGorelli suggested changes Feb 15, 2023

View reviewed changes

pandas/io/formats/style.py Outdated Show resolved Hide resolved

pandas/core/generic.py Outdated Show resolved Hide resolved

kkangs0226 and others added 4 commits February 16, 2023 09:53

Update example for pipe

44b2179

Merge branch 'main' of https://github.com/pandas-dev/pandas into set-…

6f61c67

…value-undefined

Merge branch 'main' into set-value-undefined

8e0ff34

Merge branch 'main' into set-value-undefined

dc21e5e

MarcoGorelli suggested changes Feb 20, 2023

View reviewed changes

pandas/core/generic.py Outdated Show resolved Hide resolved

pandas/core/arrays/datetimelike.py Outdated Show resolved Hide resolved

kkangs0226 and others added 3 commits February 20, 2023 22:28

Update generic.py

dc1722f

Update _unbox_scalar example

7104f8b

Merge branch 'main' into set-value-undefined

b3b215e

MarcoGorelli approved these changes Feb 20, 2023

View reviewed changes

datapythonista approved these changes Feb 21, 2023

View reviewed changes

pandas/core/apply.py Outdated Show resolved Hide resolved

pandas/core/generic.py Outdated Show resolved Hide resolved

kkangs0226 and others added 2 commits February 22, 2023 19:59

Update indentation

f115c0d

Merge branch 'main' into set-value-undefined

b7e949f

kkangs0226 added 3 commits February 25, 2023 23:46

Update dataframe name

eec2611

Merge branch 'main' of https://github.com/pandas-dev/pandas into set-…

40e6ccf

…value-undefined

Merge branch 'set-value-undefined' of github.com:kkangs0226/pandas in…

6be5d9c

…to set-value-undefined

Fix indentation

0032da4

Remove extra parentheses

c79bfda

MarcoGorelli suggested changes Feb 26, 2023

View reviewed changes

datapythonista approved these changes Feb 26, 2023

View reviewed changes

MarcoGorelli mentioned this pull request Feb 26, 2023

DOC: Set value for undefined variables in examples #51342

Closed

MarcoGorelli added 2 commits February 28, 2023 10:52

Merge remote-tracking branch 'upstream/main' into pr/kkangs0226/set-v…

176b536

…alue-undefined

fixup relabel_result, restore chained method syntax

3384a7f

MarcoGorelli added this to the 2.1 milestone Feb 28, 2023

MarcoGorelli approved these changes Feb 28, 2023

View reviewed changes

MarcoGorelli merged commit b02728c into pandas-dev:main Feb 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DOC: Set value for undefined variables in examples #51389

DOC: Set value for undefined variables in examples #51389

kkangs0226 commented Feb 14, 2023

kkangs0226 commented Feb 14, 2023 •

edited

Loading

MarcoGorelli left a comment

MarcoGorelli left a comment

MarcoGorelli left a comment

datapythonista left a comment

datapythonista commented Feb 22, 2023

DeaMariaLeon commented Feb 25, 2023

kkangs0226 commented Feb 25, 2023

DeaMariaLeon commented Feb 25, 2023

kkangs0226 commented Feb 26, 2023

MarcoGorelli left a comment

MarcoGorelli Feb 26, 2023

kkangs0226 Feb 26, 2023

DeaMariaLeon Feb 26, 2023

kkangs0226 Feb 26, 2023

MarcoGorelli Feb 26, 2023

MarcoGorelli Feb 26, 2023

DeaMariaLeon Feb 26, 2023

datapythonista Feb 26, 2023

kkangs0226 Feb 27, 2023 •

edited

Loading

MarcoGorelli Feb 27, 2023 •

edited

Loading

datapythonista left a comment

datapythonista Feb 26, 2023

datapythonista Feb 26, 2023

MarcoGorelli commented Feb 27, 2023

MarcoGorelli commented Feb 27, 2023

MarcoGorelli commented Feb 28, 2023

		>>> _relabel_result(result, func, columns, order) # doctest: +SKIP
		>>> relabel_result(result, funcs, columns, order)

DOC: Set value for undefined variables in examples #51389

DOC: Set value for undefined variables in examples #51389

Conversation

kkangs0226 commented Feb 14, 2023

kkangs0226 commented Feb 14, 2023 • edited Loading

MarcoGorelli left a comment

Choose a reason for hiding this comment

MarcoGorelli left a comment

Choose a reason for hiding this comment

MarcoGorelli left a comment

Choose a reason for hiding this comment

datapythonista left a comment

Choose a reason for hiding this comment

datapythonista commented Feb 22, 2023

DeaMariaLeon commented Feb 25, 2023

kkangs0226 commented Feb 25, 2023

DeaMariaLeon commented Feb 25, 2023

kkangs0226 commented Feb 26, 2023

MarcoGorelli left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kkangs0226 Feb 27, 2023 • edited Loading

Choose a reason for hiding this comment

MarcoGorelli Feb 27, 2023 • edited Loading

Choose a reason for hiding this comment

datapythonista left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MarcoGorelli commented Feb 27, 2023

MarcoGorelli commented Feb 27, 2023

MarcoGorelli commented Feb 28, 2023

kkangs0226 commented Feb 14, 2023 •

edited

Loading

kkangs0226 Feb 27, 2023 •

edited

Loading

MarcoGorelli Feb 27, 2023 •

edited

Loading