DOC: Set value for undefined variables in examples #51389

Merged: 17 commits merged on Feb 28, 2023 (the diff below shows changes from 15 commits)
2 changes: 1 addition & 1 deletion pandas/core/apply.py
@@ -1330,7 +1330,7 @@ def relabel_result(
>>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
>>> columns = ("foo", "aab", "bar", "dat")
>>> order = [0, 1, 2, 3]
->>> _relabel_result(result, func, columns, order) # doctest: +SKIP
+>>> relabel_result(result, funcs, columns, order)

Member

This one is throwing an error in CI:

=================================== FAILURES ===================================
__________________ [doctest] pandas.core.apply.relabel_result __________________
1324     order: New order for relabelling
1325 
1326     Examples:
1327     ---------
1328     >>> result = DataFrame({"A": [np.nan, 2, np.nan],
1329     ...       "C": [6, np.nan, np.nan], "B": [np.nan, 4, 2.5]})  # doctest: +SKIP
1330     >>> funcs = {"A": ["max"], "C": ["max"], "B": ["mean", "min"]}
1331     >>> columns = ("foo", "aab", "bar", "dat")
1332     >>> order = [0, 1, 2, 3]
1333     >>> relabel_result(result, funcs, columns, order)
UNEXPECTED EXCEPTION: NameError("name 'result' is not defined")
Traceback (most recent call last):
  File "/home/runner/micromamba/envs/test/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pandas.core.apply.relabel_result[4]>", line 1, in <module>
NameError: name 'result' is not defined

I think you need to remove # doctest: +SKIP from a few lines above (it is on the example that defines result).
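A minimal, stdlib-only sketch of this failure mode (the toy docstrings and the name `value` are hypothetical, not from pandas): because `# doctest: +SKIP` applies to the whole example it is attached to, the assignment never runs, so the next example that reads the name fails with a NameError. Dropping the directive makes the same examples pass.

```python
import doctest

# The directive sits on the example that defines ``value``, so it is skipped.
SKIPPED_SETUP = """
>>> value = 40 + 2  # doctest: +SKIP
>>> value
42
"""

# The same examples without the directive.
FIXED_SETUP = """
>>> value = 40 + 2
>>> value
42
"""


def count_failures(text: str) -> int:
    """Run the doctest examples in ``text`` and return how many failed."""
    test = doctest.DocTestParser().get_doctest(
        text, globs={}, name="toy", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the counts matter here.
    results = runner.run(test, out=lambda _report: None)
    return results.failed


print(count_failures(SKIPPED_SETUP))  # 1: NameError on ``value``
print(count_failures(FIXED_SETUP))    # 0
```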

Contributor Author

Fixed the error!

Member

Thanks @kkangs0226! ... Is there any chance you could run this command?

./scripts/validate_docstrings.py pandas.core.apply.relabel_result

Contributor Author

Here is the output. It still flags a failed example, but I think @MarcoGorelli mentioned that this error is unrelated. It seems to be one of the F821 (undefined name) errors.
[Screenshot of the script output]

Member

Sorry, I meant that the failure of the Ubuntu / Numpy Dev (pull_request) CI job is unrelated.

This one looks related: you may need to put something like from pandas.core.apply import relabel_result at the top of the doctest.
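Why an import helps, as a hedged sketch: doctest runs examples against whatever globals namespace it is given. Assuming the docstring checker supplies only a small namespace (for instance just np and pd) rather than the module's own globals, a module-level name such as relabel_result is invisible unless the example imports it. The stand-in below uses math.sqrt in place of relabel_result; the toy functions `hypotenuse` and `hypotenuse_with_import` are hypothetical.

```python
import doctest
from math import sqrt


def hypotenuse():
    """
    >>> sqrt(3 ** 2 + 4 ** 2)
    5.0
    """


def hypotenuse_with_import():
    """
    >>> from math import sqrt
    >>> sqrt(3 ** 2 + 4 ** 2)
    5.0
    """


def failures(func, globs: dict) -> int:
    """Run ``func``'s doctest against a copy of ``globs``; return the failure count."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for test in finder.find(func, globs=globs):
        failed += runner.run(test, out=lambda _report: None).failed
    return failed


print(failures(hypotenuse, {}))              # 1: sqrt is not in the namespace
print(failures(hypotenuse, {"sqrt": sqrt}))  # 0: the runner supplied it
print(failures(hypotenuse_with_import, {}))  # 0: the example imports it itself
```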

Member

Thanks for trying. Sure, I'll take a look tomorrow; hopefully there's a simple fix.

Member

The docstring of relabel_result has many errors; I see 16.
For example, the summary should fit on one line, and the parameters and even the section titles are formatted incorrectly: titles shouldn't end with ":" (it should be Examples, not Examples:), and so on.

But the docstring begins with "Internal function...", and this function does not appear on the documentation website.
Should we test the docstrings of internal functions at all? @MarcoGorelli

Thank you @kkangs0226... We'll get there. :-)
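To make the "Examples, not Examples:" point concrete, here is a small regex sketch that flags numpydoc-style section titles written with a trailing colon. It is an illustration only, not how numpydoc or pandas' scripts/validate_docstrings.py implement their checks, and the title list is a hand-picked subset.

```python
from __future__ import annotations

import re

# Hand-picked subset of numpydoc section titles, for illustration.
SECTION_TITLES = ("Parameters", "Returns", "See Also", "Notes", "Examples")

# A title alone on its line, immediately followed by a colon.
_COLON_TITLE = re.compile(
    r"^[ \t]*(" + "|".join(re.escape(t) for t in SECTION_TITLES) + r"):[ \t]*$",
    re.MULTILINE,
)


def colon_titles(docstring: str) -> list[str]:
    """Return the section titles that end with ':' (e.g. 'Examples:')."""
    return _COLON_TITLE.findall(docstring)


bad = """
Examples:
---------
>>> 1 + 1
2
"""
good = """
Examples
--------
>>> 1 + 1
2
"""
print(colon_titles(bad))   # ['Examples']
print(colon_titles(good))  # []
```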

Member

I don't think we should fix the whole docstring here. Let's merge this once all the problems within the original PR scope are fixed, and open follow-up PRs for anything else we want to fix.

@kkangs0226 can you provide the whole traceback for the error, please? With only the error and the line you shared, I can't tell what the problem might be; more context would help.

Contributor Author (@kkangs0226, Feb 27, 2023)

Here is the entire traceback; I hope it provides more context.

Line 21, in pandas.core.apply.relabel_result
Failed example:
    relabel_result(result, funcs, columns, order)
Exception raised:
    Traceback (most recent call last):
      File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/doctest.py", line 1336, in __run
        exec(compile(example.source, filename, "single",
      File "<doctest pandas.core.apply.relabel_result[6]>", line 1, in <module>
        relabel_result(result, funcs, columns, order)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/apply.py", line 1377, in relabel_result
        s = s[col_idx_order]
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 968, in __getitem__
        return self._get_with(key)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/series.py", line 1003, in _get_with
        return self.loc[key]
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1100, in __getitem__
        return self._getitem_axis(maybe_callable, axis=axis)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1329, in _getitem_axis
        return self._getitem_iterable(key, axis=axis)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1269, in _getitem_iterable
        keyarr, indexer = self._get_listlike_indexer(key, axis)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexing.py", line 1459, in _get_listlike_indexer
        keyarr, indexer = ax._get_indexer_strict(key, axis_name)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5851, in _get_indexer_strict
        self._raise_if_missing(keyarr, indexer, axis_name)
      File "/Users/kingbob/Desktop/NUS/3281:3282/pandas/pandas/core/indexes/base.py", line 5910, in _raise_if_missing
        raise KeyError(f"None of [{key}] are in the [{axis_name}]")
    KeyError: "None of [Index([-1], dtype='int64')] are in the [index]"
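A hedged reading of this traceback, as a sketch: the failing line is s = s[col_idx_order], and the key it received was [-1]. Assuming (not verified here) that col_idx_order comes from Index.get_indexer, that would fit, because get_indexer returns -1 for labels it cannot find. If the example's result frame only has default integer row labels, looking up an aggregation name such as "max" yields -1, and a Series with an integer index then treats -1 as a label and raises the KeyError above. The Series below is made-up data.

```python
import pandas as pd

# Made-up data: row labels are plain integers, not aggregation names.
s = pd.Series([2.0, 4.0, 2.5], index=[0, 1, 2])

# get_indexer returns -1 for every target label that is missing.
positions = pd.Index(s.index).get_indexer(["max"])
print(positions)  # [-1]

try:
    # With an integer index, -1 is looked up as a label, and there is none.
    s[positions]
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)  # True

# With aggregation names as row labels, the lookup finds real positions.
named_positions = pd.Index(["max", "mean", "min"]).get_indexer(["max", "min"])
print(named_positions)  # [0 2]
```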

Member (@MarcoGorelli, Feb 27, 2023)

OK, got it. I'll post a new review comment showing how I'd go about fixing this (it's a bit long and would be easy to lose in a thread).

dict(A=Series([2.0, NaN, NaN, NaN], index=["foo", "aab", "bar", "dat"]),
     C=Series([NaN, 6.0, NaN, NaN], index=["foo", "aab", "bar", "dat"]),
     B=Series([NaN, NaN, 2.5, 4.0], index=["foo", "aab", "bar", "dat"]))
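The expected value above is a dict of Series that share the new labels. As a reading aid only (it does not call relabel_result), assembling that dict into a DataFrame shows that each original column keeps a value only under its relabelled rows:

```python
import numpy as np
import pandas as pd

labels = ["foo", "aab", "bar", "dat"]

# The expected dict from the docstring, spelled with pd.Series and np.nan.
expected = dict(
    A=pd.Series([2.0, np.nan, np.nan, np.nan], index=labels),
    C=pd.Series([np.nan, 6.0, np.nan, np.nan], index=labels),
    B=pd.Series([np.nan, np.nan, 2.5, 4.0], index=labels),
)

# Columns keep the dict's insertion order (A, C, B); rows follow ``labels``.
table = pd.DataFrame(expected)
print(table)
```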

5 changes: 3 additions & 2 deletions pandas/core/arrays/datetimelike.py
@@ -258,8 +258,9 @@ def _unbox_scalar(

Examples
--------
->>> self._unbox_scalar(Timedelta("10s")) # doctest: +SKIP
-10000000000
+>>> arr = pd.arrays.DatetimeArray(np.array(['1970-01-01'], 'datetime64[ns]'))
+>>> arr._unbox_scalar(arr[0])
+numpy.datetime64('1970-01-01T00:00:00.000000000')
"""
raise AbstractMethodError(self)
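_unbox_scalar is private, so here is a hedged public-API analogue (an assumption about intent, not what the new doctest calls): building the same one-element array with pd.array and converting its element with Timestamp.to_datetime64 also yields a nanosecond numpy.datetime64.

```python
import numpy as np
import pandas as pd

# pd.array wraps a datetime64[ns] ndarray in a DatetimeArray.
arr = pd.array(np.array(["1970-01-01"], "datetime64[ns]"))
print(type(arr).__name__)  # DatetimeArray

# arr[0] is a pd.Timestamp; to_datetime64 returns the underlying numpy scalar.
value = arr[0].to_datetime64()
print(repr(value))
print(np.datetime_data(value.dtype))  # ('ns', 1)
```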

1 change: 1 addition & 0 deletions pandas/core/dtypes/base.py
@@ -261,6 +261,7 @@ def construct_from_string(
For extension dtypes with arguments the following may be an
adequate implementation.

+>>> import re
>>> @classmethod
... def construct_from_string(cls, string):
...     pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")
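The added import re is needed because the example calls re.compile. The docstring snippet is cut off in this diff, so the sketch below exercises only the pattern shown; parse_arg is a hypothetical helper, and my_type is the docstring's own placeholder dtype name.

```python
import re

# Pattern from the docstring: captures the text inside "my_type[...]".
pattern = re.compile(r"^my_type\[(?P<arg_name>.+)\]$")


def parse_arg(string: str):
    """Return the bracketed argument, or None when ``string`` doesn't match."""
    match = pattern.match(string)
    return match.group("arg_name") if match else None


print(parse_arg("my_type[foo]"))     # foo
print(parse_arg("my_type[]"))        # None (".+" needs at least one character)
print(parse_arg("other_type[foo]"))  # None
```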
72 changes: 57 additions & 15 deletions pandas/core/generic.py
@@ -2204,7 +2204,7 @@ def to_excel(

>>> with pd.ExcelWriter('output.xlsx',
...                     mode='a') as writer:  # doctest: +SKIP
-...     df.to_excel(writer, sheet_name='Sheet_name_3')
+...     df1.to_excel(writer, sheet_name='Sheet_name_3')

To set the library that is used to write the Excel file,
you can pass the `engine` keyword (the default engine is
@@ -5864,9 +5864,9 @@ def pipe(
Alternatively a ``(callable, data_keyword)`` tuple where
``data_keyword`` is a string indicating the keyword of
``callable`` that expects the {klass}.
-args : iterable, optional
+*args : iterable, optional
Positional arguments passed into ``func``.
-kwargs : mapping, optional
+**kwargs : mapping, optional
A dictionary of keyword arguments passed into ``func``.

Returns
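With the *args/**kwargs spelling, the docstring matches how pipe forwards extra arguments. A small sketch with a hypothetical scale_and_shift helper: s.pipe(f, 2, shift=1) calls f(s, 2, shift=1).

```python
import pandas as pd


def scale_and_shift(series, factor, shift=0):
    """Hypothetical helper: multiply by ``factor``, then add ``shift``."""
    return series * factor + shift


s = pd.Series([1, 2, 3])

# Extra positional and keyword arguments are passed through to the function.
piped = s.pipe(scale_and_shift, 2, shift=1)
direct = scale_and_shift(s, 2, shift=1)
print(piped.tolist())  # [3, 5, 7]
```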
@@ -5883,25 +5883,67 @@ def pipe(
Notes
-----
Use ``.pipe`` when chaining together functions that expect
-Series, DataFrames or GroupBy objects. Instead of writing
+Series, DataFrames or GroupBy objects.

->>> func(g(h(df), arg1=a), arg2=b, arg3=c) # doctest: +SKIP
+Examples
+--------
+Constructing an income DataFrame from a list of lists.

+>>> data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
+>>> df = pd.DataFrame(data, columns=['Salary', 'Others'])
+>>> df
+   Salary  Others
+0    8000  1000.0
+1    9500     NaN
+2    5000  2000.0

+Functions that perform tax reductions on an income DataFrame.

+>>> def subtract_federal_tax(df):
+...     return df * 0.9
+>>> def subtract_state_tax(df, rate):
+...     return df * (1 - rate)
+>>> def subtract_national_insurance(df, rate, rate_increase):
+...     new_rate = rate + rate_increase
+...     return df * (1 - new_rate)

+Instead of writing

+>>> subtract_national_insurance(
+...     subtract_state_tax(subtract_federal_tax(df), rate=0.12),
+...     rate=0.05,
+...     rate_increase=0.02)  # doctest: +SKIP

You can write

->>> (df.pipe(h)
-...    .pipe(g, arg1=a)
-...    .pipe(func, arg2=b, arg3=c)
-... )  # doctest: +SKIP
+>>> df.pipe(
+...     subtract_federal_tax
+... ).pipe(
+...     subtract_state_tax, rate=0.12
+... ).pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)

Member

Quoting an earlier review comment: "Yes, the parentheses are to avoid a flake8 error. But the examples should look as if you were using the interpreter. You wouldn't be using parentheses in that case."

Does that comment refer to the parentheses wrapping this whole block? If so, I don't think it's true. The original syntax is much more common and readable than this, in my opinion. It's what we've been doing in the docs I know, it's how code was chained in the original post about method chaining, it's the syntax Matt Harrison uses in the Effective pandas book, and it's what I would write.

To be clear, I'm referring to using

(df.pipe(foo)
   .pipe(bar)
   .pipe(foobar)
)

instead of

df.pipe(
    foo
).pipe(
    bar
).pipe(
    foobar
)

I'm fine with whatever (even if I personally find the former much more readable), but I wouldn't ask people to change the former into the latter on the grounds that nobody would write the former, because I think many people actually write it that way. :)
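The two layouts in this comment differ only in formatting. A quick sketch with hypothetical foo, bar and foobar functions confirms that both build the same chain and produce the same frame:

```python
import pandas as pd


def foo(df):
    return df + 1


def bar(df):
    return df * 2


def foobar(df):
    return df - 3


df = pd.DataFrame({"x": [1, 2, 3]})

# Style 1: wrap the whole chain in parentheses, one call per line.
wrapped = (df.pipe(foo)
             .pipe(bar)
             .pipe(foobar)
)

# Style 2: break lines inside each call's own parentheses.
unwrapped = df.pipe(
    foo
).pipe(
    bar
).pipe(
    foobar
)

print(wrapped.equals(unwrapped))  # True
```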

+    Salary   Others
+0  5892.48   736.56
+1  6997.32      NaN
+2  3682.80  1473.12

If you have a function that takes the data as (say) the second
argument, pass a tuple indicating which keyword expects the
-data. For example, suppose ``func`` takes its data as ``arg2``:

->>> (df.pipe(h)
-...    .pipe(g, arg1=a)
-...    .pipe((func, 'arg2'), arg1=a, arg3=c)
-...  )  # doctest: +SKIP
+data. For example, suppose ``subtract_national_insurance`` takes its data as ``df``
+in the second argument:

+>>> def subtract_national_insurance(rate, df, rate_increase):
+...     new_rate = rate + rate_increase
+...     return df * (1 - new_rate)
+>>> df.pipe(
+...     subtract_federal_tax
+... ).pipe(
+...     subtract_state_tax, rate=0.12
+... ).pipe((subtract_national_insurance, 'df'),
+...        rate=0.05, rate_increase=0.02)
+    Salary   Others
+0  5892.48   736.56
+1  6997.32      NaN
+2  3682.80  1473.12
"""
if using_copy_on_write():
    return common.pipe(self.copy(deep=None), func, *args, **kwargs)
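For reference, a standalone version of the new income example, using the functions from the docstring and the tuple form for the function that takes the data as its second argument. The chain is wrapped in parentheses here, which is purely a layout choice. Floating-point results are compared with a tolerance rather than printed exactly.

```python
import numpy as np
import pandas as pd

data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
df = pd.DataFrame(data, columns=["Salary", "Others"])


def subtract_federal_tax(df):
    return df * 0.9


def subtract_state_tax(df, rate):
    return df * (1 - rate)


def subtract_national_insurance(rate, df, rate_increase):
    # Expects the data as its second argument, under the keyword ``df``.
    new_rate = rate + rate_increase
    return df * (1 - new_rate)


# (func, "df") tells pipe to pass the DataFrame as the ``df`` keyword argument.
result = (
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe((subtract_national_insurance, "df"), rate=0.05, rate_increase=0.02)
)
print(result)
```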

2 changes: 1 addition & 1 deletion pandas/io/formats/style_render.py
@@ -1272,7 +1272,7 @@ def format_index(

>>> df = pd.DataFrame([[1, 2, 3]],
...     columns=pd.MultiIndex.from_arrays([["a", "a", "b"],[2, np.nan, 4]]))
->>> df.style.format_index({0: lambda v: upper(v)}, axis=1, precision=1)
+>>> df.style.format_index({0: lambda v: v.upper()}, axis=1, precision=1)
... # doctest: +SKIP
A B
2.0 nan 4.0
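The fix swaps an undefined upper(v) for the method call v.upper(). Rendering a Styler needs jinja2, so this sketch skips it and simply applies both formatters to the level-0 column labels, which is what the {0: ...} mapping is assumed to target with axis=1 (consistent with the A/B output above). broken and fixed are hypothetical names.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([["a", "a", "b"], [2, np.nan, 4]])
level0 = list(columns.get_level_values(0))


def broken(v):
    return upper(v)  # noqa: F821  ``upper`` is not defined: the original bug


def fixed(v):
    return v.upper()  # call the str method on the label itself


try:
    [broken(v) for v in level0]
    raised_name_error = False
except NameError:
    raised_name_error = True

print(raised_name_error)           # True
print([fixed(v) for v in level0])  # ['A', 'A', 'B']
```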