Skip to content

STYLE: Fix errors in doctests #51356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 16, 2023
Merged

Conversation

kkangs0226
Copy link
Contributor

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkangs0226 for working on this!

Does this solve any of the EX02 errors? Could you try removing the functions you've fixed up from

pandas/ci/code_checks.sh

Lines 578 to 614 in c4caed6

MSG='Partially validate docstrings (EX02)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX02 --ignore_functions \
pandas.DataFrame.plot.line \
pandas.Series.plot.line \
pandas.Series.sparse.density \
pandas.Series.sparse.npoints \
pandas.Series.sparse.sp_values \
pandas.Timestamp.fromtimestamp \
pandas.api.types.infer_dtype \
pandas.api.types.is_datetime64_any_dtype \
pandas.api.types.is_datetime64_ns_dtype \
pandas.api.types.is_datetime64tz_dtype \
pandas.api.types.is_integer_dtype \
pandas.api.types.is_interval_dtype \
pandas.api.types.is_period_dtype \
pandas.api.types.is_signed_integer_dtype \
pandas.api.types.is_sparse \
pandas.api.types.is_string_dtype \
pandas.api.types.is_timedelta64_dtype \
pandas.api.types.is_timedelta64_ns_dtype \
pandas.api.types.is_unsigned_integer_dtype \
pandas.core.groupby.DataFrameGroupBy.take \
pandas.core.groupby.SeriesGroupBy.take \
pandas.io.formats.style.Styler.concat \
pandas.io.formats.style.Styler.export \
pandas.io.formats.style.Styler.set_td_classes \
pandas.io.formats.style.Styler.use \
pandas.io.json.build_table_schema \
pandas.merge_ordered \
pandas.option_context \
pandas.plotting.andrews_curves \
pandas.plotting.autocorrelation_plot \
pandas.plotting.lag_plot \
pandas.plotting.parallel_coordinates \
pandas.plotting.radviz \
pandas.tseries.frequencies.to_offset
RET=$(($RET + $?)) ; echo $MSG "DONE"

please and checking whether ./ci/code_checks.sh docstrings still passes? You may want to comment-out the non-EX02 part (currently, lines 82-576)

setup.cfg Outdated
Comment on lines 28 to 29
# undefined name 'pd' error flooding logs, ignore temporarily
F821
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is temporary, I guess it shouldn't be commited?

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @kkangs0226, great job. I added some comments, mostly related to templates, besides Marco's comments.

If the template issues are more complex, feel free to revert your changes to those here, we can merge all the other fixes in this PR for now, and you can work in a separate PR in that.

@@ -1871,17 +1871,17 @@ def apply_index(
>>> df = pd.DataFrame([[1,2], [3,4]], index=["A", "B"])
>>> def color_b(s):
... return {ret}
>>> df.style.{this}_index(color_b) # doctest: +SKIP
>>> df.style.apply_index(color_b) # doctest: +SKIP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit more complex than other docstrings, since it's a template that is reused in different docstrings, and things like {this} are variables that are replaced by values before the docstring is being used.

See for example how {this} is not displayed in the final documentation, but a vaue is replaced with: https://pandas.pydata.org/docs/dev/reference/api/pandas.io.formats.style.Styler.apply_index.html

I don't think those should be a problem, since the doctests should run on the rendered docstring and not the template. Can you revert and see if something else is causing the error. Maybe I'm wrong, and the problem is that we run the doctests in the docstring. If that's the case, let's better revert the changes, and open a separate issue for that.

If you want to see what's the rendered docstring, you should be able to get it with something like help(pandas.DataFrame.style.apply_index).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this issue further and here are my findings:

Using the help function, I confirmed that the docstrings are correctly rendered.

So I created a test file under pandas, using the @doc decorator to test how the template strings are being tested under flake8.
image
image

From the screenshot it seems like when these template strings are in the first line of the doctest i.e. line starting with >>>, it is flagged as a syntax error. The succeeding lines are not flagged as syntax errors, explaining why the other parts of the doctest using string interpolation such as return {ret} are OK.

My guess is that perhaps flake8 identifies the doctests chunks & runs the syntax checking before the docstrings are rendered, and there is perhaps a bug (or perhaps there are other reasons) where the string interpolation is flagged as a syntax error for the first line.

I tried to search if others were experiencing similar issues online, but was unable to find more information.

@datapythonista Let me know your thoughts!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the detailed research and explanation @kkangs0226. I find it surprising that pytest would extract the docstrings statically instead of just running the code and using the __doc__ attribute. But your test shows that it's exactly what's happening.

As @MarcoGorelli suggests, I think better to revert the changes related to template variables for now, and maybe open an issue with the problem. So, we can merge your other changes here, and have a discussion on what we want to do with the template problem. If you create the issue, feel free to get the permalink to your comment above and reference it in the issue, since it provides a very clear explanation of the problem and what's going on.


.. figure:: ../../_static/style/{image_prefix}_axNone.png

Compress the color map from the both ``low`` and ``high`` ends

>>> df.style.{name}_gradient(axis=None, low=0.75, high=1.0) # doctest: +SKIP
>>> df.style.background_gradient(axis=None,
... low=0.75, high=1.0) # doctest: +SKIP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to indent low with an extra space. Also, I'd probably move high to a different line too.


.. figure:: ../../_static/style/{image_prefix}_axNone_lowhigh.png

Manually setting ``vmin`` and ``vmax`` gradient thresholds

>>> df.style.{name}_gradient(axis=None, vmin=6.7, vmax=21.6) # doctest: +SKIP
>>> df.style.background_gradient(axis=None,
... vmin=6.7, vmax=21.6) # doctest: +SKIP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@datapythonista datapythonista added Testing pandas testing functions or related to the test suite Docs Code Style Code style, linting, code_checks labels Feb 13, 2023
@kkangs0226
Copy link
Contributor Author

@MarcoGorelli I looked through the code_checks.sh file and it seems like none of the functions edited in my PR are in the script. I ran ./ci/code_checks.sh docstrings and here is the output:
image

@MarcoGorelli
Copy link
Member

@MarcoGorelli I looked through the code_checks.sh file and it seems like none of the functions edited in my PR are in the script.

thanks for checking - OK yeah, looks like EX03 isn't in the list of codes which are currently being checked, so that's why they wouldn't be showing up. That's another one we'll need to gradually enable, but your PR helps reduce the amount of work that'll need doing, thanks 🙌

for the {this} substitution, I'd suggest just restoring it - if flake8 is erroring, it's because it just checks the code statically rather than trying to run it, so it doesn't know what the symbol means. We can ignore it

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkangs0226, very nice, merging when CI is green.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, thanks @kkangs0226

@kkangs0226
Copy link
Contributor Author

@datapythonista @MarcoGorelli Thanks for your reviews, I have created an issue for the template string problem and linked my comment there. #51377

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Feb 16, 2023
@MarcoGorelli
Copy link
Member

thanks @kkangs0226

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks Docs Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

STYLE: Fix errors in doctests
3 participants