STYLE: Fix errors in doctests #51356

Merged
merged 11 commits into from
Feb 16, 2023
2 changes: 1 addition & 1 deletion pandas/core/generic.py
Original file line number Diff line number Diff line change
@@ -3334,7 +3334,7 @@ def to_latex(
>>> print(df.to_latex(index=False,
... formatters={"name": str.upper},
... float_format="{:.1f}".format,
-... ) # doctest: +SKIP
+... )) # doctest: +SKIP
\begin{tabular}{lrr}
\toprule
name & age & height \\
10 changes: 5 additions & 5 deletions pandas/errors/__init__.py
@@ -455,11 +455,11 @@ class CSSWarning(UserWarning):
Examples
--------
>>> df = pd.DataFrame({'A': [1, 1, 1]})
->>> df.style.applymap(lambda x: 'background-color: blueGreenRed;')
-... .to_excel('styled.xlsx') # doctest: +SKIP
+>>> (df.style.applymap(lambda x: 'background-color: blueGreenRed;')
+... .to_excel('styled.xlsx')) # doctest: +SKIP
... # CSSWarning: Unhandled color format: 'blueGreenRed'
->>> df.style.applymap(lambda x: 'border: 1px solid red red;')
-... .to_excel('styled.xlsx') # doctest: +SKIP
+>>> (df.style.applymap(lambda x: 'border: 1px solid red red;')
+... .to_excel('styled.xlsx')) # doctest: +SKIP
... # CSSWarning: Too many tokens provided to "border" (expected 1-3)
"""

@@ -569,7 +569,7 @@ class CategoricalConversionWarning(Warning):
>>> from pandas.io.stata import StataReader
>>> with StataReader('dta_file', chunksize=2) as reader: # doctest: +SKIP
...     for i, block in enumerate(reader):
-...         print(i, block))
+...         print(i, block)
... # CategoricalConversionWarning: One or more series with value labels...
"""

32 changes: 17 additions & 15 deletions pandas/io/formats/style.py
@@ -962,12 +962,12 @@ def to_latex(
Second we will format the display and, since our table is quite wide, will
hide the repeated level-0 of the index:

->>> styler.format(subset="Equity", precision=2)
+>>> (styler.format(subset="Equity", precision=2)
... .format(subset="Stats", precision=1, thousands=",")
... .format(subset="Rating", formatter=str.upper)
... .format_index(escape="latex", axis=1)
... .format_index(escape="latex", axis=0)
-... .hide(level=0, axis=0) # doctest: +SKIP
+... .hide(level=0, axis=0)) # doctest: +SKIP

Note that one of the string entries of the index and column headers is "H&M".
Without applying the `escape="latex"` option to the `format_index` method the
@@ -983,8 +983,8 @@ def to_latex(
...     elif v == "Sell": color = "#ff5933"
...     else: color = "#ffdd33"
...     return f"color: {color}; font-weight: bold;"
->>> styler.background_gradient(cmap="inferno", subset="Equity", vmin=0, vmax=1)
-... .applymap(rating_color, subset="Rating") # doctest: +SKIP
+>>> (styler.background_gradient(cmap="inferno", subset="Equity", vmin=0, vmax=1)
+... .applymap(rating_color, subset="Rating")) # doctest: +SKIP

All the above styles will work with HTML (see below) and LaTeX upon conversion:

@@ -1871,17 +1871,17 @@ def apply_index(
>>> df = pd.DataFrame([[1,2], [3,4]], index=["A", "B"])
>>> def color_b(s):
...     return {ret}
->>> df.style.{this}_index(color_b) # doctest: +SKIP
+>>> df.style.apply_index(color_b) # doctest: +SKIP
Member

This is a bit more complex than other docstrings, since it's a template that is reused in different docstrings, and things like {this} are variables that are replaced by values before the docstring is being used.

See for example how {this} is not displayed in the final documentation; a value is substituted for it: https://pandas.pydata.org/docs/dev/reference/api/pandas.io.formats.style.Styler.apply_index.html

I don't think those should be a problem, since the doctests should run on the rendered docstring and not the template. Can you revert and see if something else is causing the error? Maybe I'm wrong, and the problem is that we run the doctests on the template docstring. If that's the case, it's better to revert the changes and open a separate issue for that.

If you want to see the rendered docstring, you should be able to get it with something like help(pandas.io.formats.style.Styler.apply_index).

Contributor Author

I looked into this issue further and here are my findings:

Using the help function, I confirmed that the docstrings are correctly rendered.

So I created a test file under pandas, using the @doc decorator to test how the template strings are being tested under flake8.
[Two screenshots not preserved]

From the screenshots, it seems that when these template strings appear in the first line of a doctest, i.e. a line starting with >>>, they are flagged as a syntax error. The succeeding lines are not flagged, which explains why other parts of the doctest that use string interpolation, such as return {ret}, are OK.

My guess is that flake8 identifies the doctest chunks and runs the syntax check before the docstrings are rendered, and that a bug (or some other cause) makes the string interpolation get flagged as a syntax error only on the first line.

I tried to search if others were experiencing similar issues online, but was unable to find more information.

@datapythonista Let me know your thoughts!
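
One possible reading of these findings, offered as an assumption rather than something confirmed in this thread: `{ret}` on its own happens to be valid Python (a set display), whereas `df.style.{this}_index` and `def highlight_x({var}):` are not, regardless of which line they sit on. The sketch below compiles each unrendered example the way `doctest` would; `failing_examples` is a hypothetical helper and the substituted values are made up for illustration.

```python
import doctest

# Simplified copy of the templated doctest lines from the diff.
TEMPLATE = '''
>>> def color_b(s):
...     return {ret}
>>> df.style.{this}_index(color_b)  # doctest: +SKIP
>>> def highlight_x({var}):
...     return {ret2}
'''


def failing_examples(docstring):
    """Return the source of each doctest example that is not valid syntax."""
    failures = []
    for example in doctest.DocTestParser().get_examples(docstring):
        try:
            compile(example.source, "<doctest>", "single")
        except SyntaxError:
            failures.append(example.source)
    return failures


# Unrendered template: only the examples whose placeholders break the grammar fail.
raw_failures = failing_examples(TEMPLATE)

# Hypothetical placeholder values; pandas supplies its own through its decorator.
rendered = TEMPLATE.format(
    ret='"color: red;"', this="apply", var="s", ret2='"color: blue;"'
)
rendered_failures = failing_examples(rendered)

print(len(raw_failures), len(rendered_failures))  # prints "2 0"
```

Under this reading, the `color_b` example passes because `return {ret}` parses as returning a set, not because continuation lines are exempt from the check.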

Member

Thanks a lot for the detailed research and explanation @kkangs0226. I find it surprising that pytest would extract the docstrings statically instead of just running the code and using the __doc__ attribute. But your test shows that it's exactly what's happening.

As @MarcoGorelli suggests, I think it's better to revert the changes related to template variables for now, and maybe open an issue about the problem. That way we can merge your other changes here and have a separate discussion on what we want to do about the template problem. If you create the issue, feel free to get the permalink to your comment above and reference it in the issue, since it provides a very clear explanation of the problem and what's going on.
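
The difference between static extraction and the `__doc__` attribute can be sketched with the standard library alone. Here `fill` is a hypothetical stand-in for a docstring-templating decorator (it is not pandas' `@doc`), and `static_docstring` is an illustrative helper; the point is only that a tool reading the source never runs the decorator.

```python
import ast

SOURCE = '''
def fill(**values):
    """Hypothetical stand-in for a docstring-templating decorator."""
    def decorator(func):
        func.__doc__ = func.__doc__.format(**values)
        return func
    return decorator


@fill(this="apply")
def apply_index():
    """
    >>> df.style.{this}_index(color_b)  # doctest: +SKIP
    """
'''


def static_docstring(source, name):
    """Read a function's docstring from source text without executing it."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_docstring(node)
    raise LookupError(name)


# Static view: the decorator never runs, so the placeholder is still there.
static_doc = static_docstring(SOURCE, "apply_index")

# Runtime view: executing the module applies the decorator and renders the template.
namespace = {}
exec(SOURCE, namespace)
runtime_doc = namespace["apply_index"].__doc__

print("{this}" in static_doc, "{this}" in runtime_doc)  # prints "True False"
```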


.. figure:: ../../_static/style/appmaphead1.png

Selectively applying to specific levels of MultiIndex columns.

>>> midx = pd.MultiIndex.from_product([['ix', 'jy'], [0, 1], ['x3', 'z4']])
>>> df = pd.DataFrame([np.arange(8)], columns=midx)
->>> def highlight_x({var}):
+>>> def highlight_x(s):
...     return {ret2}
->>> df.style.{this}_index(highlight_x, axis="columns", level=[0, 2])
+>>> df.style.apply_index(highlight_x, axis="columns", level=[0, 2])
... # doctest: +SKIP

.. figure:: ../../_static/style/appmaphead2.png
@@ -2784,31 +2784,33 @@ def background_gradient(

Shading the values column-wise, with ``axis=0``, preselecting numeric columns

->>> df.style.{name}_gradient(axis=0) # doctest: +SKIP
+>>> df.style.background_gradient(axis=0) # doctest: +SKIP

.. figure:: ../../_static/style/{image_prefix}_ax0.png

Shading all values collectively using ``axis=None``

->>> df.style.{name}_gradient(axis=None) # doctest: +SKIP
+>>> df.style.background_gradient(axis=None) # doctest: +SKIP

.. figure:: ../../_static/style/{image_prefix}_axNone.png

Compress the color map from the both ``low`` and ``high`` ends

->>> df.style.{name}_gradient(axis=None, low=0.75, high=1.0) # doctest: +SKIP
+>>> df.style.background_gradient(axis=None,
+... low=0.75, high=1.0) # doctest: +SKIP
Member

I think you need to indent low with an extra space. Also, I'd probably move high to a different line too.
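
A sketch of the suggested layout, using `dict(...)` as a hypothetical stand-in for `df.style.background_gradient(...)` so that it runs without pandas: each argument sits on its own continuation line, aligned with the first argument. Inside the parentheses the alignment is purely a style matter, so the example runs either way.

```python
import doctest

# dict(...) is a stand-in for the Styler call; only the layout is of interest.
DOCSTRING = '''
>>> options = dict(axis=None,
...                low=0.75,
...                high=1.0)
>>> sorted(options)
['axis', 'high', 'low']
'''

parser = doctest.DocTestParser()
test = parser.get_doctest(DOCSTRING, {}, "layout_example", None, 0)

runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # prints "0 2"
```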


.. figure:: ../../_static/style/{image_prefix}_axNone_lowhigh.png

Manually setting ``vmin`` and ``vmax`` gradient thresholds

->>> df.style.{name}_gradient(axis=None, vmin=6.7, vmax=21.6) # doctest: +SKIP
+>>> df.style.background_gradient(axis=None,
+... vmin=6.7, vmax=21.6) # doctest: +SKIP
Member

same


.. figure:: ../../_static/style/{image_prefix}_axNone_vminvmax.png

Setting a ``gmap`` and applying to all columns with another ``cmap``

->>> df.style.{name}_gradient(axis=0, gmap=df['Temp (c)'], cmap='YlOrRd')
+>>> df.style.background_gradient(axis=0, gmap=df['Temp (c)'], cmap='YlOrRd')
... # doctest: +SKIP

.. figure:: ../../_static/style/{image_prefix}_gmap.png
@@ -2817,7 +2819,7 @@ def background_gradient(
explicitly state ``subset`` to match the ``gmap`` shape

>>> gmap = np.array([[1,2,3], [2,3,4], [3,4,5]])
->>> df.style.{name}_gradient(axis=None, gmap=gmap,
+>>> df.style.background_gradient(axis=None, gmap=gmap,
... cmap='YlOrRd', subset=['Temp (c)', 'Rain (mm)', 'Wind (m/s)']
... ) # doctest: +SKIP

@@ -3504,9 +3506,9 @@ def pipe(self, func: Callable, *args, **kwargs):
Since the method returns a ``Styler`` object it can be chained with other
methods as if applying the underlying highlighters directly.

->>> df.style.format("{:.1f}")
+>>> (df.style.format("{:.1f}")
... .pipe(some_highlights, min_color="green")
-... .highlight_between(left=2, right=5) # doctest: +SKIP
+... .highlight_between(left=2, right=5)) # doctest: +SKIP

.. figure:: ../../_static/style/df_pipe_hl2.png

4 changes: 2 additions & 2 deletions pandas/io/formats/style_render.py
@@ -1077,8 +1077,8 @@ def format(
Multiple ``na_rep`` or ``precision`` specifications under the default
``formatter``.

->>> df.style.format(na_rep='MISS', precision=1, subset=[0])
-... .format(na_rep='PASS', precision=2, subset=[1, 2]) # doctest: +SKIP
+>>> (df.style.format(na_rep='MISS', precision=1, subset=[0])
+... .format(na_rep='PASS', precision=2, subset=[1, 2])) # doctest: +SKIP
0 1 2
0 MISS 1.00 A
1 2.0 PASS 3.00
4 changes: 3 additions & 1 deletion setup.cfg
@@ -24,7 +24,9 @@ ignore =
# Use "collections.abc.*" instead of "typing.*" (PEP 585 syntax)
Y027,
# while int | float can be shortened to float, the former is more explicit
-Y041
+Y041,
+# undefined name 'pd' error flooding logs, ignore temporarily
+F821
Member

If this is temporary, I guess it shouldn't be committed?

exclude =
doc/sphinxext/*.py,
doc/build/*.py,