ENH: Styler.format_index() to display index values similarly to data-values with format() (#43101)

attack68 · web-flow · commit 4081fc71cb9d · 2021-09-17T00:16:22.000+02:00
Co-authored-by: JHM Darbyshire (iMac) &lt;attack68@users.noreply.github.com&gt;
diff --git a/doc/source/reference/style.rst b/doc/source/reference/style.rst
@@ -39,6 +39,7 @@ Style application
    Styler.apply_index
    Styler.applymap_index
    Styler.format
+   Styler.format_index
    Styler.hide_index
    Styler.hide_columns
    Styler.set_td_classes
diff --git a/doc/source/user_guide/style.ipynb b/doc/source/user_guide/style.ipynb
@@ -150,15 +150,14 @@
     "\n",
     "### Formatting Values\n",
     "\n",
-    "Before adding styles it is useful to show that the [Styler][styler] can distinguish the *display* value from the *actual* value. To control the display value, the text is printed in each cell, and we can use the [.format()][formatfunc] method to manipulate this according to a [format spec string][format] or a callable that takes a single value and returns a string. It is possible to define this for the whole table or for individual columns. \n",
+    "Before adding styles it is useful to show that the [Styler][styler] can distinguish the *display* value from the *actual* value, in both datavlaues and index or columns headers. To control the display value, the text is printed in each cell as string, and we can use the [.format()][formatfunc] and [.format_index()][formatfuncindex] methods to manipulate this according to a [format spec string][format] or a callable that takes a single value and returns a string. It is possible to define this for the whole table, or index, or for individual columns, or MultiIndex levels. \n",
     "\n",
-    "Additionally, the format function has a **precision** argument to specifically help formatting floats, as well as **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and an **escape** argument to help displaying safe-HTML or safe-LaTeX. The default formatter is configured to adopt pandas' regular `display.precision` option, controllable using `with pd.option_context('display.precision', 2):`\n",
-    "\n",
-    "Here is an example of using the multiple options to control the formatting generally and with specific column formatters.\n",
+    "Additionally, the format function has a **precision** argument to specifically help formatting floats, as well as **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and an **escape** argument to help displaying safe-HTML or safe-LaTeX. The default formatter is configured to adopt pandas' regular `display.precision` option, controllable using `with pd.option_context('display.precision', 2):` \n",
     "\n",
     "[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
     "[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language\n",
-    "[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst"
+    "[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst\n",
+    "[formatfuncindex]: ../reference/api/pandas.io.formats.style.Styler.format_index.rst"
    ]
   },
   {
@@ -173,6 +172,49 @@
     "                          })"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Using Styler to manipulate the display is a useful feature because maintaining the indexing and datavalues for other purposes gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is an example of using the formatting functions whilst still relying on the underlying data for indexing and calculations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "weather_df = pd.DataFrame(np.random.rand(10,2)*5, \n",
+    "                          index=pd.date_range(start=\"2021-01-01\", periods=10),\n",
+    "                          columns=[\"Tokyo\", \"Beijing\"])\n",
+    "\n",
+    "def rain_condition(v): \n",
+    "    if v < 1.75:\n",
+    "        return \"Dry\"\n",
+    "    elif v < 2.75:\n",
+    "        return \"Rain\"\n",
+    "    return \"Heavy Rain\"\n",
+    "\n",
+    "def make_pretty(styler):\n",
+    "    styler.set_caption(\"Weather Conditions\")\n",
+    "    styler.format(rain_condition)\n",
+    "    styler.format_index(lambda v: v.strftime(\"%A\"))\n",
+    "    styler.background_gradient(axis=None, vmin=1, vmax=5, cmap=\"YlGnBu\")\n",
+    "    return styler\n",
+    "\n",
+    "weather_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "weather_df.loc[\"2021-01-04\":\"2021-01-08\"].style.pipe(make_pretty)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -187,7 +229,7 @@
     "\n",
     "Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will start at `col2`, since `col0` and `col1` are simply ignored.\n",
     "\n",
-    "We can update our `Styler` object to hide some data and format the values.\n",
+    "We can update our `Styler` object from before to hide some data and format the values.\n",
     "\n",
     "[hideidx]: ../reference/api/pandas.io.formats.style.Styler.hide_index.rst\n",
     "[hidecols]: ../reference/api/pandas.io.formats.style.Styler.hide_columns.rst"
@@ -1974,7 +2016,6 @@
   }
  ],
  "metadata": {
-  "celltoolbar": "Edit Metadata",
   "kernelspec": {
    "display_name": "Python 3",
    "language": "python",
diff --git a/doc/source/whatsnew/v1.4.0.rst b/doc/source/whatsnew/v1.4.0.rst
@@ -72,7 +72,7 @@ Styler
 
 :class:`.Styler` has been further developed in 1.4.0. The following enhancements have been made:
 
-  - Styling of indexing has been added, with :meth:`.Styler.apply_index` and :meth:`.Styler.applymap_index`. These mirror the signature of the methods already used to style data values, and work with both HTML and LaTeX format (:issue:`41893`).
+  - Styling and formatting of indexes has been added, with :meth:`.Styler.apply_index`, :meth:`.Styler.applymap_index` and :meth:`.Styler.format_index`. These mirror the signature of the methods already used to style and format data values, and work with both HTML and LaTeX format (:issue:`41893`, :issue:`43101`).
   - :meth:`.Styler.bar` introduces additional arguments to control alignment and display (:issue:`26070`, :issue:`36419`), and it also validates the input arguments ``width`` and ``height`` (:issue:`42511`).
   - :meth:`.Styler.to_latex` introduces keyword argument ``environment``, which also allows a specific "longtable" entry through a separate jinja2 template (:issue:`41866`).
   - :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption``, ``max_rows`` and ``max_columns`` (:issue:`41946`, :issue:`43149`, :issue:`42972`).
diff --git a/pandas/io/formats/style.py b/pandas/io/formats/style.py
@@ -1184,6 +1184,8 @@ def _copy(self, deepcopy: bool = False) -> Styler:
         ]
         deep = [  # nested lists or dicts
             "_display_funcs",
+            "_display_funcs_index",
+            "_display_funcs_columns",
             "hidden_rows",
             "hidden_columns",
             "ctx",
diff --git a/pandas/io/formats/style_render.py b/pandas/io/formats/style_render.py
@@ -117,6 +117,12 @@ def __init__(
         self._display_funcs: DefaultDict[  # maps (row, col) -> format func
             tuple[int, int], Callable[[Any], str]
         ] = defaultdict(lambda: partial(_default_formatter, precision=precision))
+        self._display_funcs_index: DefaultDict[  # maps (row, level) -> format func
+            tuple[int, int], Callable[[Any], str]
+        ] = defaultdict(lambda: partial(_default_formatter, precision=precision))
+        self._display_funcs_columns: DefaultDict[  # maps (level, col) -> format func
+            tuple[int, int], Callable[[Any], str]
+        ] = defaultdict(lambda: partial(_default_formatter, precision=precision))
 
     def _render_html(
         self,
@@ -377,6 +383,7 @@ def _translate_header(
                             f"{col_heading_class} level{r} col{c}",
                             value,
                             _is_visible(c, r, col_lengths),
+                            display_value=self._display_funcs_columns[(r, c)](value),
                             attributes=(
                                 f'colspan="{col_lengths.get((r, c), 0)}"'
                                 if col_lengths.get((r, c), 0) > 1
@@ -535,6 +542,7 @@ def _translate_body(
                     f"{row_heading_class} level{c} row{r}",
                     value,
                     _is_visible(r, c, idx_lengths) and not self.hide_index_[c],
+                    display_value=self._display_funcs_index[(r, c)](value),
                     attributes=(
                         f'rowspan="{idx_lengths.get((c, r), 0)}"'
                         if idx_lengths.get((c, r), 0) > 1
@@ -834,6 +842,175 @@ def format(
 
         return self
 
+    def format_index(
+        self,
+        formatter: ExtFormatter | None = None,
+        axis: int | str = 0,
+        level: Level | list[Level] | None = None,
+        na_rep: str | None = None,
+        precision: int | None = None,
+        decimal: str = ".",
+        thousands: str | None = None,
+        escape: str | None = None,
+    ) -> StylerRenderer:
+        r"""
+        Format the text display value of index labels or column headers.
+
+        .. versionadded:: 1.4.0
+
+        Parameters
+        ----------
+        formatter : str, callable, dict or None
+            Object to define how values are displayed. See notes.
+        axis : {0, "index", 1, "columns"}
+            Whether to apply the formatter to the index or column headers.
+        level : int, str, list
+            The level(s) over which to apply the generic formatter.
+        na_rep : str, optional
+            Representation for missing values.
+            If ``na_rep`` is None, no special formatting is applied.
+        precision : int, optional
+            Floating point precision to use for display purposes, if not determined by
+            the specified ``formatter``.
+        decimal : str, default "."
+            Character used as decimal separator for floats, complex and integers
+        thousands : str, optional, default None
+            Character used as thousands separator for floats, complex and integers
+        escape : str, optional
+            Use 'html' to replace the characters ``&``, ``<``, ``>``, ``'``, and ``"``
+            in cell display string with HTML-safe sequences.
+            Use 'latex' to replace the characters ``&``, ``%``, ``$``, ``#``, ``_``,
+            ``{``, ``}``, ``~``, ``^``, and ``\`` in the cell display string with
+            LaTeX-safe sequences.
+            Escaping is done before ``formatter``.
+
+        Returns
+        -------
+        self : Styler
+
+        Notes
+        -----
+        This method assigns a formatting function, ``formatter``, to each level label
+        in the DataFrame's index or column headers. If ``formatter`` is ``None``,
+        then the default formatter is used.
+        If a callable then that function should take a label value as input and return
+        a displayable representation, such as a string. If ``formatter`` is
+        given as a string this is assumed to be a valid Python format specification
+        and is wrapped to a callable as ``string.format(x)``. If a ``dict`` is given,
+        keys should correspond to MultiIndex level numbers or names, and values should
+        be string or callable, as above.
+
+        The default formatter currently expresses floats and complex numbers with the
+        pandas display precision unless using the ``precision`` argument here. The
+        default formatter does not adjust the representation of missing values unless
+        the ``na_rep`` argument is used.
+
+        The ``level`` argument defines which levels of a MultiIndex to apply the
+        method to. If the ``formatter`` argument is given in dict form but does
+        not include all levels within the level argument then these unspecified levels
+        will have the default formatter applied. Any levels in the formatter dict
+        specifically excluded from the level argument will be ignored.
+
+        When using a ``formatter`` string the dtypes must be compatible, otherwise a
+        `ValueError` will be raised.
+
+        Examples
+        --------
+        Using ``na_rep`` and ``precision`` with the default ``formatter``
+
+        >>> df = pd.DataFrame([[1, 2, 3]], columns=[2.0, np.nan, 4.0]])
+        >>> df.style.format_index(axis=1, na_rep='MISS', precision=3)  # doctest: +SKIP
+            2.000    MISS   4.000
+        0       1       2       3
+
+        Using a ``formatter`` specification on consistent dtypes in a level
+
+        >>> df.style.format_index('{:.2f}', axis=1, na_rep='MISS')  # doctest: +SKIP
+             2.00   MISS    4.00
+        0       1      2       3
+
+        Using the default ``formatter`` for unspecified levels
+
+        >>> df = pd.DataFrame([[1, 2, 3]],
+        ...     columns=pd.MultiIndex.from_arrays([["a", "a", "b"],[2, np.nan, 4]]))
+        >>> df.style.format_index({0: lambda v: upper(v)}, axis=1, precision=1)
+        ...  # doctest: +SKIP
+                       A       B
+              2.0    nan     4.0
+        0       1      2       3
+
+        Using a callable ``formatter`` function.
+
+        >>> func = lambda s: 'STRING' if isinstance(s, str) else 'FLOAT'
+        >>> df.style.format_index(func, axis=1, na_rep='MISS')
+        ...  # doctest: +SKIP
+                  STRING  STRING
+            FLOAT   MISS   FLOAT
+        0       1      2       3
+
+        Using a ``formatter`` with HTML ``escape`` and ``na_rep``.
+
+        >>> df = pd.DataFrame([[1, 2, 3]], columns=['"A"', 'A&B', None])
+        >>> s = df.style.format_index('$ {0}', axis=1, escape="html", na_rep="NA")
+        <th .. >$ &#34;A&#34;</th>
+        <th .. >$ A&amp;B</th>
+        <th .. >NA</td>
+        ...
+
+        Using a ``formatter`` with LaTeX ``escape``.
+
+        >>> df = pd.DataFrame([[1, 2, 3]], columns=["123", "~", "$%#"])
+        >>> df.style.format_index("\\textbf{{{}}}", escape="latex", axis=1).to_latex()
+        ...  # doctest: +SKIP
+        \begin{tabular}{lrrr}
+        {} & {\textbf{123}} & {\textbf{\textasciitilde }} & {\textbf{\$\%\#}} \\
+        0 & 1 & 2 & 3 \\
+        \end{tabular}
+        """
+        axis = self.data._get_axis_number(axis)
+        if axis == 0:
+            display_funcs_, obj = self._display_funcs_index, self.index
+        else:
+            display_funcs_, obj = self._display_funcs_columns, self.columns
+        levels_ = refactor_levels(level, obj)
+
+        if all(
+            (
+                formatter is None,
+                level is None,
+                precision is None,
+                decimal == ".",
+                thousands is None,
+                na_rep is None,
+                escape is None,
+            )
+        ):
+            display_funcs_.clear()
+            return self  # clear the formatter / revert to default and avoid looping
+
+        if not isinstance(formatter, dict):
+            formatter = {level: formatter for level in levels_}
+        else:
+            formatter = {
+                obj._get_level_number(level): formatter_
+                for level, formatter_ in formatter.items()
+            }
+
+        for lvl in levels_:
+            format_func = _maybe_wrap_formatter(
+                formatter.get(lvl),
+                na_rep=na_rep,
+                precision=precision,
+                decimal=decimal,
+                thousands=thousands,
+                escape=escape,
+            )
+
+            for idx in [(i, lvl) if axis == 0 else (lvl, i) for i in range(len(obj))]:
+                display_funcs_[idx] = format_func
+
+        return self
+
 
 def _element(
     html_element: str,
diff --git a/pandas/io/formats/templates/html_table.tpl b/pandas/io/formats/templates/html_table.tpl
@@ -21,13 +21,13 @@
 {% if exclude_styles %}
 {% for c in r %}
 {% if c.is_visible != False %}
-      <{{c.type}} {{c.attributes}}>{{c.value}}</{{c.type}}>
+      <{{c.type}} {{c.attributes}}>{{c.display_value}}</{{c.type}}>
 {% endif %}
 {% endfor %}
 {% else %}
 {% for c in r %}
 {% if c.is_visible != False %}
-      <{{c.type}} {%- if c.id is defined %} id="T_{{uuid}}_{{c.id}}" {%- endif %} class="{{c.class}}" {{c.attributes}}>{{c.value}}</{{c.type}}>
+      <{{c.type}} {%- if c.id is defined %} id="T_{{uuid}}_{{c.id}}" {%- endif %} class="{{c.class}}" {{c.attributes}}>{{c.display_value}}</{{c.type}}>
 {% endif %}
 {% endfor %}
 {% endif %}
diff --git a/pandas/tests/io/formats/style/test_format.py b/pandas/tests/io/formats/style/test_format.py
diff --git a/pandas/tests/io/formats/style/test_style.py b/pandas/tests/io/formats/style/test_style.py

Original file line number	Diff line number	Diff line change
`@@ -1184,6 +1184,8 @@ def _copy(self, deepcopy: bool = False) -> Styler:`
`1184`	`1184`	`]`
`1185`	`1185`	`deep = [ # nested lists or dicts`
`1186`	`1186`	`"_display_funcs",`
	`1187`	`+ "_display_funcs_index",`
	`1188`	`+ "_display_funcs_columns",`
`1187`	`1189`	`"hidden_rows",`
`1188`	`1190`	`"hidden_columns",`
`1189`	`1191`	`"ctx",`