Commit 4081fc7

ENH: Styler.format_index() to display index values similarly to data-values with format() (#43101)
Co-authored-by: JHM Darbyshire (iMac) <[email protected]>
1 parent 7036de3 commit 4081fc7

8 files changed, +378 -60 lines changed

doc/source/reference/style.rst

+1
Original file line number | Diff line number | Diff line change
@@ -39,6 +39,7 @@ Style application
3939
Styler.apply_index
4040
Styler.applymap_index
4141
Styler.format
42+
Styler.format_index
4243
Styler.hide_index
4344
Styler.hide_columns
4445
Styler.set_td_classes

doc/source/user_guide/style.ipynb

+48-7
@@ -150,15 +150,14 @@
150150
"\n",
151151
"### Formatting Values\n",
152152
"\n",
153-
"Before adding styles it is useful to show that the [Styler][styler] can distinguish the *display* value from the *actual* value. To control the display value, the text is printed in each cell, and we can use the [.format()][formatfunc] method to manipulate this according to a [format spec string][format] or a callable that takes a single value and returns a string. It is possible to define this for the whole table or for individual columns. \n",
153+
"Before adding styles it is useful to show that the [Styler][styler] can distinguish the *display* value from the *actual* value, in both data values and index or column headers. The display value is the text printed in each cell as a string, and we can use the [.format()][formatfunc] and [.format_index()][formatfuncindex] methods to control it with a [format spec string][format] or a callable that takes a single value and returns a string. This can be defined for the whole table, for the index, for individual columns, or for MultiIndex levels. \n",
154154
"\n",
155-
"Additionally, the format function has a **precision** argument to specifically help formatting floats, as well as **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and an **escape** argument to help displaying safe-HTML or safe-LaTeX. The default formatter is configured to adopt pandas' regular `display.precision` option, controllable using `with pd.option_context('display.precision', 2):`\n",
156-
"\n",
157-
"Here is an example of using the multiple options to control the formatting generally and with specific column formatters.\n",
155+
"Additionally, the format function has a **precision** argument to specifically help formatting floats, as well as **decimal** and **thousands** separators to support other locales, an **na_rep** argument to display missing data, and an **escape** argument to help displaying safe-HTML or safe-LaTeX. The default formatter is configured to adopt pandas' regular `display.precision` option, controllable using `with pd.option_context('display.precision', 2):` \n",
158156
"\n",
159157
"[styler]: ../reference/api/pandas.io.formats.style.Styler.rst\n",
160158
"[format]: https://docs.python.org/3/library/string.html#format-specification-mini-language\n",
161-
"[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst"
159+
"[formatfunc]: ../reference/api/pandas.io.formats.style.Styler.format.rst\n",
160+
"[formatfuncindex]: ../reference/api/pandas.io.formats.style.Styler.format_index.rst"
162161
]
163162
},
164163
{
@@ -173,6 +172,49 @@
173172
" })"
174173
]
175174
},
175+
{
176+
"cell_type": "markdown",
177+
"metadata": {},
178+
"source": [
179+
"Using Styler to manipulate the display is useful because the index and data values stay intact for other purposes, which gives greater control. You do not have to overwrite your DataFrame to display it how you like. Here is an example of using the formatting functions whilst still relying on the underlying data for indexing and calculations."
180+
]
181+
},
182+
{
183+
"cell_type": "code",
184+
"execution_count": null,
185+
"metadata": {},
186+
"outputs": [],
187+
"source": [
188+
"weather_df = pd.DataFrame(np.random.rand(10,2)*5, \n",
189+
" index=pd.date_range(start=\"2021-01-01\", periods=10),\n",
190+
" columns=[\"Tokyo\", \"Beijing\"])\n",
191+
"\n",
192+
"def rain_condition(v): \n",
193+
" if v < 1.75:\n",
194+
" return \"Dry\"\n",
195+
" elif v < 2.75:\n",
196+
" return \"Rain\"\n",
197+
" return \"Heavy Rain\"\n",
198+
"\n",
199+
"def make_pretty(styler):\n",
200+
" styler.set_caption(\"Weather Conditions\")\n",
201+
" styler.format(rain_condition)\n",
202+
" styler.format_index(lambda v: v.strftime(\"%A\"))\n",
203+
" styler.background_gradient(axis=None, vmin=1, vmax=5, cmap=\"YlGnBu\")\n",
204+
" return styler\n",
205+
"\n",
206+
"weather_df"
207+
]
208+
},
209+
{
210+
"cell_type": "code",
211+
"execution_count": null,
212+
"metadata": {},
213+
"outputs": [],
214+
"source": [
215+
"weather_df.loc[\"2021-01-04\":\"2021-01-08\"].style.pipe(make_pretty)"
216+
]
217+
},
176218
{
177219
"cell_type": "markdown",
178220
"metadata": {},
@@ -187,7 +229,7 @@
187229
"\n",
188230
"Hiding does not change the integer arrangement of CSS classes, e.g. hiding the first two columns of a DataFrame means the column class indexing will start at `col2`, since `col0` and `col1` are simply ignored.\n",
189231
"\n",
190-
"We can update our `Styler` object to hide some data and format the values.\n",
232+
"We can update our `Styler` object from before to hide some data and format the values.\n",
191233
"\n",
192234
"[hideidx]: ../reference/api/pandas.io.formats.style.Styler.hide_index.rst\n",
193235
"[hidecols]: ../reference/api/pandas.io.formats.style.Styler.hide_columns.rst"
@@ -1974,7 +2016,6 @@
19742016
}
19752017
],
19762018
"metadata": {
1977-
"celltoolbar": "Edit Metadata",
19782019
"kernelspec": {
19792020
"display_name": "Python 3",
19802021
"language": "python",
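
The notebook cell added above keeps the actual values untouched and only changes what is displayed. As a rough illustration of what the two callables produce, the sketch below applies them by hand using only the standard library. The rainfall numbers are hypothetical, and `datetime.date` objects stand in for the pandas timestamps in the notebook's index (an assumption; both expose `strftime`).

```python
from datetime import date, timedelta


def rain_condition(v):
    """Same thresholds as the notebook's data-value formatter."""
    if v < 1.75:
        return "Dry"
    elif v < 2.75:
        return "Rain"
    return "Heavy Rain"


# Stand-in for the sliced index 2021-01-04 .. 2021-01-08 (assumption: plain dates).
days = [date(2021, 1, 4) + timedelta(days=i) for i in range(5)]
# Hypothetical rainfall values; the notebook draws these at random.
rainfall = [0.5, 2.0, 4.1, 1.8, 2.7]

# What format_index(lambda v: v.strftime("%A")) would display per label
# (weekday names depend on the active locale).
index_labels = [d.strftime("%A") for d in days]
# What format(rain_condition) would display per cell.
cell_text = [rain_condition(v) for v in rainfall]

for label, text, actual in zip(index_labels, cell_text, rainfall):
    print(f"{label}: {text} (actual value {actual})")
```

The underlying numbers and dates remain available for slicing and for the background gradient, which is why the notebook can still select rows with `.loc` on the original dates.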

doc/source/whatsnew/v1.4.0.rst

+1-1
@@ -72,7 +72,7 @@ Styler
7272

7373
:class:`.Styler` has been further developed in 1.4.0. The following enhancements have been made:
7474

75-
- Styling of indexing has been added, with :meth:`.Styler.apply_index` and :meth:`.Styler.applymap_index`. These mirror the signature of the methods already used to style data values, and work with both HTML and LaTeX format (:issue:`41893`).
75+
- Styling and formatting of indexes has been added, with :meth:`.Styler.apply_index`, :meth:`.Styler.applymap_index` and :meth:`.Styler.format_index`. These mirror the signature of the methods already used to style and format data values, and work with both HTML and LaTeX format (:issue:`41893`, :issue:`43101`).
7676
- :meth:`.Styler.bar` introduces additional arguments to control alignment and display (:issue:`26070`, :issue:`36419`), and it also validates the input arguments ``width`` and ``height`` (:issue:`42511`).
7777
- :meth:`.Styler.to_latex` introduces keyword argument ``environment``, which also allows a specific "longtable" entry through a separate jinja2 template (:issue:`41866`).
7878
- :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption``, ``max_rows`` and ``max_columns`` (:issue:`41946`, :issue:`43149`, :issue:`42972`).

pandas/io/formats/style.py

+2
@@ -1184,6 +1184,8 @@ def _copy(self, deepcopy: bool = False) -> Styler:
11841184
]
11851185
deep = [ # nested lists or dicts
11861186
"_display_funcs",
1187+
"_display_funcs_index",
1188+
"_display_funcs_columns",
11871189
"hidden_rows",
11881190
"hidden_columns",
11891191
"ctx",
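
The two new attributes are listed under `deep` because they are mutable mappings. The sketch below is not pandas' `_copy` implementation; it is a minimal standard-library illustration, using a hypothetical `MiniStyler` class, of why a shallow copy of such a mapping would let formatter changes on the copy leak back into the original.

```python
import copy
from collections import defaultdict


def default_fmt(value):
    return str(value)


def shout(value):
    return str(value).upper()


class MiniStyler:
    """Hypothetical stand-in that holds only an index formatter mapping."""

    def __init__(self):
        self.display_funcs_index = defaultdict(lambda: default_fmt)


original = MiniStyler()
shallow = copy.copy(original)    # shares the same mapping object
deep = copy.deepcopy(original)   # gets an independent mapping

# A change made through the shallow copy is visible on the original.
shallow.display_funcs_index[(0, 0)] = shout
shallow_leaks = original.display_funcs_index[(0, 0)] is shout

# A change made through the deep copy is not.
deep.display_funcs_index[(1, 0)] = shout
deep_leaks = original.display_funcs_index[(1, 0)] is shout

print(shallow_leaks, deep_leaks)
```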

pandas/io/formats/style_render.py

+177
@@ -117,6 +117,12 @@ def __init__(
117117
self._display_funcs: DefaultDict[ # maps (row, col) -> format func
118118
tuple[int, int], Callable[[Any], str]
119119
] = defaultdict(lambda: partial(_default_formatter, precision=precision))
120+
self._display_funcs_index: DefaultDict[ # maps (row, level) -> format func
121+
tuple[int, int], Callable[[Any], str]
122+
] = defaultdict(lambda: partial(_default_formatter, precision=precision))
123+
self._display_funcs_columns: DefaultDict[ # maps (level, col) -> format func
124+
tuple[int, int], Callable[[Any], str]
125+
] = defaultdict(lambda: partial(_default_formatter, precision=precision))
120126

121127
def _render_html(
122128
self,
@@ -377,6 +383,7 @@ def _translate_header(
377383
f"{col_heading_class} level{r} col{c}",
378384
value,
379385
_is_visible(c, r, col_lengths),
386+
display_value=self._display_funcs_columns[(r, c)](value),
380387
attributes=(
381388
f'colspan="{col_lengths.get((r, c), 0)}"'
382389
if col_lengths.get((r, c), 0) > 1
@@ -535,6 +542,7 @@ def _translate_body(
535542
f"{row_heading_class} level{c} row{r}",
536543
value,
537544
_is_visible(r, c, idx_lengths) and not self.hide_index_[c],
545+
display_value=self._display_funcs_index[(r, c)](value),
538546
attributes=(
539547
f'rowspan="{idx_lengths.get((c, r), 0)}"'
540548
if idx_lengths.get((c, r), 0) > 1
@@ -834,6 +842,175 @@ def format(
834842

835843
return self
836844

845+
def format_index(
846+
self,
847+
formatter: ExtFormatter | None = None,
848+
axis: int | str = 0,
849+
level: Level | list[Level] | None = None,
850+
na_rep: str | None = None,
851+
precision: int | None = None,
852+
decimal: str = ".",
853+
thousands: str | None = None,
854+
escape: str | None = None,
855+
) -> StylerRenderer:
856+
r"""
857+
Format the text display value of index labels or column headers.
858+
859+
.. versionadded:: 1.4.0
860+
861+
Parameters
862+
----------
863+
formatter : str, callable, dict or None
864+
Object to define how values are displayed. See notes.
865+
axis : {0, "index", 1, "columns"}
866+
Whether to apply the formatter to the index or column headers.
867+
level : int, str, list
868+
The level(s) over which to apply the generic formatter.
869+
na_rep : str, optional
870+
Representation for missing values.
871+
If ``na_rep`` is None, no special formatting is applied.
872+
precision : int, optional
873+
Floating point precision to use for display purposes, if not determined by
874+
the specified ``formatter``.
875+
decimal : str, default "."
876+
Character used as decimal separator for floats, complex and integers
877+
thousands : str, optional, default None
878+
Character used as thousands separator for floats, complex and integers
879+
escape : str, optional
880+
Use 'html' to replace the characters ``&``, ``<``, ``>``, ``'``, and ``"``
881+
in cell display string with HTML-safe sequences.
882+
Use 'latex' to replace the characters ``&``, ``%``, ``$``, ``#``, ``_``,
883+
``{``, ``}``, ``~``, ``^``, and ``\`` in the cell display string with
884+
LaTeX-safe sequences.
885+
Escaping is done before ``formatter``.
886+
887+
Returns
888+
-------
889+
self : Styler
890+
891+
Notes
892+
-----
893+
This method assigns a formatting function, ``formatter``, to each level label
894+
in the DataFrame's index or column headers. If ``formatter`` is ``None``,
895+
then the default formatter is used.
896+
If a callable then that function should take a label value as input and return
897+
a displayable representation, such as a string. If ``formatter`` is
898+
given as a string this is assumed to be a valid Python format specification
899+
and is wrapped to a callable as ``string.format(x)``. If a ``dict`` is given,
900+
keys should correspond to MultiIndex level numbers or names, and values should
901+
be string or callable, as above.
902+
903+
The default formatter currently expresses floats and complex numbers with the
904+
pandas display precision unless using the ``precision`` argument here. The
905+
default formatter does not adjust the representation of missing values unless
906+
the ``na_rep`` argument is used.
907+
908+
The ``level`` argument defines which levels of a MultiIndex to apply the
909+
method to. If the ``formatter`` argument is given in dict form but does
910+
not include all levels within the level argument then these unspecified levels
911+
will have the default formatter applied. Any levels in the formatter dict
912+
specifically excluded from the level argument will be ignored.
913+
914+
When using a ``formatter`` string the dtypes must be compatible, otherwise a
915+
`ValueError` will be raised.
916+
917+
Examples
918+
--------
919+
Using ``na_rep`` and ``precision`` with the default ``formatter``
920+
921+
>>> df = pd.DataFrame([[1, 2, 3]], columns=[2.0, np.nan, 4.0])
922+
>>> df.style.format_index(axis=1, na_rep='MISS', precision=3) # doctest: +SKIP
923+
2.000 MISS 4.000
924+
0 1 2 3
925+
926+
Using a ``formatter`` specification on consistent dtypes in a level
927+
928+
>>> df.style.format_index('{:.2f}', axis=1, na_rep='MISS') # doctest: +SKIP
929+
2.00 MISS 4.00
930+
0 1 2 3
931+
932+
Using the default ``formatter`` for unspecified levels
933+
934+
>>> df = pd.DataFrame([[1, 2, 3]],
935+
... columns=pd.MultiIndex.from_arrays([["a", "a", "b"],[2, np.nan, 4]]))
936+
>>> df.style.format_index({0: lambda v: v.upper()}, axis=1, precision=1)
937+
... # doctest: +SKIP
938+
A B
939+
2.0 nan 4.0
940+
0 1 2 3
941+
942+
Using a callable ``formatter`` function.
943+
944+
>>> func = lambda s: 'STRING' if isinstance(s, str) else 'FLOAT'
945+
>>> df.style.format_index(func, axis=1, na_rep='MISS')
946+
... # doctest: +SKIP
947+
STRING STRING
948+
FLOAT MISS FLOAT
949+
0 1 2 3
950+
951+
Using a ``formatter`` with HTML ``escape`` and ``na_rep``.
952+
953+
>>> df = pd.DataFrame([[1, 2, 3]], columns=['"A"', 'A&B', None])
954+
>>> s = df.style.format_index('$ {0}', axis=1, escape="html", na_rep="NA")
955+
<th .. >$ &#34;A&#34;</th>
956+
<th .. >$ A&amp;B</th>
957+
<th .. >NA</th>
958+
...
959+
960+
Using a ``formatter`` with LaTeX ``escape``.
961+
962+
>>> df = pd.DataFrame([[1, 2, 3]], columns=["123", "~", "$%#"])
963+
>>> df.style.format_index("\\textbf{{{}}}", escape="latex", axis=1).to_latex()
964+
... # doctest: +SKIP
965+
\begin{tabular}{lrrr}
966+
{} & {\textbf{123}} & {\textbf{\textasciitilde }} & {\textbf{\$\%\#}} \\
967+
0 & 1 & 2 & 3 \\
968+
\end{tabular}
969+
"""
970+
axis = self.data._get_axis_number(axis)
971+
if axis == 0:
972+
display_funcs_, obj = self._display_funcs_index, self.index
973+
else:
974+
display_funcs_, obj = self._display_funcs_columns, self.columns
975+
levels_ = refactor_levels(level, obj)
976+
977+
if all(
978+
(
979+
formatter is None,
980+
level is None,
981+
precision is None,
982+
decimal == ".",
983+
thousands is None,
984+
na_rep is None,
985+
escape is None,
986+
)
987+
):
988+
display_funcs_.clear()
989+
return self # clear the formatter / revert to default and avoid looping
990+
991+
if not isinstance(formatter, dict):
992+
formatter = {level: formatter for level in levels_}
993+
else:
994+
formatter = {
995+
obj._get_level_number(level): formatter_
996+
for level, formatter_ in formatter.items()
997+
}
998+
999+
for lvl in levels_:
1000+
format_func = _maybe_wrap_formatter(
1001+
formatter.get(lvl),
1002+
na_rep=na_rep,
1003+
precision=precision,
1004+
decimal=decimal,
1005+
thousands=thousands,
1006+
escape=escape,
1007+
)
1008+
1009+
for idx in [(i, lvl) if axis == 0 else (lvl, i) for i in range(len(obj))]:
1010+
display_funcs_[idx] = format_func
1011+
1012+
return self
1013+
8371014

8381015
def _element(
8391016
html_element: str,
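
The level-to-formatter assignment in `format_index` above can be modelled without pandas. The following is a simplified sketch, not the library code: `default_formatter`, `wrap_formatter` and `assign_level_formatters` are hypothetical names, levels are assumed to be plain integers (the real method also resolves level names), and `na_rep`, `precision`, `decimal`, `thousands` and `escape` are left out.

```python
from collections import defaultdict


def default_formatter(value):
    """Stand-in for the default formatter (assumption: plain str())."""
    return str(value)


def wrap_formatter(formatter):
    """Turn None, a format-spec string, or a callable into a callable."""
    if formatter is None:
        return default_formatter
    if isinstance(formatter, str):
        return lambda x: formatter.format(x)
    if callable(formatter):
        return formatter
    raise TypeError("formatter must be a str, a callable or None")


def assign_level_formatters(display_funcs, n_labels, levels, formatter, axis=0):
    """Store one formatter per key: (position, level) for axis 0, (level, position) for axis 1."""
    if not isinstance(formatter, dict):
        formatter = {lvl: formatter for lvl in levels}
    for lvl in levels:
        # Levels missing from the dict fall back to the default formatter.
        func = wrap_formatter(formatter.get(lvl))
        for i in range(n_labels):
            key = (i, lvl) if axis == 0 else (lvl, i)
            display_funcs[key] = func
    return display_funcs


# Two-level column header: level 0 holds strings, level 1 holds floats.
header = [["a", "a", "b"], [2.0, 3.5, 4.0]]
funcs = defaultdict(lambda: default_formatter)
assign_level_formatters(
    funcs, n_labels=3, levels=[0, 1],
    formatter={0: str.upper, 1: "{:.2f}"}, axis=1,
)

rendered = [
    [funcs[(lvl, i)](value) for i, value in enumerate(row)]
    for lvl, row in enumerate(header)
]
print(rendered)  # [['A', 'A', 'B'], ['2.00', '3.50', '4.00']]
```

Keying the mapping by position and level, rather than by level alone, mirrors the `_display_funcs_index` and `_display_funcs_columns` lookups that the header and body translation steps perform per cell in the diff.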

pandas/io/formats/templates/html_table.tpl

+2-2
@@ -21,13 +21,13 @@
2121
{% if exclude_styles %}
2222
{% for c in r %}
2323
{% if c.is_visible != False %}
24-
<{{c.type}} {{c.attributes}}>{{c.value}}</{{c.type}}>
24+
<{{c.type}} {{c.attributes}}>{{c.display_value}}</{{c.type}}>
2525
{% endif %}
2626
{% endfor %}
2727
{% else %}
2828
{% for c in r %}
2929
{% if c.is_visible != False %}
30-
<{{c.type}} {%- if c.id is defined %} id="T_{{uuid}}_{{c.id}}" {%- endif %} class="{{c.class}}" {{c.attributes}}>{{c.value}}</{{c.type}}>
30+
<{{c.type}} {%- if c.id is defined %} id="T_{{uuid}}_{{c.id}}" {%- endif %} class="{{c.class}}" {{c.attributes}}>{{c.display_value}}</{{c.type}}>
3131
{% endif %}
3232
{% endfor %}
3333
{% endif %}
