Commit 5d163ce

ENH: DataFrame.style sparsified MultiIndex
- [x] closes #11655
- [x] tests added / passed
- [x] passes ``git diff upstream/master | flake8 --diff``
- [x] whatsnew entry

[Notebook comparing `DataFrame._repr_html_` to `DataFrame.style`](https://gist.github.com/609c398f814b4a505bf4f406670e457e)

I think we're identical for non-truncated DataFrames. Truncation has not been implemented in `Styler` yet.

Along the way I noticed two other things that ended up needing fixing.

1. DataFrame.columns.names were not displayed
2. CSS classes weren't being assigned correctly to row labels.

The fixes ended up being pretty intertwined, so I've put them in a single PR. Unfortunately, the commits are a bit jumbled as well :/

Author: Tom Augspurger <[email protected]>

Closes #13775 from TomAugspurger/style-sparse-mi-2 and squashes the following commits:

7c03a72 [Tom Augspurger] ENH: DataFrame.style column names
ecba615 [Tom Augspurger] ENH: MultiIndex Structure for DataFrame.style
1 parent 9ee8c0d commit 5d163ce

4 files changed

+301
-38
lines changed


doc/source/html-styling.ipynb

+21
Original file line number | Diff line number | Diff line change
@@ -788,6 +788,27 @@
788788
"We hope to collect some useful ones either in pandas, or preferably in a new package that [builds on top of](#Extensibility) the tools here."
789789
]
790790
},
791+
{
792+
"cell_type": "markdown",
793+
"metadata": {},
794+
"source": [
795+
"# CSS Classes\n",
796+
"\n",
797+
"Certain CSS classes are attached to cells.\n",
798+
"\n",
799+
"- Index and Column names include `index_name` and `level<k>` where `k` is its level in a MultiIndex\n",
800+
"- Index label cells include\n",
801+
" + `row_heading`\n",
802+
" + `row<n>` where `n` is the numeric position of the row\n",
803+
" + `level<k>` where `k` is the level in a MultiIndex\n",
804+
"- Column label cells include\n",
805+
" + `col_heading`\n",
806+
" + `col<n>` where `n` is the numeric position of the column\n",
807+
" + `level<k>` where `k` is the level in a MultiIndex\n",
808+
"- Blank cells include `blank`\n",
809+
"- Data cells include `data`"
810+
]
811+
},
791812
{
792813
"cell_type": "markdown",
793814
"metadata": {},
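
As a rough illustration of the class scheme listed in the new notebook cell, the sketch below assembles the class string for each kind of label cell and turns it into a CSS selector. The helper names (`index_name_classes`, `row_heading_classes`, `col_heading_classes`, `css_selector`) are hypothetical and are not part of pandas; only the class names and their order are taken from this diff.

```python
# Hypothetical helpers mirroring how the diff joins class names with spaces.

def index_name_classes(level):
    """Classes for an index-name or column-name cell at a MultiIndex level."""
    return " ".join(["index_name", "level%s" % level])


def row_heading_classes(level, row):
    """Classes for an index label cell at (level, row position)."""
    return " ".join(["row_heading", "level%s" % level, "row%s" % row])


def col_heading_classes(level, col):
    """Classes for a column label cell at (level, column position)."""
    return " ".join(["col_heading", "level%s" % level, "col%s" % col])


def css_selector(classes):
    """Turn a space-separated class string into a compound CSS selector."""
    return "." + ".".join(classes.split())


# Selector matching the label in index level 0 of the second row.
selector = css_selector(row_heading_classes(level=0, row=1))
```

A selector built this way (here `.row_heading.level0.row1`) could be prefixed with `th` in a stylesheet to target a single label cell.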

doc/source/whatsnew/v0.19.0.txt

+3-1
@@ -373,6 +373,8 @@ Other enhancements
373373
- ``Series.append`` now supports the ``ignore_index`` option (:issue:`13677`)
374374
- ``.to_stata()`` and ``StataWriter`` can now write variable labels to Stata dta files using a dictionary to map column names to labels (:issue:`13535`, :issue:`13536`)
375375
- ``.to_stata()`` and ``StataWriter`` will automatically convert ``datetime64[ns]`` columns to Stata format ``%tc``, rather than raising a ``ValueError`` (:issue:`12259`)
376+
- ``DataFrame.style`` will now render sparsified MultiIndexes (:issue:`11655`)
377+
- ``DataFrame.style`` will now show column level names (e.g. ``DataFrame.columns.names``) (:issue:`13775`)
376378
- ``DataFrame`` has gained support to re-order the columns based on the values
377379
in a row using ``df.sort_values(by='...', axis=1)`` (:issue:`10806`)
378380

@@ -884,10 +886,10 @@ Bug Fixes
884886
- Bug in ``groupby`` with ``as_index=False`` returns all NaN's when grouping on multiple columns including a categorical one (:issue:`13204`)
885887
- Bug in ``df.groupby(...)[...]`` where getitem with ``Int64Index`` raised an error (:issue:`13731`)
886888

889+
- Bug in the CSS classes assigned to ``DataFrame.style`` for index names. Previously they were assigned ``"col_heading level<n> col<c>"`` where ``n`` was the number of levels + 1. Now they are assigned ``"index_name level<n>"``, where ``n`` is the correct level for that MultiIndex.
887890
- Bug where ``pd.read_gbq()`` could throw ``ImportError: No module named discovery`` as a result of a naming conflict with another python package called apiclient (:issue:`13454`)
888891
- Bug in ``Index.union`` returns an incorrect result with a named empty index (:issue:`13432`)
889892
- Bugs in ``Index.difference`` and ``DataFrame.join`` raise in Python3 when using mixed-integer indexes (:issue:`13432`, :issue:`12814`)
890-
891893
- Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)
892894
- Bug in ``pd.read_csv`` in Python 2.x with non-UTF8 encoded, multi-character separated data (:issue:`3404`)
893895
- Bug in ``Index`` raises ``KeyError`` displaying incorrect column when column is not in the df and columns contains duplicate values (:issue:`13822`)
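
The sparsified rendering announced in the whatsnew entry comes down to two things in the `style.py` changes: a repeated label cell is marked as not visible, and the first cell of a run carries a `rowspan` or `colspan` attribute. The sketch below imitates that with plain string formatting instead of the Jinja template. `render_cell` is a hypothetical stand-in for the template logic, and the closing tag it emits is an assumption, since the template lines shown in the diff do not include one. `format_attr` is copied from the diff.

```python
def format_attr(pair):
    # Same helper as in the diff: {"key": "rowspan", "value": 2} -> "rowspan=2"
    return "{key}={value}".format(**pair)


def render_cell(cell):
    """Render one header cell, skipping cells flagged as not visible."""
    # Mirrors the template test `c.is_visible != False`: a missing key renders.
    if cell.get("is_visible", True) is False:
        return ""
    attrs = " ".join(cell.get("attributes", []))
    return '<{tag} class="{cls}" {attrs}>{value}</{tag}>'.format(
        tag=cell["type"], cls=cell["class"], attrs=attrs, value=cell["value"])


# Two rows share the outer label "a": the first cell spans both rows,
# the second is hidden.
first = {"type": "th", "value": "a", "is_visible": True,
         "class": "row_heading level0 row0",
         "attributes": [format_attr({"key": "rowspan", "value": 2})]}
second = {"type": "th", "value": "a", "is_visible": False,
          "class": "row_heading level0 row1",
          "attributes": [format_attr({"key": "rowspan", "value": 1})]}

html = [render_cell(first), render_cell(second)]
```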

pandas/formats/style.py

+110-18
@@ -21,7 +21,9 @@
2121

2222
import numpy as np
2323
import pandas as pd
24-
from pandas.compat import lzip, range
24+
from pandas.compat import range
25+
from pandas.core.config import get_option
26+
import pandas.core.common as com
2527
from pandas.core.indexing import _maybe_numeric_slice, _non_reducing_slice
2628
try:
2729
import matplotlib.pyplot as plt
@@ -79,6 +81,24 @@ class Styler(object):
7981
to automatically render itself. Otherwise call Styler.render to get
8082
the generated HTML.
8183
84+
CSS classes are attached to the generated HTML
85+
86+
* Index and Column names include ``index_name`` and ``level<k>``
87+
where `k` is its level in a MultiIndex
88+
* Index label cells include
89+
90+
* ``row_heading``
91+
* ``row<n>`` where `n` is the numeric position of the row
92+
* ``level<k>`` where `k` is the level in a MultiIndex
93+
94+
* Column label cells include
95+
* ``col_heading``
96+
* ``col<n>`` where `n` is the numeric position of the column
97+
* ``level<k>`` where `k` is the level in a MultiIndex
98+
99+
* Blank cells include ``blank``
100+
* Data cells include ``data``
101+
82102
See Also
83103
--------
84104
pandas.DataFrame.style
@@ -110,7 +130,10 @@ class Styler(object):
110130
{% for r in head %}
111131
<tr>
112132
{% for c in r %}
113-
<{{c.type}} class="{{c.class}}">{{c.value}}
133+
{% if c.is_visible != False %}
134+
<{{c.type}} class="{{c.class}}" {{ c.attributes|join(" ") }}>
135+
{{c.value}}
136+
{% endif %}
114137
{% endfor %}
115138
</tr>
116139
{% endfor %}
@@ -119,8 +142,11 @@ class Styler(object):
119142
{% for r in body %}
120143
<tr>
121144
{% for c in r %}
122-
<{{c.type}} id="T_{{uuid}}{{c.id}}" class="{{c.class}}">
145+
{% if c.is_visible != False %}
146+
<{{c.type}} id="T_{{uuid}}{{c.id}}"
147+
class="{{c.class}}" {{ c.attributes|join(" ") }}>
123148
{{ c.display_value }}
149+
{% endif %}
124150
{% endfor %}
125151
</tr>
126152
{% endfor %}
@@ -148,7 +174,7 @@ def __init__(self, data, precision=None, table_styles=None, uuid=None,
148174
self.table_styles = table_styles
149175
self.caption = caption
150176
if precision is None:
151-
precision = pd.options.display.precision
177+
precision = get_option('display.precision')
152178
self.precision = precision
153179
self.table_attributes = table_attributes
154180
# display_funcs maps (row, col) -> formatting function
@@ -177,21 +203,26 @@ def _translate(self):
177203
uuid = self.uuid or str(uuid1()).replace("-", "_")
178204
ROW_HEADING_CLASS = "row_heading"
179205
COL_HEADING_CLASS = "col_heading"
206+
INDEX_NAME_CLASS = "index_name"
207+
180208
DATA_CLASS = "data"
181209
BLANK_CLASS = "blank"
182210
BLANK_VALUE = ""
183211

212+
def format_attr(pair):
213+
return "{key}={value}".format(**pair)
214+
215+
# for sparsifying a MultiIndex
216+
idx_lengths = _get_level_lengths(self.index)
217+
col_lengths = _get_level_lengths(self.columns)
218+
184219
cell_context = dict()
185220

186221
n_rlvls = self.data.index.nlevels
187222
n_clvls = self.data.columns.nlevels
188223
rlabels = self.data.index.tolist()
189224
clabels = self.data.columns.tolist()
190225

191-
idx_values = self.data.index.format(sparsify=False, adjoin=False,
192-
names=False)
193-
idx_values = lzip(*idx_values)
194-
195226
if n_rlvls == 1:
196227
rlabels = [[x] for x in rlabels]
197228
if n_clvls == 1:
@@ -202,9 +233,24 @@ def _translate(self):
202233
head = []
203234

204235
for r in range(n_clvls):
236+
# Blank for Index columns...
205237
row_es = [{"type": "th",
206238
"value": BLANK_VALUE,
207-
"class": " ".join([BLANK_CLASS])}] * n_rlvls
239+
"display_value": BLANK_VALUE,
240+
"is_visible": True,
241+
"class": " ".join([BLANK_CLASS])}] * (n_rlvls - 1)
242+
243+
# ... except maybe the last for columns.names
244+
name = self.data.columns.names[r]
245+
cs = [BLANK_CLASS if name is None else INDEX_NAME_CLASS,
246+
"level%s" % r]
247+
name = BLANK_VALUE if name is None else name
248+
row_es.append({"type": "th",
249+
"value": name,
250+
"display_value": name,
251+
"class": " ".join(cs),
252+
"is_visible": True})
253+
208254
for c in range(len(clabels[0])):
209255
cs = [COL_HEADING_CLASS, "level%s" % r, "col%s" % c]
210256
cs.extend(cell_context.get(
@@ -213,16 +259,23 @@ def _translate(self):
213259
row_es.append({"type": "th",
214260
"value": value,
215261
"display_value": value,
216-
"class": " ".join(cs)})
262+
"class": " ".join(cs),
263+
"is_visible": _is_visible(c, r, col_lengths),
264+
"attributes": [
265+
format_attr({"key": "colspan",
266+
"value": col_lengths.get(
267+
(r, c), 1)})
268+
]})
217269
head.append(row_es)
218270

219-
if self.data.index.names and self.data.index.names != [None]:
271+
if self.data.index.names and not all(x is None
272+
for x in self.data.index.names):
220273
index_header_row = []
221274

222275
for c, name in enumerate(self.data.index.names):
223-
cs = [COL_HEADING_CLASS,
224-
"level%s" % (n_clvls + 1),
225-
"col%s" % c]
276+
cs = [INDEX_NAME_CLASS,
277+
"level%s" % c]
278+
name = '' if name is None else name
226279
index_header_row.append({"type": "th", "value": name,
227280
"class": " ".join(cs)})
228281

@@ -236,12 +289,17 @@ def _translate(self):
236289

237290
body = []
238291
for r, idx in enumerate(self.data.index):
239-
cs = [ROW_HEADING_CLASS, "level%s" % c, "row%s" % r]
240-
cs.extend(
241-
cell_context.get("row_headings", {}).get(r, {}).get(c, []))
292+
# cs.extend(
293+
# cell_context.get("row_headings", {}).get(r, {}).get(c, []))
242294
row_es = [{"type": "th",
295+
"is_visible": _is_visible(r, c, idx_lengths),
296+
"attributes": [
297+
format_attr({"key": "rowspan",
298+
"value": idx_lengths.get((c, r), 1)})
299+
],
243300
"value": rlabels[r][c],
244-
"class": " ".join(cs),
301+
"class": " ".join([ROW_HEADING_CLASS, "level%s" % c,
302+
"row%s" % r]),
245303
"display_value": rlabels[r][c]}
246304
for c in range(len(rlabels[r]))]
247305

@@ -893,6 +951,40 @@ def _highlight_extrema(data, color='yellow', max_=True):
893951
index=data.index, columns=data.columns)
894952

895953

954+
def _is_visible(idx_row, idx_col, lengths):
955+
"""
956+
Index -> {(idx_row, idx_col): bool}
957+
"""
958+
return (idx_col, idx_row) in lengths
959+
960+
961+
def _get_level_lengths(index):
962+
"""
963+
Given an index, find the level length for each element.
964+
965+
Result is a dictionary of (level, initial_position): span
966+
"""
967+
sentinel = com.sentinel_factory()
968+
levels = index.format(sparsify=sentinel, adjoin=False, names=False)
969+
970+
if index.nlevels == 1:
971+
return {(0, i): 1 for i, value in enumerate(levels)}
972+
973+
lengths = {}
974+
975+
for i, lvl in enumerate(levels):
976+
for j, row in enumerate(lvl):
977+
if not get_option('display.multi_sparse'):
978+
lengths[(i, j)] = 1
979+
elif row != sentinel:
980+
last_label = j
981+
lengths[(i, last_label)] = 1
982+
else:
983+
lengths[(i, last_label)] += 1
984+
985+
return lengths
986+
987+
896988
def _maybe_wrap_formatter(formatter):
897989
if is_string_like(formatter):
898990
return lambda x: formatter.format(x)
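
To make the `(level, initial_position): span` mapping built by `_get_level_lengths` more concrete, here is a minimal standalone sketch that computes the same kind of dictionary from a plain list of label tuples. It does not call `index.format(sparsify=sentinel)` as the diff does. Instead it assumes a label is collapsed into the previous span when the row shares its prefix up to that level with the previous row, and that the innermost level is never collapsed. The names `level_spans` and `is_visible` are hypothetical, and the `display.multi_sparse` option is not modeled.

```python
def level_spans(rows):
    """Map (level, start_position) -> number of consecutive rows spanned."""
    spans = {}
    if not rows:
        return spans
    n_levels = len(rows[0])
    starts = [0] * n_levels  # start position of the open span, per level
    for pos, row in enumerate(rows):
        prev = rows[pos - 1] if pos else None
        for level in range(n_levels):
            is_last = level == n_levels - 1
            # A label repeats only if every outer label repeats as well.
            repeated = (prev is not None and not is_last
                        and row[:level + 1] == prev[:level + 1])
            if repeated:
                spans[(level, starts[level])] += 1
            else:
                starts[level] = pos
                spans[(level, pos)] = 1
    return spans


def is_visible(position, level, spans):
    """A label cell is rendered only where a span starts."""
    return (level, position) in spans


rows = [("a", "x"), ("a", "y"), ("b", "x")]
spans = level_spans(rows)
```

With these three rows, the outer label `"a"` starts at position 0 and spans two rows, so the cell at position 1 of level 0 is not visible, while every inner-level cell has a span of 1.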
