ENH: DataFrame.style sparsified MultiIndex #13775

Closed
21 changes: 21 additions & 0 deletions doc/source/html-styling.ipynb
@@ -788,6 +788,27 @@
"We hope to collect some useful ones either in pandas, or preferable in a new package that [builds on top](#Extensibility) the tools here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CSS Classes\n",
"\n",
"Certain CSS classes are attached to cells.\n",
"\n",
"- Index and Column names include `index_name` and `level<k>` where `k` is its level in a MultiIndex\n",
Review comment (Contributor): would add this documentation to the docstring as well.

"- Index label cells include\n",
" + `row_heading`\n",
" + `row<n>` where `n` is the numeric position of the row\n",
" + `level<k>` where `k` is the level in a MultiIndex\n",
"- Column label cells include\n",
" + `col_heading`\n",
" + `col<n>` where `n` is the numeric position of the column\n",
" + `level<k>` where `k` is the level in a MultiIndex\n",
"- Blank cells include `blank`\n",
"- Data cells include `data`"
]
},
{
"cell_type": "markdown",
"metadata": {},
4 changes: 3 additions & 1 deletion doc/source/whatsnew/v0.19.0.txt
@@ -314,6 +314,8 @@ Other enhancements
- ``Series.append`` now supports the ``ignore_index`` option (:issue:`13677`)
- ``.to_stata()`` and ``StataWriter`` can now write variable labels to Stata dta files using a dictionary to map column names to labels (:issue:`13535`, :issue:`13536`)
- ``.to_stata()`` and ``StataWriter`` will automatically convert ``datetime64[ns]`` columns to Stata format ``%tc``, rather than raising a ``ValueError`` (:issue:`12259`)
- ``DataFrame.style`` will now render sparsified MultiIndexes (:issue:`11655`)
- ``DataFrame.style`` will now show column level names (e.g. ``DataFrame.columns.names``) (:issue:`13775`)
- ``DataFrame`` has gained support to re-order the columns based on the values
in a row using ``df.sort_values(by='...', axis=1)`` (:issue:`10806`)

@@ -769,8 +771,8 @@ Bug Fixes
- Bug in ``groupby`` with ``as_index=False`` returns all NaN's when grouping on multiple columns including a categorical one (:issue:`13204`)
- Bug in ``df.groupby(...)[...]`` where getitem with ``Int64Index`` raised an error (:issue:`13731`)

- Bug in the CSS classes assigned to ``DataFrame.style`` for index names. Previously they were assigned ``"col_heading level<n> col<c>"`` where ``n`` was the number of levels + 1. Now they are assigned ``"index_name level<n>"``, where ``n`` is the correct level for that MultiIndex.
- Bug where ``pd.read_gbq()`` could throw ``ImportError: No module named discovery`` as a result of a naming conflict with another python package called apiclient (:issue:`13454`)
- Bug in ``Index.union`` returns an incorrect result with a named empty index (:issue:`13432`)
- Bugs in ``Index.difference`` and ``DataFrame.join`` raise in Python3 when using mixed-integer indexes (:issue:`13432`, :issue:`12814`)

- Bug in ``.to_excel()`` when DataFrame contains a MultiIndex which contains a label with a NaN value (:issue:`13511`)
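
The class-naming rules described in this PR (in the notebook text and in the bug-fix note above) can be illustrated with a small standalone sketch. The helper names below are hypothetical and are not part of pandas; they only mirror the order in which the diff joins the class names, so the output shows what a CSS selector would need to match.

```python
# Hypothetical helpers (not pandas API) that mirror the class-naming
# conventions documented in this PR.

def index_name_classes(level):
    """Classes for an index or column *name* cell at the given level."""
    return " ".join(["index_name", "level%s" % level])


def row_heading_classes(level, row):
    """Classes for a row label cell: heading marker, level, row position."""
    return " ".join(["row_heading", "level%s" % level, "row%s" % row])


def col_heading_classes(level, col):
    """Classes for a column label cell: heading marker, level, column position."""
    return " ".join(["col_heading", "level%s" % level, "col%s" % col])


print(index_name_classes(1))      # index_name level1
print(row_heading_classes(0, 3))  # row_heading level0 row3
print(col_heading_classes(1, 2))  # col_heading level1 col2
```

With names like these, a stylesheet rule such as `th.index_name.level1` would target only the name cell of the second index level, which is the distinction the bug fix restores.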
128 changes: 110 additions & 18 deletions pandas/formats/style.py
@@ -21,7 +21,9 @@

import numpy as np
import pandas as pd
from pandas.compat import lzip, range
from pandas.compat import range
from pandas.core.config import get_option
import pandas.core.common as com
from pandas.core.indexing import _maybe_numeric_slice, _non_reducing_slice
try:
import matplotlib.pyplot as plt
@@ -79,6 +81,24 @@ class Styler(object):
to automatically render itself. Otherwise call Styler.render to get
the generated HTML.

CSS classes are attached to the generated HTML

* Index and Column names include ``index_name`` and ``level<k>``
where `k` is its level in a MultiIndex
* Index label cells include

* ``row_heading``
* ``row<n>`` where `n` is the numeric position of the row
* ``level<k>`` where `k` is the level in a MultiIndex

* Column label cells include

* ``col_heading``
* ``col<n>`` where `n` is the numeric position of the column
* ``level<k>`` where `k` is the level in a MultiIndex

* Blank cells include ``blank``
* Data cells include ``data``

See Also
--------
pandas.DataFrame.style
@@ -110,7 +130,10 @@ class Styler(object):
{% for r in head %}
<tr>
{% for c in r %}
<{{c.type}} class="{{c.class}}">{{c.value}}
{% if c.is_visible != False %}
<{{c.type}} class="{{c.class}}" {{ c.attributes|join(" ") }}>
{{c.value}}
{% endif %}
{% endfor %}
</tr>
{% endfor %}
@@ -119,8 +142,11 @@ class Styler(object):
{% for r in body %}
<tr>
{% for c in r %}
<{{c.type}} id="T_{{uuid}}{{c.id}}" class="{{c.class}}">
{% if c.is_visible != False %}
<{{c.type}} id="T_{{uuid}}{{c.id}}"
class="{{c.class}}" {{ c.attributes|join(" ") }}>
{{ c.display_value }}
{% endif %}
{% endfor %}
</tr>
{% endfor %}
@@ -148,7 +174,7 @@ def __init__(self, data, precision=None, table_styles=None, uuid=None,
self.table_styles = table_styles
self.caption = caption
if precision is None:
precision = pd.options.display.precision
precision = get_option('display.precision')
self.precision = precision
self.table_attributes = table_attributes
# display_funcs maps (row, col) -> formatting function
@@ -177,21 +203,26 @@ def _translate(self):
uuid = self.uuid or str(uuid1()).replace("-", "_")
ROW_HEADING_CLASS = "row_heading"
COL_HEADING_CLASS = "col_heading"
INDEX_NAME_CLASS = "index_name"

DATA_CLASS = "data"
BLANK_CLASS = "blank"
BLANK_VALUE = ""

def format_attr(pair):
return "{key}={value}".format(**pair)

# for sparsifying a MultiIndex
idx_lengths = _get_level_lengths(self.index)
col_lengths = _get_level_lengths(self.columns)

cell_context = dict()

n_rlvls = self.data.index.nlevels
n_clvls = self.data.columns.nlevels
rlabels = self.data.index.tolist()
clabels = self.data.columns.tolist()

idx_values = self.data.index.format(sparsify=False, adjoin=False,
names=False)
idx_values = lzip(*idx_values)

if n_rlvls == 1:
rlabels = [[x] for x in rlabels]
if n_clvls == 1:
@@ -202,9 +233,24 @@ def _translate(self):
head = []

for r in range(n_clvls):
# Blank for Index columns...
row_es = [{"type": "th",
"value": BLANK_VALUE,
"class": " ".join([BLANK_CLASS])}] * n_rlvls
"display_value": BLANK_VALUE,
"is_visible": True,
"class": " ".join([BLANK_CLASS])}] * (n_rlvls - 1)

# ... except maybe the last for columns.names
name = self.data.columns.names[r]
cs = [BLANK_CLASS if name is None else INDEX_NAME_CLASS,
"level%s" % r]
name = BLANK_VALUE if name is None else name
row_es.append({"type": "th",
"value": name,
"display_value": name,
"class": " ".join(cs),
"is_visible": True})

for c in range(len(clabels[0])):
cs = [COL_HEADING_CLASS, "level%s" % r, "col%s" % c]
cs.extend(cell_context.get(
@@ -213,16 +259,23 @@ def _translate(self):
row_es.append({"type": "th",
"value": value,
"display_value": value,
"class": " ".join(cs)})
"class": " ".join(cs),
"is_visible": _is_visible(c, r, col_lengths),
"attributes": [
format_attr({"key": "colspan",
"value": col_lengths.get(
(r, c), 1)})
]})
head.append(row_es)

if self.data.index.names and self.data.index.names != [None]:
if self.data.index.names and not all(x is None
for x in self.data.index.names):
index_header_row = []

for c, name in enumerate(self.data.index.names):
cs = [COL_HEADING_CLASS,
"level%s" % (n_clvls + 1),
"col%s" % c]
cs = [INDEX_NAME_CLASS,
"level%s" % c]
name = '' if name is None else name
index_header_row.append({"type": "th", "value": name,
"class": " ".join(cs)})

@@ -236,12 +289,17 @@ def _translate(self):

body = []
for r, idx in enumerate(self.data.index):
cs = [ROW_HEADING_CLASS, "level%s" % c, "row%s" % r]
cs.extend(
cell_context.get("row_headings", {}).get(r, {}).get(c, []))
# cs.extend(
# cell_context.get("row_headings", {}).get(r, {}).get(c, []))
row_es = [{"type": "th",
"is_visible": _is_visible(r, c, idx_lengths),
"attributes": [
format_attr({"key": "rowspan",
"value": idx_lengths.get((c, r), 1)})
],
"value": rlabels[r][c],
"class": " ".join(cs),
"class": " ".join([ROW_HEADING_CLASS, "level%s" % c,
"row%s" % r]),
"display_value": rlabels[r][c]}
for c in range(len(rlabels[r]))]

@@ -893,6 +951,40 @@ def _highlight_extrema(data, color='yellow', max_=True):
index=data.index, columns=data.columns)


def _is_visible(idx_row, idx_col, lengths):
"""
Index -> {(idx_row, idx_col): bool}
"""
return (idx_col, idx_row) in lengths


def _get_level_lengths(index):
"""
Given an index, find the level length for each element.

Result is a dictionary of (level, initial_position): span
"""
sentinel = com.sentinel_factory()
levels = index.format(sparsify=sentinel, adjoin=False, names=False)

if index.nlevels == 1:
return {(0, i): 1 for i, value in enumerate(levels)}

lengths = {}

for i, lvl in enumerate(levels):
for j, row in enumerate(lvl):
if not get_option('display.multi_sparse'):
lengths[(i, j)] = 1
elif row != sentinel:
last_label = j
lengths[(i, last_label)] = 1
else:
lengths[(i, last_label)] += 1

return lengths


def _maybe_wrap_formatter(formatter):
if is_string_like(formatter):
return lambda x: formatter.format(x)
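
To make the `(level, initial_position): span` dictionary from `_get_level_lengths` more concrete, here is a minimal standalone sketch that needs no pandas. It is a simplification and not the PR's implementation: it assumes labels arrive as a list of tuples, treats a label as hidden only when it and all outer-level labels repeat the previous row, and never merges the innermost level. The function names and the `sparse` parameter are illustrative assumptions (the PR reads the `display.multi_sparse` option instead).

```python
def level_lengths(labels, sparse=True):
    """Map (level, start_position) -> span for a list of label tuples.

    Simplified illustration: a label joins the span above it only when
    it and every outer-level label match the previous row; the innermost
    level always starts a new span.
    """
    lengths = {}
    if not labels:
        return lengths
    n_levels = len(labels[0])
    last_start = [0] * n_levels  # start position of the open span, per level
    prev = None
    for pos, cur in enumerate(labels):
        prefix_matches = sparse and prev is not None
        for lvl in range(n_levels):
            innermost = lvl == n_levels - 1
            if prefix_matches and not innermost and cur[lvl] == prev[lvl]:
                # Repeated label: extend the span that is already open.
                lengths[(lvl, last_start[lvl])] += 1
            else:
                # New label (or innermost level): open a span of length 1.
                prefix_matches = False
                last_start[lvl] = pos
                lengths[(lvl, pos)] = 1
        prev = cur
    return lengths


def is_visible(position, level, lengths):
    """A cell is rendered only if a span starts at its (level, position)."""
    return (level, position) in lengths


labels = [("a", "x"), ("a", "y"), ("b", "x")]
spans = level_lengths(labels)
print(spans[(0, 0)])             # 2
print(is_visible(1, 0, spans))   # False
```

In the rendered table, the span value would become the `rowspan` (or `colspan`) attribute of the cell that starts the span, and positions without an entry are the cells the template skips.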