-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: make Styler
compatible with non-unique indexes
#41269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Styler.format
compatible with non-unique indexesStyler
compatible with non-unique indexes
pandas/io/formats/style.py
Outdated
if not c: | ||
continue | ||
css_list = maybe_convert_css_to_tuples(c) | ||
i, j = self.index.get_loc(rn), self.columns.get_loc(cn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can instead we just not use the indices to look up locations? and instead just use indexers (e.g. iterate over the number of columns and use iloc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because attrs
may be a subset
of the main self.data
, so to maintain performance and ensure the mapped css is placed in the right integer locations it needs to lookup - and lookup doesn't work (without ambiguity) for non-unique.
pandas/io/formats/style.py
Outdated
try: | ||
for rn, c in attrs[[cn]].itertuples(): | ||
if not c: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you limit the try/except not to the entire loop here, e.g. just put it around the .get_loc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be even better to simply refuse to render non-uniques entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i removed the separate cases, as suggested.
pandas/io/formats/style.py
Outdated
css_list = maybe_convert_css_to_tuples(c) | ||
i, j = self.index.get_loc(rn), self.columns.get_loc(cn) | ||
self.ctx[(i, j)].extend(css_list) | ||
except ValueError as ve: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to be much more fine grained
pandas/io/formats/style_render.py
Outdated
self._display_funcs[(i, j)] = format_func | ||
for row in data[[col]].itertuples(): | ||
i_ = self.index.get_indexer_for([row[0]]) # handle duplicate keys in | ||
j_ = self.columns.get_indexer_for([col]) # non-unique indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do this outside of the loop right? (as col doesn't change), for j_
does this change perf at all? (I don't think so, but checking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.. the multiple loops really killed it for the unique case (which is benchmarked)..
before after ratio
[4af3eed5] [3a8f11e5]
<styler_non_unique~1^2> <styler_non_unique>
+ 57.8±2ms 163±4ms 2.82 io.style.Render.time_format_render(24, 120)
+ 87.1±5ms 242±4ms 2.78 io.style.Render.time_format_render(36, 120)
+ 30.5±0.9ms 82.2±0.4ms 2.69 io.style.Render.time_format_render(12, 120)
+ 8.53±0.08ms 14.7±0.2ms 1.72 io.style.Render.time_format_render(12, 12)
+ 16.2±0.3ms 27.7±0.7ms 1.71 io.style.Render.time_format_render(24, 12)
+ 25.3±1ms 40.9±0.7ms 1.62 io.style.Render.time_format_render(36, 12)
+ 16.6±0.2ms 18.5±0.2ms 1.11 io.style.Render.time_classes_render(36, 12)
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
So I had to separate out the non-unique and unique cases with a conditional, then performance was the same...
BENCHMARKS NOT SIGNIFICANTLY CHANGED.
pandas/io/formats/style_render.py
Outdated
|
||
j = self.columns.get_loc(col) # single value | ||
for row, value in data[[col]].itertuples(): | ||
i = self.index.get_loc(row) # single value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you pull the col indexer out does this still have a perf hit? e.g. get_indexer_for calls get_loc if its unique (which is cached) anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good reviewing! Third time's a charm, the code is simpler, works for non-unique and unique and performance improvement:
before after ratio
[4af3eed5] [c3b7af82]
<text_gradient^2> <styler_non_unique>
- 33.1±0.4ms 25.9±0.5ms 0.78 io.style.Render.time_format_render(12, 120)
- 86.4±0.4ms 66.8±0.5ms 0.77 io.style.Render.time_format_render(36, 120)
- 62.3±2ms 46.4±0.4ms 0.75 io.style.Render.time_format_render(24, 120)
- 10.1±0.3ms 4.16±0.2ms 0.41 io.style.Render.time_format_render(12, 12)
- 24.7±0.2ms 9.53±0.2ms 0.39 io.style.Render.time_format_render(36, 12)
- 17.7±0.2ms 6.76±0.1ms 0.38 io.style.Render.time_format_render(24, 12)
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
ok! |
The PR aims to make Styler compatible with non-unique indexes/columns, for the purpose of rendering all DataFrame types (even if no styling is applied)
Styler.format
: made FULLY compatible with some modifications to the loops, inc TESTS.Styler.apply
andStyler.applymap
: made PARTIALLY compatible:subset
s are unique then compatible, inc TESTS.ifsubset
s are non-unique slices will raise a not compatibleKeyError
, inc. TESTSStyler.apply
andapplymap
are NOT compatible. Raises KeyError.Styler.set_table_styles
: made FULLY compatible and will style multiple rows/columns from a non-unique key, inc TESTS.Styler.set_td_classes
usesreindex
so is PARTIALLY compatible whereclasses
has unique index/columns: now returns aKeyError
in non-unique case, inc TESTS.Styler.set_tooltips
usesreindex
so is PARTIALLY compatible wherettips
has unique index/columns: now returns aKeyError
in non-unique case, inc TESTS.Styler.hide_index
and.hide_columns
are already FULLY compatible through existing code (inc TESTS)built-in
styling functions use some version ofapply
orapplymap
so are captured by the above cases.I believe this is all relevant functionality reviewed.