ENH: Add custom descriptors (such as dtype, nunique, etc.) to Styler output #43894

attack68 · 2021-10-05T21:44:02Z

closes: ENH: Add option to display dtypes below column headers #43875

Former example...

pandas/io/formats/style_render.py

mroeschke

I don't use styler much, but if the community has an interest in this feature I think it's worthwhile

attack68 · 2022-01-27T12:47:12Z

@jreback what do you think here? OK to merge on green? I can extend the docs in a follow-up.

jreback · 2022-01-27T12:48:14Z

no this still has questions

attack68 · 2022-01-28T07:19:25Z

@bashtage I believe the questions jeff is referring to are yours, do you have time to review?

bashtage

I think it is a lot more useful now. The biggest question in this review is what the API should look like. It seems to me that allowing either Sequence[str | Callable] or dict[str, str | Callable] is more consistent with other parts of pandas that are flexible. I can't think of any cases where tuple[str, str | Callable] would be used to get a string name and its associated argument.

bashtage · 2022-01-28T09:30:32Z

pandas/io/formats/style_render.py

@@ -124,6 +128,7 @@ def __init__(
        self.hide_columns_: list = [False] * self.columns.nlevels
        self.hidden_rows: Sequence[int] = []  # sequence for specific hidden rows/cols
        self.hidden_columns: Sequence[int] = []
+        self.descriptors: list[str | Callable | tuple[str, Callable]] = []


Is this a common API? I think most places where alternative inputs are allowed either use a Sequence of some sort or dict[str, str | Callable]. Would it be cleaner if the allowed inputs were Sequence[str | Callable] | dict[str, str | Callable]?

Also, can't the Callable be better described? Isn't it Callable[[Series], float]? It seems that the Callable must take a Series and return some sort of scalar value. The question is whether the scalar value can be non-numeric. If it could be any numeric value, then you could use something like Callable[[Series], int | float]. So more generally, what are the requirements for the Callable?

The Callable could technically return anything so long as the returned object has a __str__ representation to populate in the HTML element. Common values might be int, float, string, but yes it does accept only a Series.

I think the sig should then be Callable[[Series], Any]

Changed the typing, but did not add the dict. List[Tuple[str, str]] is present for some of the Styler functions.
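
As a rough illustration of the contract settled on in this thread (a callable that takes a Series and returns anything with a string representation), the sketch below applies two descriptors per column. The names `Descriptor`, `dtype_of` and `n_unique` are hypothetical and chosen only for this example; they are not taken from the PR.

```python
from typing import Any, Callable

import pandas as pd

# Alias mirroring the signature suggested in review; the name is illustrative.
Descriptor = Callable[[pd.Series], Any]


def dtype_of(s: pd.Series) -> Any:
    """Return the dtype object of the column."""
    return s.dtype


def n_unique(s: pd.Series) -> int:
    """Return the number of distinct values in the column."""
    return s.nunique()


df = pd.DataFrame({"a": [1.0, 2.0, 2.0], "b": [0.5, 0.5, 0.5]})
descriptors: list[Descriptor] = [dtype_of, n_unique]

# Each descriptor is applied per column; the result only needs to be str()-able.
cells = {col: [str(func(df[col])) for func in descriptors] for col in df.columns}
```

Neither return value is a float, which is the reason for preferring `Any` over a numeric return annotation.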

bashtage · 2022-01-28T09:33:56Z

pandas/io/formats/style_render.py

+        elif isinstance(descriptor, tuple):
+            name, func = descriptor[0], descriptor[1]
+        else:
+            name, func = None, descriptor


When the descriptor is a Callable I think it would be best to try getattr(descriptor, "__name__", None) for the name rather than committing to a blank name. For example pd.Series.mean.__name__ is mean.

Changed. Note that I thought the "<lambda>" case was worth blanking.
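
A minimal sketch of the naming rule discussed here, assuming a standalone helper (the name `descriptor_name` and the sample callables are hypothetical): use `__name__` when it exists, and blank it for anonymous functions, which all report `"<lambda>"`.

```python
from typing import Callable, Optional


def descriptor_name(func: Callable) -> Optional[str]:
    """Return a display name for a descriptor, or None if it has no useful one."""
    name = getattr(func, "__name__", None)
    # Every anonymous function reports "<lambda>", which is not a helpful label.
    return None if name == "<lambda>" else name


def column_range(s):
    """Example named descriptor: spread between largest and smallest value."""
    return max(s) - min(s)


class Count:
    """A callable object; its instances carry no __name__ attribute."""

    def __call__(self, s):
        return len(s)
```

With this rule `descriptor_name(column_range)` gives `"column_range"`, while a lambda or a `Count()` instance falls back to `None`.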

bashtage · 2022-01-28T09:47:58Z

pandas/io/formats/style_render.py

@@ -124,6 +128,7 @@ def __init__(
        self.hide_columns_: list = [False] * self.columns.nlevels
        self.hidden_rows: Sequence[int] = []  # sequence for specific hidden rows/cols
        self.hidden_columns: Sequence[int] = []
+        self.descriptors: list[str | Callable | tuple[str, Callable]] = []


If you do stick with the tuple, it should be tuple[str, str | Callable], which would allow someone to pass a custom name for the output along with a common function, e.g., ("average", "mean"). If moving to a dict, then both {"average": "mean"} and {"average": Series.mean} should be allowed.

yes, agreed, having a think if this can be improved.
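
To make the suggestion concrete, here is a hedged sketch of how a custom label could be paired with either a method name or a callable. The helper `apply_spec` and the alias `Spec` are assumptions for illustration; the only behaviour taken from the PR is that a string is expected to name a valid Series method.

```python
from typing import Any, Callable, Union

import pandas as pd

# Hypothetical alias: a descriptor is either a Series method name or a callable.
Spec = Union[str, Callable[[pd.Series], Any]]


def apply_spec(series: pd.Series, spec: Spec) -> Any:
    """Evaluate one descriptor against one column."""
    # A string is looked up as a Series method; a callable is called directly.
    if isinstance(spec, str):
        return getattr(series, spec)()
    return spec(series)


s = pd.Series([1.0, 2.0, 3.0])
by_name = {"average": "mean"}
by_func = {"average": pd.Series.mean}

# Both spellings produce the same labelled result.
results = [
    {label: apply_spec(s, spec) for label, spec in mapping.items()}
    for mapping in (by_name, by_func)
]
```

The tuple form `("average", "mean")` maps onto the same helper, since `dict([("average", "mean")])` yields the first mapping.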

jreback · 2022-01-30T18:42:34Z

I am still -1 here. This is basically completely changing the way things work. We already have .describe() to do exactly this. If you want to add .describe() and then allow it to be more generic ok there. BUT then this adds these as header summary rows, again completely changing things.

I don't have a good recommendation for how to reconcile this. I suppose allowing a 'header' table would be more amenable. This just feels tacked on with a lot of bespoke things going on.

Delengowski · 2022-01-30T22:31:44Z

I think I get what you are saying. I think generalizing .describe() would be a smart move, but keep this api, have the actual stylization call the new .describe(). I'm of the opinion that the addition of the data descriptors to the header does belong on styler, but I get your concern of having this extra describe functionality be under styler rather than under .describe() (if that's indeed what you are talking about).

What I don't think should be done is to move these generalized functionalities to .describe() and then have the API require the end user to call .describe() and pass the resulting DataFrame/Series to some kwarg on Styler.

attack68 · 2022-02-21T21:12:21Z

@bashtage @jreback @mroeschke, this PR stalled due to difficulty in finding a way forward. I have tried to refactor it, addressing some of the concerns above, in an alternative: #46105. If you have time please comment on the alternative.

Delengowski · 2022-02-21T22:30:48Z

I'd be interested in hearing a more detailed response to the opposition of this. The styler is an end use feature, so I'm not sure I understand the issue with there being data descriptors in the header or in a separate footer.

I understand and agree with the issue of duplicating computation logic, but not with the objection to adding extra rows to the header. Again, it's end use, where further computation isn't the primary purpose.

jreback · 2022-02-21T22:42:26Z

API considerations are always paramount - the goal is consistency and simplicity, not covering every possible case

will look soon

mroeschke · 2022-02-21T23:35:06Z

There is a Separation of Concerns argument here: 1) the part to generate the descriptor data & combine with the original data, 2) the part to stylize the result. Without this feature today it's maybe roughly equivalent to (?):

result = pd.concat([original_data.astype(object), descriptor.astype(object)])
output = result.style.(...)

Maybe there should be a better way to do 1 (that's why describe() was brought up, pivot_table has similar functionality)
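
A runnable version of the rough equivalent above, under stated assumptions: the sample frame is invented, and `DataFrame.agg` stands in for whatever produces the descriptor rows (the comment does not say how `descriptor` is built).

```python
import pandas as pd

original_data = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 9.0]})

# Part 1: compute the descriptor rows separately from any styling.
# Using agg here is an assumption; describe() or pivot_table could fill the same role.
descriptor = original_data.agg(["mean", "max"])

# Cast to object so that mixed descriptor values do not force a common dtype.
result = pd.concat([original_data.astype(object), descriptor.astype(object)])

# Part 2 would then style the combined frame, starting from result.style.
```

The combined frame carries the descriptor rows in its index alongside the data rows, so any later styling has to tell the two groups apart.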

attack68 · 2022-02-22T05:13:53Z

@mroeschke's solution, with the concatenation, might work, but it then becomes harder to index the original data, export the styler for more reusable styling, and apply the built-in formatters, since you then have to exclude the concatenated rows from your subset.

I have used his solution before for a "Total" row; it was cumbersome but it worked.

bashtage

Some small refactorings to make it simpler, and perhaps more palatable in the long run.

bashtage · 2022-02-22T09:19:46Z

pandas/io/formats/style.py

+        Parameters
+        ----------
+        descriptors : list of str, callables or 2-tuples of str and callable
+            If a string is given must be a valid Series method, e.g. "mean" invokes


Could you add some additional details about the callable? Specifically, it must (or should) return a scalar quantity. Is the callable allowed to throw an exception? If so, what happens? It would be good to clarify this.

bashtage · 2022-02-22T09:20:34Z

pandas/io/formats/style.py

+        ...     return s.mean()
+        >>> styler = df.style.set_descriptors([
+        ...     "mean",
+        ...     Series.mean,


The screenshot shows no label on this line. Was it not possible to have .__name__ as a default text?

bashtage · 2022-02-22T09:23:31Z

pandas/io/formats/style_render.py

@@ -124,6 +133,7 @@ def __init__(
        self.hide_columns_: list = [False] * self.columns.nlevels
        self.hidden_rows: Sequence[int] = []  # sequence for specific hidden rows/cols
        self.hidden_columns: Sequence[int] = []
+        self.descriptors: list[Descriptor | tuple[str, Descriptor]] = []


Why not simplify this to self.descriptors: list[tuple[str, Descriptor]]? Also, should it be immutable, so that it would be self.descriptors: tuple[tuple[str, Descriptor], ...]?

Essentially if the user doesn't provide a name, then the name used internally, even if blank, is appended so that the tuple form is the only one that would ever need to be used.
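
One way to read this suggestion, sketched with hypothetical helpers (`default_name`, `normalize`) and an assumed meaning of `Descriptor` as a method name or a callable: normalize once at the point of assignment, so everything downstream only ever sees (name, descriptor) pairs.

```python
from typing import Any, Callable, Sequence, Tuple, Union

# Assumed meaning of Descriptor for this sketch: a method name or a callable.
Descriptor = Union[str, Callable[..., Any]]
Entry = Union[Descriptor, Tuple[str, Descriptor]]


def default_name(descriptor: Descriptor) -> str:
    """Pick the internal name used when the user supplies none."""
    if isinstance(descriptor, str):
        return descriptor  # assumption: a method name labels itself
    name = getattr(descriptor, "__name__", "")
    return "" if name == "<lambda>" else name  # blank anonymous functions


def normalize(entries: Sequence[Entry]) -> Tuple[Tuple[str, Descriptor], ...]:
    """Return every entry as an immutable (name, descriptor) pair."""
    pairs = []
    for entry in entries:
        if isinstance(entry, tuple):
            name, descriptor = entry
        else:
            name, descriptor = default_name(entry), entry
        pairs.append((name, descriptor))
    return tuple(pairs)
```

Because the name is always a str here, a rendering step would not need a separate branch for a missing name.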

bashtage · 2022-02-22T09:25:29Z

pandas/io/formats/style_render.py

@@ -477,6 +494,108 @@ def _generate_col_header_row(self, iter: tuple, max_cols: int, col_lengths: dict

        return index_blanks + column_name + column_headers

+    def _generate_descriptor_row(self, iter: tuple, max_cols: int):


What kind of tuples are allowed? Would be better to be as specific as possible to reduce any future refactor risks.

bashtage · 2022-02-22T09:27:16Z

pandas/io/formats/style_render.py

+            else:
+                func = descriptor[1]
+        else:
+            name, func = getattr(descriptor, "__name__", None), descriptor


Should name be None or just ""? Is there any utility in tracking None? If name is "" then you can simplify the type above to be str rather than str | None.

Or perhaps just use self.css["blank_value"] when missing and simplify below.

bashtage · 2022-02-22T09:28:04Z

pandas/io/formats/style_render.py

+                func = descriptor[1]
+        else:
+            name, func = getattr(descriptor, "__name__", None), descriptor
+            name = None if name == "<lambda>" else name  # blank nameless functions


Same here. Replace nameless -> anonymous in comment

bashtage · 2022-02-22T09:31:17Z

pandas/io/formats/style_render.py

+        )
+
+        base_css = f"{self.css['descriptor_name']} {self.css['descriptor']}{r}"
+        if name is not None and not self.hide_column_names:


The above refactor removes the need for this if check, since name is always assigned.

bashtage · 2022-02-22T09:37:35Z

I've come to the point where the LOC is now small enough that it is worth including for the flexibility it brings. The hard part is always expanding the API as pandas has become more mature, as this requires some effort (although I would rate this as highly stable, and so low maintenance).

Ideally, there would be a way to find out if this feature was in fact widely desirable before committing to it for a fairly long period of time.

attack68 · 2022-02-25T18:53:37Z

Thanks @bashtage for the review and I agree with all your points, but I think Jeff was very keen on completely separating the calculation and thereby reducing the LOC much more, which I also agree with, but I was struggling to think of a way.

Finally, I recently had another thought and managed to completely re-engineer this, with the execution in fewer than 20 LOC and with better formatting and HTML styling built in, so I'm going to close this and leave the superior #46105 open. (Thanks also to @mroeschke for the review.)

attack68 added 30 commits September 18, 2021 20:54

ignore hidden rows in loop

9743099

add latex 43644 test

8a0253e

add latex 43644 test

74c418e

Merge remote-tracking branch 'upstream/master' into bug_styler_multii…

70535c5

…ndex_hiding # Conflicts: # pandas/tests/io/formats/style/test_to_latex.py

clean up code

2fbe569

clean up code

7903723

row, col and level css

6a2793c

whats new

d227914

tests and user guide

7ca5002

merge

f27f7ed

merge

a1000a7

move private methods

c44dcda

Merge branch 'clean_styler_css_classes' into bug_styler_multiindex_hi…

c4c9aaa

…ding # Conflicts: # pandas/io/formats/style_render.py # pandas/tests/io/formats/style/test_html.py

refactor methods

c22cf0d

add test for 43703

0e0b46e

add test for 43703

1f3bbec

docs

021bc26

docs

4ba3dff

more explicit test

7fee05d

more explicit test

baa3233

fix checks

f4ad390

fix checks

22b03e3

fix checks

db214d8

Merge remote-tracking branch 'upstream/master' into bug_styler_multii…

771d056

…ndex_hiding

fix checks

24952ae

Merge remote-tracking branch 'upstream/master' into clean_styler_css_…

1d47d0f

…classes

fix checks

566738d

fix checks

230138a

refactor to get tests to pass

b9ba9ea

refactor to get tests to pass

ea2bba1

mroeschke reviewed Jan 26, 2022

mroeschke approved these changes Jan 26, 2022

attack68 added 2 commits January 26, 2022 18:18

Merge remote-tracking branch 'upstream/main' into describe_styler

d8e11c8

is_integer replacemnet

4f935c4

bashtage requested changes Jan 28, 2022

attack68 added 7 commits January 28, 2022 18:24

Merge remote-tracking branch 'upstream/main' into describe_styler

7031897

bastage req: refactor _element

0c034a6

bashtage req: is_float / complex

6128ccb

bashtage req: is_float / complex

bb8201e

bashtage req: __name__ and typing

44204e0

bashtage req: doc updates for __name__

ebec1d6

doc fix

c205c38

attack68 mentioned this pull request Jan 31, 2022

ENH: DataFrame.describe allows UDFs and/or selectable metrics #45737

Closed

attack68 marked this pull request as draft February 21, 2022 06:18

attack68 mentioned this pull request Feb 21, 2022

ENH: allow concat of Styler objects #46105

Merged

bashtage requested changes Feb 22, 2022

attack68 closed this Feb 25, 2022

attack68 deleted the describe_styler branch March 6, 2022 07:31
