-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Use a proper CSS parser for pandas Styler objects #48869
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This changes does have 1 sort of regression. The previous latex formatting code was using the CSS styling functions to pass around values, these values weren't properly formatted CSS entries though taking the form Old, only works in v1.5 styler.set_table_styles(
[
{"selector": "toprule", "props": ":hline"},
{"selector": "bottomrule", "props": ":otherline"},
] New, works in both v1.5 and this update styler.set_table_styles(
[
{"selector": "toprule", "props": "value:hline"},
{"selector": "bottomrule", "props": "value:otherline"},
] |
6ea6d61
to
e00f5a7
Compare
@@ -49,6 +49,7 @@ dependencies: | |||
- scipy=1.7.1 | |||
- sqlalchemy=1.4.16 | |||
- tabulate=0.8.9 | |||
- tinycss2=1.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actively maintained? This looks more or less abandoned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you looking at tinycss rather than tinycss2? The later has commits from a few weeks ago on the bleeding edge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked https://github.com/Kozea/tinycss2/commits/master and there are like 25 commits in the last 2 years
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What can I say. There's not much more activity needed than that. It's a simple parser that just produces an AST from some CSS, only a few 100 lines of code if you strip out the comments and doc strings. Doesn't do any of the other fancy processing of interpreting the CSS content. That's all we need here to pull out the key-value pairs being passed in.
Looking into the failing CI, it appears like conda only has version 1.0.2 of the library in question. Going to try and do some local testing before pushing another commit. |
e6bce22
to
6388662
Compare
cc @attack68 I have mixed feeling adding another optional dependency while it does also prevent rolling custom CSS parsing. If we end up spinning of Styler into another package I think it will be easier to curiate optional dependencies #47830 (comment) |
There is a lot of stuff going on here:
I have considered a PR similar to this, myself, in the past so am not averse to it, but, at first glance (and I admit I have not checked thoroughly yet) this PR in its current form seems too invasive to the code. I would consider two aspects of this PR:
Whilst there is some hacky CSS in the code at present, I think we need good examples of how this PR will fundamentally enhance what the user can input and what he get out, if this is implemented. If this ends up being just code cleanup then the additional requirement to install tinycss may be an annoyance to users who simply dont use the hacky css parts anyway. |
@@ -73,6 +73,7 @@ def test_to_read_gcs(gcs_buffer, format): | |||
df1.to_csv(path, index=True) | |||
df2 = read_csv(path, parse_dates=["dt"], index_col=0) | |||
elif format == "excel": | |||
pytest.importorskip("tinycss2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems to be exporting a DataFrame
to Excel, not a Styler
. Do the changes implemented here require a user to have tinycss2
installed even if they are just exporting a DataFrame
to Excel without even bothering with adding styles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the excel writer is implemented there really is no such thing as a truly unstyled output. The regular to_excel function adds a bunch of borders and bold headers. Since there are multiple excel output libraries supported, these borders and other things are added as CSS in the more abstract layers of code. That CSS then gets merged with the user styles when using a Styler. But there is always some CSS being processed when writing to excel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the question was meant more like: would users need to install a new dependency to use to_excel even if they don’t set user configured styles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry my answer wasn't clear, that was the question I was trying to answer.
The normal to_excel function uses CSS under the hood for styling. So yes all excel users would need this additional dependency.
if token.type == "ident": | ||
# The old css parser normalized all identifier values, do | ||
# so here to keep backwards compatibility | ||
token.value = token.lower_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See related discussion here #48660 (comment) regarding case normalization. Seems like its worthy of further investigation of whether this is truly necessary for backwards compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100%, I looked into this back in the 0.24 code. But I believe the reason was to allow users to set a font like Arial
and have it work everywhere. The HTML output expects the upper case version, but the Excel library needs an all lower case version.
Some good points raised.
For your two proposals
This one seems a bit tricky to me. All of the stylers use
I like this one. The only downside I see is that it's now very difficult for a user to know what to do when they have weird CSS that's not being parsed correctly. We could either try and add some smart detection in there and throw some kind of warning. Or add a hard dependency that when using the excel or latex writers you need to have tinycss2 installed. The second option seems like a fair compromise. The regular HTML users can keep using the hacky parser, but for the two output engines that are most likely to have weird CSS we require the better parser.
The specific case that pushed me to add this was I was trying to set a style like |
Came up with a simple benchmark based on some of the existing ones. Tested two cases, a fixed style for all cells and a dynamic style based on the content of the cell. import numpy as np
import pandas as pd
def setup_df():
np.random.seed(1234)
nrows = 500
df2 = pd.DataFrame(np.random.randn(nrows, 10))
df2[0] = pd.period_range("2000", periods=nrows)
df2[1] = range(nrows)
return df2
def benchmark_fixed_style(df2):
style = df2.style.applymap(lambda v: 'value: 1;').to_html()
def benchmark_dynamic_style(df2):
style = df2.style.applymap(lambda v: f'value: {v};').to_html()
if __name__ == '__main__':
import timeit
print(timeit.timeit("benchmark_fixed_style(df2)", setup="from __main__ import benchmark_fixed_style, setup_df; df2 = setup_df()"))
print(timeit.timeit("benchmark_dynamic_style(df2)", setup="from __main__ import benchmark_dynamic_style, setup_df; df2 = setup_df()")) These are the results. If anyone wants to reproduce I recommend lowering the number of trials, the longest trail ran for over 100h while I was on vacation. We are seeing a slow down, but it's not terrible. The biggest risk here is having things feel slow in dynamic environments like notebooks, some quick testing had them feel about the same to me.
|
I've had a think on this. I think using a css parser for this is quite sensible. It would be very nice to fix the I think the new package requirement means that we need to push it to 2.0. There is discussion regarding splitting Styler into its own pandas-styler package. Essentially it means that pandas would have to rely on pandas-styler as a dependency for to_excel. |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
I am still doing some work on this, but at a lower priority in the background since this will be pushed to 2.0. Given the eventual goal of splitting the styling into a second package. I want to to why we need the css parser for all the regular non styled code. Back in the 0.2x days when I last looked at this we passed around css values inside the excel classes to represent the styling on things like headers. But the newest code appears to use a python dict that maps onto the attribute names used by one of the excel libraries. I'm going to poke around and see if I can find if that css dependency on read_excel and write_excel are redundant. That would allow me to avoid having to touch so many test files. |
2.0 will be the next release (after 1.5). But I think what you propose is a good idea. My focus has been on the separate Stytler object to produce HTML, LaTeX (and would like to do JSON also) not Excel. Any Excel work has just been patching and it might not be clean. So appreciate you having a look and coming up with ideas. Separating |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.