
PERF: Excel Styler treatment of CSS side expansion is slow #47352


Closed
3 tasks done
tehunter opened this issue Jun 14, 2022 · 2 comments · Fixed by #47371
Labels
  • IO Excel: read_excel, to_excel
  • Needs Triage: Issue that has not been reviewed by a pandas team member
  • Performance: Memory or execution speed performance

Comments

@tehunter
Contributor

tehunter commented Jun 14, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

The styler approach to resolving CSS string expansion for conversion to Excel is slow. In particular:

  • atomize relies on inefficient string concatenation and replace calls.
  • Neither CSSResolver nor CSSToExcelConverter caches its results.

from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (10, 1000)))
style = df.style
style.applymap(lambda x: "border-color: black; border-style: solid; border-width: thin")
style.to_excel(Path(__file__).parent / "test.xlsx")

On my machine, this example took 12.2s to run. Of that time, 6.57s was cumulatively spent in css.py:CSSResolver.__call__, including 2.37s in parse and 1.5s in atomize.
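Since atomize is one of the hot spots named above, here is a minimal sketch of how side shorthands such as `border-width: thin` could be expanded into per-side properties by yielding (property, value) tuples directly, without building and re-splitting intermediate strings. This is an illustration only, not the pandas implementation: the names `expand_side`, `atomize`, `SIDES`, `SIDE_SHORTHANDS`, and `EXPANDABLE` as written here are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

# Order in which CSS assigns shorthand tokens to sides.
SIDES = ("top", "right", "bottom", "left")

# For a shorthand with N tokens, which token index applies to each side.
SIDE_SHORTHANDS = {
    1: (0, 0, 0, 0),
    2: (0, 1, 0, 1),
    3: (0, 1, 2, 1),
    4: (0, 1, 2, 3),
}

# Hypothetical set of properties treated as side shorthands in this sketch.
EXPANDABLE = {"border-color", "border-style", "border-width", "margin", "padding"}


def expand_side(prop: str, value: str) -> Iterator[Tuple[str, str]]:
    """Yield one (property, value) pair per side for a side shorthand."""
    tokens = value.split()
    mapping = SIDE_SHORTHANDS.get(len(tokens))
    if mapping is None:
        return  # malformed shorthand: yield nothing
    # "border-width" -> ("border", "width"); "margin" -> ("margin", "")
    prefix, _, suffix = prop.partition("-")
    for side, idx in zip(SIDES, mapping):
        name = f"{prefix}-{side}-{suffix}" if suffix else f"{prefix}-{side}"
        yield name, tokens[idx]


def atomize(declarations: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Expand shorthands and pass every other declaration through unchanged."""
    for prop, value in declarations:
        if prop in EXPANDABLE:
            yield from expand_side(prop, value)
        else:
            yield prop, value
```

Working on tuples keeps each declaration parsed exactly once; whether that matches the ordering rules the existing tests expect would need to be checked against the test suite.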

Installed Versions

Replace this line with the output of pd.show_versions()

Prior Performance

No response

@tehunter added the Needs Triage and Performance labels Jun 14, 2022
@attack68
Contributor

The CSS resolving has always been an additional hack tacked on to the Excel conversion.

If this is changed, some calculations have to happen in a particular order, which might make some of these things a bit tricky to implement. IIRC there are some tests that should catch the cases I am referring to, hopefully. Otherwise, contributions welcome :)

@phofl added the IO Excel label Jun 14, 2022
@tehunter
Contributor Author

Put together a PR to address some of the issues. Had to modify some of the argument types, but I think I eliminated a lot of the unnecessary string joining and parsing. Also added Python's functools.lru_cache wrapper to the CSSToExcelConverter.
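For readers unfamiliar with the caching approach mentioned here, the following is a rough sketch of wrapping a converter's work in functools.lru_cache so that a repeated declaration string is parsed only once. The `StyleConverter` class, its `_convert` method, and the `parse_count` counter are hypothetical stand-ins for illustration; they are not the code from the PR or the real CSSToExcelConverter.

```python
from functools import lru_cache


class StyleConverter:
    """Hypothetical stand-in for a CSS-to-Excel style converter."""

    def __init__(self) -> None:
        self.parse_count = 0  # how many times real parsing work was done
        # Per-instance cache keyed on the raw declaration string.
        self._cached_convert = lru_cache(maxsize=None)(self._convert)

    def __call__(self, declarations: str) -> dict:
        return self._cached_convert(declarations)

    def _convert(self, declarations: str) -> dict:
        """Parse 'name: value; name: value' into a dict (the uncached path)."""
        self.parse_count += 1
        props = {}
        for decl in declarations.split(";"):
            if not decl.strip():
                continue  # skip empty fragments such as a trailing ';'
            name, _, value = decl.partition(":")
            props[name.strip().lower()] = value.strip().lower()
        return props


converter = StyleConverter()
css = "border-color: black; border-style: solid; border-width: thin"
first = converter(css)
second = converter(css)  # served from the cache, no re-parsing
```

In the reproducible example every cell carries the same style string, which is the case where a cache of this kind helps most. One caveat of the sketch: cached calls return the same dict object, so callers must not mutate it.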
