Skip to content

ENH: Allow to assign/create custom body_style and header_style property to an instance of ExcelFormatter used in method df.to_excel() #52369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 of 3 tasks
augustosisa opened this issue Apr 2, 2023 · 7 comments
Assignees
Labels
Enhancement IO Excel read_excel, to_excel

Comments

@augustosisa
Copy link

augustosisa commented Apr 2, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I would like to assign a basic custom style for DataFrame body when it is exported to Excel with the method pandas.DataFrame.to_excel(). I know that there are a vast number of alternatives using stylers but those use large libraries such as matplotlib to produce a simple css style, for example, if the user just needs to add borders or bold text or alignment.

Currently, the method pandas.DataFrame.to_excel() uses by default for any DataFrame a style with borders and bold for headers and indexes, and no style for the rest of the body.

Feature Description

There are many alternatives. For me the shortest and easiest one with the less number of changes in the current code will be:

According to the source code of the current dev-version, the method pandas.DataFrame.to_excel() uses an instance of a class ExcelFormatter to format cells and then writes them to a file. This class is defined in file pandas/io/formats/excel.py and its method .write() calls the method .get_formatted_cells() and it in its turn calls the private method ._format_body() and after some more calls the method ._generate_body() is reached. This method formats series by series and by default uses a style=None which is hard coded.

# Final part of current `ExcelFormatter._generate_body()`
yield CssExcelCell(
        row=self.rowcounter + i,
        col=colidx + coloffset,
        val=val,
        style=None,
        css_styles=getattr(self.styler, "ctx", None),
        css_row=i,
        css_col=colidx,
        css_converter=self.style_converter,
        )

I would like to change that None default value to a new ExcelFormatter property similar to the current header_style used to format headers and indexes. That needs to add a new property and change the call to CssExcelCell in line ~ 882 in file pandas/io/formats/excel.py as I show in this shortcode

# ExcelFormatter properties
@property
def header_style(self) -> dict[str, dict[str, str | bool]]:
    return {
        "font": {"bold": True},
        "borders": {
            "top": "thin",
            "right": "thin",
            "bottom": "thin",
            "left": "thin",
        },
        "alignment": {"horizontal": "center", "vertical": "top"},
    }
# -- >  NEW PROPERTY < ---
@property
def body_style(self) -> dict[str, dict[str, str | bool]]:
    return {
        "font": {"bold": False},
        "borders": {
            "top": "thin",
            "right": "thin",
            "bottom": "thin",
            "left": "thin",
        },
        "alignment": {"horizontal": "right", "vertical": "top"},
    }
# -- > END  NEW PROPERTY < ---

# Call in `ExcelFormatter._generate_body()` to ``CssExcelCell`
```python
yield CssExcelCell(
     # ...
     style=self.body_style
     # ...
    )

I don't know if there is a better way to set those properties in pandas, but for my purposes, I can build a new style as a dict in my code and assign my new styles to the properties or set to None before calling the constructor of ExcelFormatter in the method.

pd.io.formats.excel.ExcelFormatter.header_style = style_header
pd.io.formats.excel.ExcelFormatter.body_style = style_body

df.to_excel(writer, sheet_name=sheet_name)

A more sophisticated alternative will included new arguments with custom styles for the method ``pandas.DataFrame.to_excel()` and changes in the way those styles are parsed by ExcelFormatter instances, but in the end maybe look similar to the shortest alternative that I describe above.

Alternative Solutions

As an alternative users can use Stylers in DataFrame. This alternative requires matplotlib, jinja2.

Additional Context

I already tested these small changes in my code in an environment with pandas (version 1.4.2) and xlsxwriter I would like to push those changes for a more recent version. But, in virtue that I don't know which tests are usually made by the developers' team related to complex DataFrames or performance, I decided to launch this request.

@augustosisa augustosisa added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 2, 2023
@attack68
Copy link
Contributor

attack68 commented Apr 3, 2023

Additional discussion here #50336

@augustosisa
Copy link
Author

Additional discussion here #50336

I agree, Using setters for both properties is even better.

The initial author of discussion (#50336) states that PyCharm launches a message when a user tries to set directly the property. In my case, I am using VSCode, it works for me as long as I set the new property before calling the ExcelFormatter constructor.

I prefer to allow the user to get or set both properties as he/she wants. Using instance attributes over class attributes.

@rhshadrach rhshadrach added the IO Excel read_excel, to_excel label Apr 3, 2023
@attack68
Copy link
Contributor

attack68 commented Apr 4, 2023

I think are enough issues and core dev interaction for this to be proposed, reviewed and ultimately accepted.
I'm happy with the suggestion here taking inspiratrion from 50336, and I would ideally look for two things;

@attack68 attack68 removed the Needs Triage Issue that has not been reviewed by a pandas team member label Apr 4, 2023
@rmhowe425 rmhowe425 removed their assignment May 28, 2023
@rmhowe425
Copy link
Contributor

@augustosisa Are you working on this GH issue? I see in your initial message that you had a working solution. Were you planning on opening a PR?

@rmhowe425
Copy link
Contributor

take

@rmhowe425
Copy link
Contributor

@attack68 @rhshadrach

Given that we merged the PR for #54154 , I'm wondering if we can close out this issue?

@attack68
Copy link
Contributor

We have addressed the part of this which deals with clearing the underlying formatting from DataFrame.to_excel.

I believe this now provides sufficient flexibility to control the output of an excel file from Pandas. This can be reopened if felt otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Excel read_excel, to_excel
Projects
None yet
4 participants