Skip to content

PERF: refactor ExcelFormatter #11355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 tasks
chris-b1 opened this issue Oct 17, 2015 · 9 comments
Closed
2 tasks

PERF: refactor ExcelFormatter #11355

chris-b1 opened this issue Oct 17, 2015 · 9 comments
Labels
IO Excel read_excel, to_excel Performance Memory or execution speed performance

Comments

@chris-b1
Copy link
Contributor

xref #11352

@jreback jreback added Performance Memory or execution speed performance IO Excel read_excel, to_excel labels Oct 17, 2015
@jreback jreback added this to the Next Major Release milestone Oct 17, 2015
@jreback
Copy link
Contributor

jreback commented Oct 17, 2015

FYI, you might want to reach out o see if they can input these on a columnar basis to make this faster

@Themanwithoutaplan
Copy link
Contributor

The output is row based so there has to be a conversion at some point from columns to rows. See https://bitbucket.org/snippets/openpyxl/jgbak for a barebones approach.

Will look to add support for Numpy types in a future version of openpyxl and using named styles for faster formatting.

@calvinwyoung
Copy link

Are there any plans to resolve this issue? We'd really love to be able to use constant_memory to output large XLSX files.

@Themanwithoutaplan
Copy link
Contributor

@calvinwyoung this is possible in openpyxl using the dataframe_to_rows() function, which should also work with xlsxwriter. Having looked at the code for the Excelformatter I'm pretty sure that this will always be faster than using to_excel()

@jmcnamara
Copy link
Contributor

@calvinwyoung One problem with using xlsxwriter's constant_memory mode from Pandas is that xlsxwriter doesn't support merged ranges across rows in that mode. As a result it would break the formatting for merged indices. If you specifically need constant_memory support you should probably look at writing the data from the dataframe yourself using xlsxwriter directly.

@calvinwyoung
Copy link

calvinwyoung commented Feb 12, 2018

@Themanwithoutaplan @jmcnamara Thank you both for the feedback. It seems like neither dataframe_to_rows() nor constant_memory supports merged ranges across rows, but this is something I'd like to support if at all possible.

Is there a generally accepted strategy for supporting merged ranges and keeping memory consumption low? For our application, we're okay if the processing takes longer — it just so happens that the machines we're using are constrained by memory, and we aren't able to use swap.

@Themanwithoutaplan
Copy link
Contributor

@calvinwyoung if you want to merge cells when writing straight to XML then you'll have to manage more of this yourself. Apart from the formatting it's not difficult. Details are out of scope for this ticket I think and should be discussed on the openpyxl mailing list.

@calvinwyoung
Copy link

Grea, thank you!

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@mroeschke
Copy link
Member

Seems like there's not been much development here and there doesn't seem to be a clear action item here so closing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Performance Memory or execution speed performance
Projects
None yet
Development

No branches or pull requests

6 participants