Skip to content

to_excel cannot handle large sheet #26051

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Pzoom522 opened this issue Apr 11, 2019 · 14 comments · Fixed by #26080
Closed

to_excel cannot handle large sheet #26051

Pzoom522 opened this issue Apr 11, 2019 · 14 comments · Fixed by #26080
Labels
Error Reporting Incorrect or improved errors from pandas good first issue IO Excel read_excel, to_excel
Milestone

Comments

@Pzoom522
Copy link

Pzoom522 commented Apr 11, 2019

Problem description

For extremely large sheets (row_num > 1048576, col_num > 16384), see the code of XlsxWriter Engine, area which exceeds the limit won't be printed.
However, no error or warning is raised.

Expected Output

This sheet is too large! Area off (1048576, 1638) will not be printed.

Output of pd.show_versions()

latest (v0.24.2)

@chris-b1
Copy link
Contributor

A little strange that nothing happens on the xlsxwriter side, but yes would definitely take a PR that guards and raises on our end

@chris-b1 chris-b1 added Effort Low Error Reporting Incorrect or improved errors from pandas good first issue IO Excel read_excel, to_excel labels Apr 11, 2019
@chris-b1 chris-b1 added this to the Contributions Welcome milestone Apr 11, 2019
@WillAyd
Copy link
Member

WillAyd commented Apr 11, 2019

I think this makes more sense as an enhancement request for xlsxwriter that would just pass through here rather than doing it on our end

@anordin95
Copy link

I'm new to pandas-dev. Does the closed tag on the feature request I see above indicate the work is already done and this entire issue should be closed?

@chris-b1
Copy link
Contributor

chris-b1 commented Apr 12, 2019

I think we could handle on the pandas side - Xlsxwriter's interface is essentially cell-based - there isn't a way to know in advance if those limits are going to be broken. Because pandas know the total table size in advance, I think it makes sense to check and raise.

@anordin95
Copy link

I agree @chris-b1. I assume this hasn't been addressed, but I'm uncertain due to the red "Closed" indicator I see beside the feature request above my original comment.

@chris-b1
Copy link
Contributor

chris-b1 commented Apr 12, 2019

That closed is from the linked issue in the xlsxwriter repo

@anordin95
Copy link

Ah, makes sense. Thank you!

@anordin95
Copy link

Going to raise a ValueError unless anyone has a better suggestion.

@anordin95
Copy link

I believe xlsx writer hangs when passed a sheet that is too large. I test whether calling df.to_excel raises a ValueError when given a sheet that is too large. In that case, the test passes. However, if too large of a sheet is passed, and the code to check input size is not there, the to_excel call will hang resulting in the test stalling. Any ideas on how to best proceed? I believe the best course would be to ignore testing the new checking logic, because adding the test I previously described would just add unhelpful code.

@anordin95
Copy link

Solved.

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 May 16, 2019
@anubhav0fnu
Copy link

@anordin95, it still fails,

  File "//anaconda3/envs/proteomics/lib/python3.7/site-packages/pandas/core/generic.py", line 2180, in to_excel
    engine=engine,
  File "//anaconda3/envs/proteomics/lib/python3.7/site-packages/pandas/io/formats/excel.py", line 719, in write
    f"This sheet is too large! Your sheet size is: {num_rows}, {num_cols} "
ValueError: This sheet is too large! Your sheet size is: 1209019, 51 Max sheet size is: 1048576, 16384

Any suggestions on increasing the limit to say in python sys.maxsize? so that it should only fails once the system's m/m get's bloated! this way only User has to handle that and not the to_excel().?

@anordin95
Copy link

This looks like it should fail. The maximum sheet size in excel is being exceeded. You have ~1.2M rows and excel can only handle ~1M.

https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3

I'm not sure what m/m means. Writing your data to another format may better suit your usage, such as hdf.

@Pzoom522
Copy link
Author

Pzoom522 commented Oct 5, 2020

Had the UK's public health system seen this issue...
https://www.bbc.co.uk/news/uk-54422505

@ghost
Copy link

ghost commented Aug 17, 2021

is there a way to auto add new sheet in pandas.ExcelWriter if a sheet reaches maximum limit?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas good first issue IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants