BUG: read_excel blows the memory when using openpyxl engine #40569
Comments
@liyucheng09 Please provide the output of `pd.show_versions()`; it shows other helpful information besides just the version of pandas 👍
@nmay231 Thanks for your reply. The outputs of
I wonder if this has been fixed by #39547. Spreadsheets with a blank cell in the last row or column of the worksheet take vastly longer to load in 1.2.x than in 1.1.x because of the trailing rows of NaNs.
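As a workaround on affected versions, the trailing all-NaN rows can be dropped after loading. A minimal sketch of the idea, using a synthetic frame in place of the real spreadsheet (the column names here are hypothetical):

```python
import numpy as np
import pandas as pd

# Simulate the symptom: a sheet whose declared dimensions extend past
# the real data, so the parser yields extra rows that are entirely NaN.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, np.nan],
                   "b": [3.0, 4.0, np.nan, np.nan]})

# Workaround: drop rows in which every column is NaN.
trimmed = df.dropna(how="all")
print(trimmed.shape)  # (2, 2)
```

`dropna(how="all")` only removes rows that are NaN in every column, so rows with partial data are kept.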
@ahawryluk Thanks for the reply. After upgrading pandas it works well, so I will close this.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
I am not quite sure how to describe the bug; the code just got stuck when I ran `pd.read_excel('full_data.xlsx')`. I found this line costs a significant amount of memory (almost 14 GB, while the `.xlsx` file is just 9 MB). I speculate it results from `read_excel` now using `openpyxl` as the default engine in Python 3.9. Loading this file in Python 3.8 works fine. The above code also leads to the same issue.