read_excel optimize nrows #32727
Code Sample
Problem description
pandas has an option (nrows) to read only the first several rows of an Excel file.
However, it currently always reads all rows and only afterwards cuts off the part that was not requested.
For example, a file has 100 columns and 50k rows, but a test needs only the first 10 rows.
pandas will still read all 50k rows into a list, which wastes memory and takes far too long.
Expected Output
A better solution would be to read only the rows that are needed for the operation.
As I understand it, there should be some changes in
pandas/io/excel/_base.py
and in the files _openpyxl.py, _odfreader.py, _xlrd.py.
There should be something like
With these changes, read_excel with engine='openpyxl' takes only 5 seconds instead of the 50 seconds of the current version. And if the file contains 1 million rows, it will still take around 5 seconds, whereas the current version will take tens of minutes.
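To illustrate the idea (this is not the patch proposed in the issue, only a minimal sketch), the snippet below assumes the reader exposes sheet rows as a lazy iterator, as openpyxl worksheets opened in read-only mode do. The names take_rows and fake_sheet are hypothetical and are not part of pandas; fake_sheet merely stands in for a real worksheet so the example runs without any Excel file.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def take_rows(rows: Iterable[list], header: int = 0,
              nrows: Optional[int] = None) -> List[list]:
    """Materialise only the rows needed: everything up to the header row
    plus `nrows` data rows. Hypothetical helper, not a pandas function."""
    if nrows is None:
        # No limit requested: fall back to reading the whole sheet.
        return list(rows)
    # Rows 0..header (inclusive) plus nrows data rows.
    limit = header + 1 + nrows
    # islice stops pulling from the iterator once `limit` rows were taken.
    return list(islice(rows, limit))


def fake_sheet(total_rows: int, consumed: list) -> Iterator[list]:
    """Stand-in for a lazy worksheet row iterator; records each row read."""
    for i in range(total_rows):
        consumed.append(i)
        yield [i, i * 2]


consumed: list = []
data = take_rows(fake_sheet(50_000, consumed), header=0, nrows=10)
print(len(data), len(consumed))
```

Because the iterator is cut off early, the cost depends on nrows rather than on the total number of rows in the sheet, which matches the timing behaviour described above.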