-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Optimize nrows in read_excel #33281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tests which are not connected to that PR fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Can you add tests?
But tests for readers already exist and include header, skiprows and nrows variables. Should I add something more? |
Mind fixing up the code checks @mproszewska and I agree with @mroeschke comments we should find a way to get rid of duplicate code |
The CI is failing because of an unused import of _validate_integer |
Hello @mproszewska! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-06-08 17:08:14 UTC |
pandas/io/excel/_odfreader.py
Outdated
@@ -80,6 +87,16 @@ def get_sheet_data(self, sheet, convert_float: bool) -> List[List[Scalar]]: | |||
table: List[List[Scalar]] = [] | |||
|
|||
for i, sheet_row in enumerate(sheet_rows): | |||
|
|||
should_continue, should_break = self.should_skip_row( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this interface is obvious at all. prefer simple routines
if should_skip_rows(...):
....
when is a break condition? can you just do it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it. Break condition is now handled differently
Can you add a benchmark for this to show performance improvement? |
@mproszewska can you post the ASV results here as a comment? |
|
Can you fix merge master and fix merge conflicts? That may also fix CI Can you post the output of a continuous asv run instead of just the snippet? It should provide results across multiple runs at the end |
@mproszewska can you address comments? |
Closing as I think this is stale but @mproszewska ping if you'd like to pick back up and can fix merge conflicts |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff