-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Improved benchmark coverage for reading spreadsheets #28230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
- Added comment in environment.yml - Added conda-forge to asv config - Refactored reader benchmark
writer_write = ExcelWriter(bio_write, engine=engine) | ||
self.df.to_excel(writer_write, sheet_name="Sheet1") | ||
writer_write.save() | ||
bio = BytesIO() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for changing this? I think self-contained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming was obsolete. Before the PR, all the setup was done in one function for both read and write benchmarks. Hence the variable names having postfixes _read
and _write
were justified, but not anymore, I guess?
This comment has been minimized.
This comment has been minimized.
If you run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. Any chance you can post the results?
@mroeschke who may also have ideas
asv_bench/benchmarks/io/excel.py
Outdated
doc.spreadsheet.addElement(table) | ||
doc.save(self.fname_odf) | ||
|
||
def setup(self, engine): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know how much time the setup process here takes? Wonder if this should be setup_cache
instead to get any write times out of the read benchmark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was north of 1 second, but not more than 1.2, I think. I've replaced setup
with setup_cache
. Makes much more sense, of course.
This is the output:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @mroeschke
Thanks for the help! @WillAyd @mroeschke |
Great! Thanks @f6v |
* Improved benchmark coverage for reading spreadsheets * Added blank lines * More blank lines * Updated whatsnew * - Removed whatsnew entry - Added comment in environment.yml - Added conda-forge to asv config - Refactored reader benchmark * Updated requirements-dev.txt * Fixed imports order * Fixed imports again * Run black * Changed conda channels order in ASV config * Used setup_cache to speed up read benchmark
* Improved benchmark coverage for reading spreadsheets * Added blank lines * More blank lines * Updated whatsnew * - Removed whatsnew entry - Added comment in environment.yml - Added conda-forge to asv config - Refactored reader benchmark * Updated requirements-dev.txt * Fixed imports order * Fixed imports again * Run black * Changed conda channels order in ASV config * Used setup_cache to speed up read benchmark
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
AFAIK there's no writer for OpenDocument spreadsheet, so I came up with the minimal amount of code to generate the spreadsheet
odf
engine can read.