Skip to content

Improved benchmark coverage for reading spreadsheets #28230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Sep 5, 2019

Conversation

f6v
Copy link
Contributor

@f6v f6v commented Aug 30, 2019

AFAIK there's no writer for OpenDocument spreadsheet, so I came up with the minimal amount of code to generate the spreadsheet odf engine can read.

@pep8speaks
Copy link

pep8speaks commented Aug 30, 2019

Hello @f6v! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-05 12:11:13 UTC

@WillAyd WillAyd added Benchmark Performance (ASV) benchmarks IO Excel read_excel, to_excel labels Aug 30, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

- Added comment in environment.yml
- Added conda-forge to asv config
- Refactored reader benchmark
writer_write = ExcelWriter(bio_write, engine=engine)
self.df.to_excel(writer_write, sheet_name="Sheet1")
writer_write.save()
bio = BytesIO()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for changing this? I think self-contained

Copy link
Contributor Author

@f6v f6v Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming was obsolete. Before the PR, all the setup was done in one function for both read and write benchmarks. Hence the variable names having postfixes _read and _write were justified, but not anymore, I guess?

@f6v f6v force-pushed the improve_benchmark branch from 544f933 to 4a0238f Compare August 30, 2019 20:58
@f6v f6v force-pushed the improve_benchmark branch from f3c1c90 to 05cca05 Compare August 31, 2019 11:35
@f6v f6v force-pushed the improve_benchmark branch from 05cca05 to 7800665 Compare August 31, 2019 11:37
@f6v

This comment has been minimized.

@WillAyd
Copy link
Member

WillAyd commented Sep 2, 2019

If you run black asv_bench/benchmarks/io/excel.py I think will fix it

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. Any chance you can post the results?

@mroeschke who may also have ideas

doc.spreadsheet.addElement(table)
doc.save(self.fname_odf)

def setup(self, engine):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how much time the setup process here takes? Wonder if this should be setup_cache instead to get any write times out of the read benchmark

Copy link
Contributor Author

@f6v f6v Sep 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was north of 1 second, but not more than 1.2, I think. I've replaced setup with setup_cache. Makes much more sense, of course.

@f6v
Copy link
Contributor Author

f6v commented Sep 4, 2019

This is the output:

(pandas-dev) ➜  asv_bench git:(improve_benchmark) asv continuous -f 1.1 upstream/master HEAD -b ^io.excel --show-stderr
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Building e7d69866 <improve_benchmark> for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt............................................
·· Installing e7d69866 <improve_benchmark> into conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[  0.00%] · For pandas commit 5f349338 <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 12.50%] ··· Running (io.excel.ReadExcel.time_read_excel--).
[ 25.00%] ··· Running (io.excel.WriteExcel.time_write_excel--).
[ 25.00%] · For pandas commit e7d69866 <improve_benchmark> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 37.50%] ··· Running (io.excel.ReadExcel.time_read_excel--).
[ 50.00%] ··· Running (io.excel.WriteExcel.time_write_excel--).
[ 50.00%] · For pandas commit e7d69866 <improve_benchmark> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 62.50%] ··· io.excel.ReadExcel.time_read_excel                                                                                                                                                          ok
[ 62.50%] ··· ========== ==========
                engine
              ---------- ----------
                 xlrd     150±5ms
               openpyxl   254±4ms
                 odf      741±10ms
              ========== ==========

[ 75.00%] ··· io.excel.WriteExcel.time_write_excel                                                                                                                                                        ok
[ 75.00%] ··· ============ =========
                 engine
              ------------ ---------
                openpyxl    422±3ms
               xlsxwriter   258±2ms
                  xlwt      212±1ms
              ============ =========

[ 75.00%] · For pandas commit 5f349338 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 87.50%] ··· io.excel.ReadExcel.time_read_excel                                                                                                                                                          ok
[ 87.50%] ··· ========== ==========
                engine
              ---------- ----------
                 xlrd     145±1ms
               openpyxl   256±10ms
                 odf      745±6ms
              ========== ==========

[100.00%] ··· io.excel.WriteExcel.time_write_excel                                                                                                                                                        ok
[100.00%] ··· ============ =========
                 engine
              ------------ ---------
                openpyxl    436±6ms
               xlsxwriter   261±3ms
                  xlwt      225±3ms
              ============ =========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @mroeschke

@WillAyd WillAyd added this to the 1.0 milestone Sep 5, 2019
@f6v
Copy link
Contributor Author

f6v commented Sep 5, 2019

Thanks for the help! @WillAyd @mroeschke

@mroeschke mroeschke merged commit 2915223 into pandas-dev:master Sep 5, 2019
@mroeschke
Copy link
Member

Great! Thanks @f6v

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
* Improved benchmark coverage for reading spreadsheets

* Added blank lines

* More blank lines

* Updated whatsnew

* - Removed whatsnew entry
- Added comment in environment.yml
- Added conda-forge to asv config
- Refactored reader benchmark

* Updated requirements-dev.txt

* Fixed imports order

* Fixed imports again

* Run black

* Changed conda channels order in ASV config

* Used setup_cache to speed up read benchmark
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
* Improved benchmark coverage for reading spreadsheets

* Added blank lines

* More blank lines

* Updated whatsnew

* - Removed whatsnew entry
- Added comment in environment.yml
- Added conda-forge to asv config
- Refactored reader benchmark

* Updated requirements-dev.txt

* Fixed imports order

* Fixed imports again

* Run black

* Changed conda channels order in ASV config

* Used setup_cache to speed up read benchmark
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Benchmark Performance (ASV) benchmarks IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ASV Benchmark for read_excel
4 participants