Improved benchmark coverage for reading spreadsheets #28230

f6v · 2019-08-30T10:40:51Z

closes ASV Benchmark for read_excel #27485
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff

AFAIK there's no writer for OpenDocument spreadsheets, so I wrote the minimal amount of code needed to generate a spreadsheet that the odf engine can read.
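For illustration, here is a minimal sketch of writing a small .ods file with odfpy, the library behind the odf engine. The helper name write_ods, the default table name, and the choice to store every cell as a string are assumptions made for this sketch; the code in the PR may differ.

```python
def write_ods(path, rows, table_name="Table1"):
    """Write rows (an iterable of iterables) to an OpenDocument spreadsheet."""
    # odfpy is an optional third-party dependency, so it is imported lazily.
    from odf.opendocument import OpenDocumentSpreadsheet
    from odf.table import Table, TableCell, TableRow
    from odf.text import P

    doc = OpenDocumentSpreadsheet()
    table = Table(name=table_name)
    for row in rows:
        table_row = TableRow()
        for value in row:
            # Assumption: every value is stored as text to keep the sketch short.
            cell = TableCell(valuetype="string")
            cell.addElement(P(text=str(value)))
            table_row.addElement(cell)
        table.addElement(table_row)
    # Attach the table to the document body and write the file.
    doc.spreadsheet.addElement(table)
    doc.save(path)
```

Calling write_ods("spreadsheet.ods", [["a", "b"], [1, 2]]) should produce a file that read_excel can open with engine="odf", provided odfpy is installed.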

pep8speaks · 2019-08-30T10:40:54Z

Hello @f6v! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-05 12:11:13 UTC

WillAyd

Thanks for the PR!

doc/source/whatsnew/v0.25.2.rst

environment.yml

asv_bench/benchmarks/io/excel.py

asv_bench/asv.conf.json

WillAyd · 2019-08-30T18:35:51Z

asv_bench/benchmarks/io/excel.py

-        writer_write = ExcelWriter(bio_write, engine=engine)
-        self.df.to_excel(writer_write, sheet_name="Sheet1")
-        writer_write.save()
+        bio = BytesIO()


Any reason for changing this? I think self-contained

The naming was obsolete. Before the PR, all the setup was done in one function for both the read and write benchmarks, so the _read and _write suffixes on the variable names were justified, but not anymore, I guess?

asv_bench/benchmarks/io/excel.py

WillAyd · 2019-09-02T20:21:42Z

If you run black asv_bench/benchmarks/io/excel.py, I think that will fix it.

WillAyd

I think this looks good. Any chance you can post the results?

@mroeschke who may also have ideas

WillAyd · 2019-09-03T16:06:50Z

asv_bench/benchmarks/io/excel.py

+        doc.spreadsheet.addElement(table)
+        doc.save(self.fname_odf)
+
+    def setup(self, engine):


Do you know how much time the setup process here takes? I wonder if this should be setup_cache instead, to keep any write times out of the read benchmark.

It was north of 1 second, but not more than 1.2, I think. I've replaced setup with setup_cache, which makes much more sense, of course.
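A minimal sketch of the setup_cache pattern discussed here, using the standard-library csv module as a stand-in for the spreadsheet engines so that it runs without extra dependencies. The class name, file names, and table size are hypothetical. As far as I understand asv's convention, setup_cache runs once rather than before every repeat, receives no parameters, and files it writes to the working directory remain available to the timed methods.

```python
import csv

N_ROWS = 1000  # hypothetical table size


def read_rows(dialect):
    """Read back the file that setup_cache wrote for this dialect."""
    with open(f"table_{dialect}.csv", newline="") as fh:
        return list(csv.reader(fh, dialect=dialect))


class ReadTable:
    # One timing result is reported per parameter value.
    params = ["excel", "excel-tab"]
    param_names = ["dialect"]

    def setup_cache(self):
        # Write every input file up front. This method receives no
        # parameters, so it loops over them itself.
        for dialect in self.params:
            with open(f"table_{dialect}.csv", "w", newline="") as fh:
                writer = csv.writer(fh, dialect=dialect)
                for i in range(N_ROWS):
                    writer.writerow([i, 2 * i, 3 * i])

    def time_read(self, dialect):
        # Only the read is timed; the write cost stays in setup_cache.
        read_rows(dialect)
```

Because the files are created outside the timed method, the reported numbers reflect only read time, which is the point raised in the review.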

asv_bench/asv.conf.json

asv_bench/benchmarks/io/excel.py

f6v · 2019-09-04T07:04:52Z

This is the output:

(pandas-dev) ➜  asv_bench git:(improve_benchmark) asv continuous -f 1.1 upstream/master HEAD -b ^io.excel --show-stderr
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.
·· Building e7d69866 <improve_benchmark> for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt............................................
·· Installing e7d69866 <improve_benchmark> into conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[  0.00%] · For pandas commit 5f349338 <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[  0.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 12.50%] ··· Running (io.excel.ReadExcel.time_read_excel--).
[ 25.00%] ··· Running (io.excel.WriteExcel.time_write_excel--).
[ 25.00%] · For pandas commit e7d69866 <improve_benchmark> (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 25.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 37.50%] ··· Running (io.excel.ReadExcel.time_read_excel--).
[ 50.00%] ··· Running (io.excel.WriteExcel.time_write_excel--).
[ 50.00%] · For pandas commit e7d69866 <improve_benchmark> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 62.50%] ··· io.excel.ReadExcel.time_read_excel                                                                                                                                                          ok
[ 62.50%] ··· ========== ==========
                engine
              ---------- ----------
                 xlrd     150±5ms
               openpyxl   254±4ms
                 odf      741±10ms
              ========== ==========

[ 75.00%] ··· io.excel.WriteExcel.time_write_excel                                                                                                                                                        ok
[ 75.00%] ··· ============ =========
                 engine
              ------------ ---------
                openpyxl    422±3ms
               xlsxwriter   258±2ms
                  xlwt      212±1ms
              ============ =========

[ 75.00%] · For pandas commit 5f349338 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt..
[ 75.00%] ·· Benchmarking conda-py3.6-Cython-matplotlib-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt
[ 87.50%] ··· io.excel.ReadExcel.time_read_excel                                                                                                                                                          ok
[ 87.50%] ··· ========== ==========
                engine
              ---------- ----------
                 xlrd     145±1ms
               openpyxl   256±10ms
                 odf      745±6ms
              ========== ==========

[100.00%] ··· io.excel.WriteExcel.time_write_excel                                                                                                                                                        ok
[100.00%] ··· ============ =========
                 engine
              ------------ ---------
                openpyxl    436±6ms
               xlsxwriter   261±3ms
                  xlwt      225±3ms
              ============ =========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.
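The -f 1.1 flag sets the factor beyond which asv continuous reports a change. As a rough check, the sketch below recomputes the HEAD/master ratios from the timings printed above and compares them with that factor. The function name flag_changes and the dictionary keys are hypothetical, and the sketch ignores the ± uncertainties that asv also considers, so it only approximates asv's own test.

```python
# Mean timings in milliseconds, copied from the asv output above.
master = {
    "read/xlrd": 145, "read/openpyxl": 256, "read/odf": 745,
    "write/openpyxl": 436, "write/xlsxwriter": 261, "write/xlwt": 225,
}
head = {
    "read/xlrd": 150, "read/openpyxl": 254, "read/odf": 741,
    "write/openpyxl": 422, "write/xlsxwriter": 258, "write/xlwt": 212,
}
FACTOR = 1.1  # the value passed with -f


def flag_changes(before, after, factor):
    """Return benchmarks whose after/before ratio leaves [1/factor, factor]."""
    flagged = {}
    for name, old in before.items():
        ratio = after[name] / old
        if ratio > factor or ratio < 1 / factor:
            flagged[name] = ratio
    return flagged


for name in master:
    print(f"{name:18s} ratio = {head[name] / master[name]:.3f}")
print("flagged:", flag_changes(master, head, FACTOR))
```

Every ratio stays between roughly 0.94 and 1.04, inside the 1.1 band, which is consistent with the summary line above.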

WillAyd

lgtm @mroeschke

f6v · 2019-09-05T17:09:22Z

Thanks for the help! @WillAyd @mroeschke

mroeschke · 2019-09-05T17:11:03Z

Great! Thanks @f6v

* Improved benchmark coverage for reading spreadsheets
* Added blank lines
* More blank lines
* Updated whatsnew
* - Removed whatsnew entry
  - Added comment in environment.yml
  - Added conda-forge to asv config
  - Refactored reader benchmark
* Updated requirements-dev.txt
* Fixed imports order
* Fixed imports again
* Run black
* Changed conda channels order in ASV config
* Used setup_cache to speed up read benchmark

Improved benchmark coverage for reading spreadsheets

b0ffdc9

f6v added 3 commits August 30, 2019 12:42

Added blank lines

7c8c1f3

More blank lines

6e421e3

Updated whatsnew

f7500c6

f6v mentioned this pull request Aug 30, 2019

ASV Benchmark for read_excel #27485

Closed

WillAyd added the Benchmark (Performance (ASV) benchmarks) and IO Excel (read_excel, to_excel) labels Aug 30, 2019

WillAyd requested changes Aug 30, 2019

- Removed whatsnew entry
- Added comment in environment.yml
- Added conda-forge to asv config
- Refactored reader benchmark

9dcf7a9

WillAyd reviewed Aug 30, 2019

f6v added 2 commits August 30, 2019 20:55

Updated requirements-dev.txt

1354f04

Fixed imports order

4a0238f

f6v force-pushed the improve_benchmark branch from 544f933 to 4a0238f Compare August 30, 2019 20:58

f6v and others added 2 commits August 30, 2019 22:59

Merge branch 'master' into improve_benchmark

682711b

Fixed imports again

96912d6

f6v force-pushed the improve_benchmark branch from f3c1c90 to 05cca05 Compare August 31, 2019 11:35

Merge branch 'improve_benchmark' of github.com:f6v/pandas into improv…

7800665

…e_benchmark

f6v force-pushed the improve_benchmark branch from 05cca05 to 7800665 Compare August 31, 2019 11:37

Run black

e7d6986

WillAyd reviewed Sep 3, 2019

mroeschke reviewed Sep 3, 2019

f6v added 2 commits September 4, 2019 09:05

Changed conda channels order in ASV config

e7279e5

Used setup_cache to speed up read benchmark

a4b3cc2

WillAyd approved these changes Sep 5, 2019

WillAyd added this to the 1.0 milestone Sep 5, 2019

mroeschke merged commit 2915223 into pandas-dev:master Sep 5, 2019
