ENH: Added xlsxwriter as an ExcelWriter option. #4739

jmcnamara · 2013-09-03T23:03:21Z

Refactored pandas.io.excel.ExcelWriter to allow other
writer engines and added xlsxwriter as an option.
GitHub issue #4542.

jmcnamara · 2013-09-03T23:04:12Z

If the interface and config item are okay, I'll add the documentation as well.

jreback · 2013-09-03T23:24:54Z

pandas/core/frame.py

@@ -1385,6 +1386,7 @@ def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
            sequence should be given if the DataFrame uses MultiIndex.
        startow : upper left cell row to dump data frame
        startcol : upper left cell column to dump data frame
+        engine : Excel writer class


Add the possibilities, and say that it will try these in order if not specified (and say that it's a string, not a class)
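The "try these in order if not specified" behaviour requested above can be sketched as follows. This is only an illustration: the function name `resolve_engine`, the candidate list, and its order are assumptions, not the PR's actual code. Note that the engine is identified by a string, not a class.

```python
import importlib

# Assumed candidate names and order; the real list is up to the implementation.
DEFAULT_ENGINES = ("xlsxwriter", "openpyxl", "xlwt")


def resolve_engine(engine=None, candidates=DEFAULT_ENGINES):
    """Return the name (a string) of the first importable writer engine.

    If `engine` is given, only that engine is tried; otherwise the
    candidates are tried in order.
    """
    names = (engine,) if engine is not None else tuple(candidates)
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
        return name
    raise ImportError("no Excel writer engine available; tried: " + ", ".join(names))
```

A docstring entry along the lines of "engine : string, one of the supported writer names; if omitted, the candidates are tried in order" would then match this behaviour.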

jreback · 2013-09-03T23:33:43Z

Add the dep here as well: http://pandas.pydata.org/pandas-docs/dev/install.html#optional-dependencies

jtratner · 2013-09-03T23:49:18Z

Thanks for submitting this! A few initial notes:

Let's create an 'excel' directory in io/tests and put test_excel.py and test_xlsxwriter.py there.
You should combine all of your test cases into one file instead of multiple. I'm okay putting it in a separate file since it looks like it's quite a few test cases.
If your Xlsxwriter already provides those helper test functions, I would much prefer to import them over including them in the pandas code base. Otherwise it's just a bunch of code we need to maintain in parallel.

jtratner · 2013-09-03T23:54:28Z

Also, in ci/versions, the error message for a missing Xlsxwriter currently says that there is no xlwt. Around line 239 in test_excel.py, it looks like skip_if_no_openpyxl is not being called (it's just written in), so it won't actually skip when openpyxl is missing.
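The skip problem described above is easy to reproduce in isolation. The sketch below uses the standard library's `unittest` and a hypothetical helper `skip_if_no_module` (not the pandas helper itself) to show why a bare reference to a skip function does nothing while a call does.

```python
import unittest


def skip_if_no_module(name):
    """Hypothetical helper: raise SkipTest when `name` cannot be imported."""
    try:
        __import__(name)
    except ImportError:
        raise unittest.SkipTest(name + " is not installed") from None


class ExampleTests(unittest.TestCase):
    def test_bare_reference(self):
        # Bug: this is only an expression statement. Nothing is called,
        # so the test body always runs, even if the module is missing.
        skip_if_no_module
        self.assertTrue(True)

    def test_actual_call(self):
        # Fix: calling the helper raises SkipTest for a missing module.
        skip_if_no_module("no_such_excel_engine")
        self.fail("not reached when the module is missing")
```

Running the two tests reports one pass and one skip, which makes the difference between the bare name and the call visible in the test output.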

jmcnamara · 2013-09-04T00:02:29Z

@jreback @jtratner

multiple similar test classes.

The comparison test classes are logically separate (to me at least) because they are run against separate Excel files and they test different facets of the interface. If they were all in the same file it would become monolithic. For example, there are around 400 similar tests in the xlsxwriter test suite.

I'd be happier to put them in a sub directory like the test_json tests.

> If your Xlsxwriter already provides those helper test functions

It doesn't, unfortunately. They are only in the repo, not in the installed package.

jmcnamara · 2013-09-04T00:07:22Z

> currently the error message for no Xlsxwriter is saying that there is no xlwt

Ack. :-| Well spotted.

> around line 239 in test_excel.py, it looks like skip_if_no_openpyxl is not being called

Yes. That is a typo.

jreback · 2013-09-04T00:08:33Z

@jmcnamara but your xlsxwriter suite is a unit test suite, so of course it tests more (and individual items). Here you are reading these files and then just comparing the results against expected values (e.g. in theory this should be quite a simple suite: just read in the file and compare against other 'known' engines). You could do that actually, e.g. take different frames, write them out with different writers, then read back (individually) and compare with assert_frame_equal.

jreback · 2013-09-04T00:12:26Z

@jmcnamara follow up....

it seems that you could just iterate over a list of files and just do this all in 1 file (with separate tests for the engine kw, config and such). you would still have exactly the same tests.

(and if the test frames are different in each case, just put them in a dict or something). Most of the code in the test files is just boilerplate to setup/cleanup etc.
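A minimal sketch of the single-file round-trip approach suggested above, written against a current pandas API rather than the 2013 code base. The frames, the `round_trip` and `check_all` names, and the choice of openpyxl as the reader are all assumptions for illustration; only writer engines that are actually installed are exercised.

```python
import importlib.util
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical test frames kept in a dict, keyed by a label.
FRAMES = {
    "ints": pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}),
    "floats": pd.DataFrame({"x": [0.5, 1.25, 2.0], "y": [3.0, 4.5, 6.75]}),
}

# Assumed reader; the writers are only tried when the reader is installed.
READER = "openpyxl"
WRITERS = [
    name
    for name in ("openpyxl", "xlsxwriter")
    if importlib.util.find_spec(name) is not None
    and importlib.util.find_spec(READER) is not None
]


def round_trip(frame, writer_engine):
    """Write `frame` to an in-memory xlsx file and read it back."""
    buf = io.BytesIO()
    frame.to_excel(buf, engine=writer_engine)
    buf.seek(0)
    return pd.read_excel(buf, index_col=0, engine=READER)


def check_all():
    """Round-trip every frame through every available writer."""
    checked = []
    for engine in WRITERS:
        for label, frame in FRAMES.items():
            assert_frame_equal(round_trip(frame, engine), frame)
            checked.append((engine, label))
    return checked
```

Separate tests for the engine keyword and the config option would sit alongside this loop in the same file.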

jtratner · 2013-09-04T00:21:08Z

building on @jreback's comment: also, it's preferable to do it that way because the outputted xlsx file should be correct because xlsxwriter outputs it correctly. If we're able to read it back and it still reads correctly in pandas, then that, fundamentally, demonstrates its correctness, right? The only areas that come to mind where it might make sense to check output is on MultiIndex and hierarchical columns, but those should also be easily round-trippable.

jmcnamara · 2013-09-04T00:29:09Z

@jtratner @jreback

> If we're able to read it back and it still reads correctly in pandas, then that, fundamentally, demonstrates its correctness, right?

Maybe. :-)

The tests generate an Excel file and then check that, metadata aside, it is 100% the same as a file created in Excel. It wouldn't be possible to do that with the other engines because they don't aim for fidelity, just compatibility. Which is fine.

jmcnamara · 2013-09-04T00:34:54Z

> in theory these should be quite a simple suite, just read in the file and compare vs other 'known' engines

The tests I added to test_excel.py do that. Or at least they compare with the dataframes written out.

jtratner · 2013-09-04T00:35:05Z

I'd still prefer to limit our tests to checking that "When you read in this
file with another [or multiple] readers, it produces exactly the same
DataFrame" and then leave the rest of the testing for the individual excel
writers' test suites.

jmcnamara · 2013-09-04T00:37:00Z

> I'd still prefer to limit our tests to checking that "When you read in this
> file with another [or multiple] readers, it produces exactly the same
> DataFrame" and then leave the rest of the testing for the individual excel
> writers' test suites.

That is a fair comment.

I shouldn't be unit testing my module in your testsuite. I'll drop those tests out.

jtratner · 2013-09-04T00:41:14Z

btw @jmcnamara do you mind if I submit a PR to your branch that changes around the implementation of the individual writers? I'm thinking it would make more sense to have each writer register themselves (as classes) on ExcelFile. ExcelFile becomes an abstract base class-like thing and dispatches based upon the engine passed. Then each writer just needs to implement _write_cells and they can work (but they can choose to implement save or overwrite anything else if they want).

I'm happy to do it (especially because I suggested that other way to you). Just would be helpful if you didn't rebase your branch for a while (just add additional commits and then you can squash down at the end).
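The registration idea above could look roughly like this. It is a sketch under stated assumptions, not the refactoring that was eventually written: the names `ExcelWriterBase`, `for_engine`, and `RecordingWriter` are hypothetical, and the toy writer stores cells in memory instead of producing a file.

```python
import abc


class ExcelWriterBase(abc.ABC):
    """Hypothetical base class that dispatches on an engine name."""

    _engines = {}  # engine name -> writer class

    def __init_subclass__(cls, engine=None, **kwargs):
        # Each writer class registers itself when it is defined.
        super().__init_subclass__(**kwargs)
        if engine is not None:
            ExcelWriterBase._engines[engine] = cls

    @classmethod
    def for_engine(cls, engine, path):
        """Instantiate the writer registered under `engine`."""
        try:
            writer_cls = cls._engines[engine]
        except KeyError:
            raise ValueError("no Excel writer registered for %r" % engine) from None
        return writer_cls(path)

    def __init__(self, path):
        self.path = path

    @abc.abstractmethod
    def _write_cells(self, cells, sheet_name):
        """The one method every writer must implement."""

    def save(self):
        """Default no-op; writers may override this."""


class RecordingWriter(ExcelWriterBase, engine="recording"):
    """Toy writer that keeps cells in memory."""

    def __init__(self, path):
        super().__init__(path)
        self.sheets = {}

    def _write_cells(self, cells, sheet_name):
        self.sheets.setdefault(sheet_name, []).extend(cells)
```

With this shape, adding a new engine means defining one subclass; callers only ever pass the engine name.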

jmcnamara · 2013-09-04T00:46:54Z

@jtratner

> do you mind if I submit a PR to your branch that changes around the implementation of the individual writers?

I don't mind. Refactor as much as you want.

I'll work on the enh_xlsxwriter_dev branch.

jtratner · 2013-09-04T00:50:25Z

@jmcnamara okay, whatever works for you. I'll try to put something together soon...

jmcnamara · 2013-09-15T22:08:02Z

Closing this and opening #4847 based on refactored excel.py.

ENH: Added xlsxwriter as an ExcelWriter option.

1b4972a

Refactored pandas.io.excel.ExcelWriter to allow other writer engines and added xlsxwriter as an option. GitHub issue #4542.

jreback reviewed Sep 3, 2013
jmcnamara closed this Sep 15, 2013

jmcnamara mentioned this pull request Sep 15, 2013

ENH: Added xlsxwriter as an ExcelWriter option. #4847

Closed