Skip to content

ENH: Added xlsxwriter as an ExcelWriter option. #4739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

ENH: Added xlsxwriter as an ExcelWriter option. #4739

wants to merge 1 commit into from

Conversation

jmcnamara
Copy link
Contributor

Refactored pandas.io.excel.ExcelWriter to allow other
writer engines and added xlsxwriter as an option.
GitHub issue #4542.

Refactored pandas.io.excel.ExcelWriter to allow other
writer engines and added xlsxwriter as an option.
GitHub issue #4542.
@jmcnamara
Copy link
Contributor Author

If the interface and config item is okay I'll add some documentation additions as well.

@@ -1385,6 +1386,7 @@ def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
sequence should be given if the DataFrame uses MultiIndex.
startow : upper left cell row to dump data frame
startcol : upper left cell column to dump data frame
engine : Excel writer class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the possibilities, and say that it will try these in order if not specified (and say that it's a string, not a class)

@jreback
Copy link
Contributor

jreback commented Sep 3, 2013

@jtratner
Copy link
Contributor

jtratner commented Sep 3, 2013

Thanks for submitting this! A few initial notes:

  1. Let's create an 'excel' directory in io/tests and put test_excel.py and test_xlsxwriter.py there.
  2. You should combine all of your test cases into one file instead of multiple. I'm okay putting it in a separate file since it looks like it's quite a few test cases.
  3. If your Xlsxwriter already provides those helper test functions, I would much prefer to import them over including them in the pandas code base. Otherwise it's just a bunch of code we need to maintain in parallel.

@jtratner
Copy link
Contributor

jtratner commented Sep 3, 2013

Also, in ci/versions - currently the error message for no Xlsxwriter is saying that there is no xlwt. Around line 239 I'm test_excel.py, it looks like skip_if_no_openpyxl is not being called(it's just written in), so it won't actually skip with openpyxl.

@jmcnamara
Copy link
Contributor Author

@jreback @jtratner

multiple similar test classes.

The comparison test classes are logically separate (to me at least) because they are run against separate Excel files and they test different facets of the interface. If they were all in the same file it would become monolithic. For example there is around 400 similar tests in the xlsxwriter test suite.

I'd be happier to put them in a sub directory like the test_json tests.

If your Xlsxwriter already provides those helper test functions

It doesn't, unfortunately. They are only in the repo not in the installed package.

@jmcnamara
Copy link
Contributor Author

currently the error message for no Xlsxwriter is saying that there is no xlwt

Ack. :-| Well spotted.

around line 239 I'm test_excel.py, it looks like skip_if_no_openpyxl is not being called

Yes. That is a typo.

@jreback
Copy link
Contributor

jreback commented Sep 4, 2013

@jmcnamara but your xlsxwriter is a unit test suite so of course tests more (and individual items). Here you are reading these files and then just comparing the results vs an expected values (e.g. in theory these should be quite a simple suite, just read in the file with and compare vs other 'known' engines). You could do that actually, e.g. take different frames, write them out with different writers, then read back (individually) and compare with assert_frame_equal

@jreback
Copy link
Contributor

jreback commented Sep 4, 2013

@jmcnamara follow up....

it seems that you could just iterate over a list of files and just do this all in 1 file (with separate tests for the engine kw, config and such). you would still have exactly the same tests.

(and if the test frames are different in each case, just put them in a dict or something). Most of the code in the test filess is just boilerplate to setup/cleanup etc.

@jtratner
Copy link
Contributor

jtratner commented Sep 4, 2013

building on @jreback's comment: also, it's preferable to do it that way because the outputted xlsx file should be correct because xlsxwriter outputs it correctly. If we're able to read it back and it still reads correctly in pandas, then that, fundamentally, demonstrates its correctness, right? The only areas that come to mind where it might make sense to check output is on MultiIndex and hierarchical columns, but those should also be easily round-trippable.

@jmcnamara
Copy link
Contributor Author

@jtratner @jreback

If we're able to read it back and it still reads correctly in pandas, then that, fundamentally, demonstrates its correctness, right?

Maybe. :-)

The tests are generating an Excel file and then comparing that, metadata aside, it is 100% the same as a file created in Excel. It wouldn't be possible to do that with the other engines because they don't aim for fidelity just compatibility. Which is fine.

@jmcnamara
Copy link
Contributor Author

in theory these should be quite a simple suite, just read in the file with and compare vs other 'known' engines)

The tests I added to excel_test.py do that. Or at least compare with the dataframes written out.

@jtratner
Copy link
Contributor

jtratner commented Sep 4, 2013

I'd still prefer to limit our tests to checking that "When you read in this
file with another [or multiple] readers, it produces exactly the same
DataFrame" and then leave the rest of the testing for the individual excel
writers' test suites.

@jmcnamara
Copy link
Contributor Author

I'd still prefer to limit our tests to checking that "When you read in this
file with another [or multiple] readers, it produces exactly the same
DataFrame" and then leave the rest of the testing for the individual excel
writers' test suites.

That is a fair comment.

I shouldn't be unit testing my module in your testsuite. I'll drop those tests out.

@jtratner
Copy link
Contributor

jtratner commented Sep 4, 2013

btw @jmcnamara do you mind if I submit a PR to your branch that changes around the implementation of the individual writers? I'm thinking it would make more sense to have each writer register themselves (as classes) on ExcelFile. ExcelFile becomes an abstract base class-like thing and dispatches based upon the engine passed. Then each writer just needs to implement _write_cells and they can work (but they can choose to implement save or overwrite anything else if they want).

I'm happy to do it (especially because I suggested that other way to you). Just would be helpful if you didn't rebase your branch for a while (just add additional commits and then you can squash down at the end).

@jmcnamara
Copy link
Contributor Author

@jtratner

do you mind if I submit a PR to your branch that changes around the implementation of the individual writers?

I don't mind. Refactor as much as you want.

I'll work on the enh_xlsxwriter_dev branch.

@jtratner
Copy link
Contributor

jtratner commented Sep 4, 2013

@jmcnamara okay, whatever works for you. I'll try to put something together soon...

@jmcnamara
Copy link
Contributor Author

Closing this and opening #4847 based on refactored excel.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants