ENH: Added xlsxwriter as an ExcelWriter option. #4857

jmcnamara · 2013-09-16T21:35:00Z

Added xlsxwriter as an optional writer engine. closes #4542.

jreback · 2013-09-16T21:39:39Z

doc/source/io.rst

-``xlwt`` for ``.xls`` files.  If you have multiple engines installed, you can choose the
-engine to use by default via the options ``io.excel.xlsx.writer`` and
-``io.excel.xls.writer``.
+By default ``pandas`` only supports


I think they are all optional right? and aren't you making xlsxwriter the default now?

I guess we need to edit the default settings to check if xlsxwriter or openpyxl is installed. Not sure if we could neaten this up with some importlib magic or something...

i.e.:

xlsx_set = False try: import xlsxwriter if not xlsx_set: set_option('io.excel.xlsx.writer', 'xlsxwriter') xlsx_set = True except ImportError: pass try: import openpyxl if not xlsx_set: set_option('io.excel.xlsx.writer', 'openpyxl') xlsx_set = True except ImportError: pass # etc

And we can decide on order later. I think only openpyxl supports .xlsm files. It also may be the case that xlwt supports xlsx files. If so, it would be trivial to add it here.

They are all optional but openpyxl is the default for xlsx and xlwt is the default for xls insofar as they are the default classes bound to the file extensions.

And it wasn't my intention to make xlsxwriter the default. It is probably best to see if people use it or prefer it as a default for a release or two.

Just to reiterate, I don't think it is worth changing the current behaviour of the (optional) defaults. At least not in 0.13. If it proves to be popular and robust we can consider that for 0.14.

@jmcnamara keep in mind that you have to explicitly choose to install xlsxwriter to have this work - so it's not that big of a deal. xlsxwriter isn't in the major prepackaged distributions (enthought, anaconda, python(x,y), winpython, etc), so there's a low probability for people to be surprised.

I think this is fine for now.

jtratner · 2013-09-17T03:28:37Z

@jmcnamara please tell me when you'd like me to look this over. Definitely can leave settings to a separate PR as you requested.

jmcnamara · 2013-09-20T16:32:09Z

I've lost the thread here a little bit. Are there some issue here that I need to fix?

jreback · 2013-09-20T16:37:17Z

need a release notes entry (doc/source/releast.rst), reference the issue, maybe in improvements?

possibly add a pros/cons of different writers in docs (or in docstring)?

jtratner · 2013-09-20T16:45:08Z

@jmcnamara I think I was just waiting for you to tell me that you were finished. Your call if you want to add comparisons, it's also fine to just link back to xlsxwriter docs and let people find out for themselves.

jtratner · 2013-09-20T21:56:37Z

pandas/tests/test_panel.py

+            import xlsxwriter
+            from pandas.io.excel import ExcelFile
+        except ImportError:
+            raise nose.SkipTest


I know you'd like to be done with this, but can you do me a favor and add a message like

raise nose.SkipTest("Need xlrd and xlsxwriter installed for this test.")

note to self: these tests can be moved on to the ExcelWriterBase class.

jtratner · 2013-09-20T21:59:17Z

I think this looks good - unless anyone else has objections to it, I'm going to merge tonight. Thank you @jmcnamara for bearing through this and being flexible about accommodating a refactor of the Excel writer engine while you were working on this :)

jmcnamara · 2013-09-21T07:06:29Z

Sorry for the laggy response. I'll make the requested additions today and send a message when they are good to merge. Thanks for your help and patience.

jmcnamara · 2013-09-22T20:01:45Z

@jtratner @jreback

I made the changes above and I think that this is okay to merge.

If there are any other fixes required I send some more PRs before the 0.13 release. From here I would like to do the following:

Once merged, post an update to a question about the Excel writer speed on the PyData mailing list to see if anyone is interested in trying out xlsxwriter.
Add a similar sub-class for PyExcelerate and maybe do a speed comparison.
Look at tackling the row/column refactoring.

-- John

jreback · 2013-09-22T20:05:24Z

@jmcnamara can you squash those last 2 commits?

otherwise look ok to me, @jtratner

jreback · 2013-09-22T20:06:26Z

@jmcnamara maybe want to do show some perf numbers at the top of this PR?
(at some point can add a vbench if you'd like as well)

jtratner · 2013-09-22T20:15:47Z

it would be great to come up with performance metrics. That said, memory could be an even better gain...I've found relatively small dataframes (maybe 100K cells) taking 4-6GB just to write.

jreback · 2013-09-22T20:17:39Z

this is a nice little ipython feature: %memit, like timeit, https://gist.github.com/vene/3022718

jtratner · 2013-09-22T20:17:49Z

This looks fine to me - definitely worth announcing to the PyData mailing list (and clearly it's great if you have metrics along with it). I'd be +1 on PyExcelerate too, that seems like an interesting alternative project.

jreback · 2013-09-22T20:24:50Z

is this tested/an issue with xlsxwriter? #3122

jtratner · 2013-09-22T20:28:11Z

oh good point!

jtratner · 2013-09-22T20:43:08Z

yep - xlsxwriter handles it fine. We can add those tests separately. Easy test case would be something like this like this:

def test_non_standard_dtypes(self):
      df = DataFrame(range(5))
      dtypes = ['float16', 'int16', 'int32', 'float32', 'uint64', 'uint16', 'uint32']
      with ensure_clean('_some_out.' + self.ext) as pth:
          writer = ExcelWriter(pth)
          for dtype in dtypes:
               df.astype(dtype).to_excel(writer, dtype)
          writer.save()
          reader = ExcelFile(pth)
          for dtype in dtypes:
              np.testing.assert_array_equal(reader.parse(dtype).values, df.astype(dtype).values)

Which xlsxwriter passes right now.

jtratner · 2013-09-22T20:44:54Z

However, once those two commits are squashed, I want to merge this PR...that way it's in here and easier to reason about other changes to io/excel.

Added xlsxwriter as an optional writer engine. Issue #4542.

jmcnamara · 2013-09-22T22:32:37Z

can you squash those last 2 commits?

@jreback I can do a git rebase -i HEAD~2 or HEAD~3 to squash the commits but the push will be rejected without a --force or a git pull with merge. Will I do the force or is there a better/cleaner way to squash the commits?

jtratner · 2013-09-22T22:34:20Z

@jmcnamara yes, git push --force is what you want.

jtratner · 2013-09-22T22:35:01Z

The goal is to rewrite the history to be cleaner (which is why you have to force the rewrite). Since it's is a topic branch, so it's okay to rebase like this.

jmcnamara · 2013-09-22T22:48:54Z

I'm always slightly reticent to use --force but the cleaner history is definitely better. It is done now.

ENH: Added xlsxwriter as an ExcelWriter option.

jtratner · 2013-09-22T23:19:10Z

@jmcnamara Thanks! Building on this, it would be great to look at memory and performance for xlsxwriter as well as the row-oriented vs. column-oriented discussion (like you mentioned).

Btw - I'm planning to resolve #3122 by moving the formatting on to the ExcelWriter class, so that openpyxl/xlwt can convert themselves first to deal with the dtypes comparison, so I think it should be pretty easy to handle that change in that way.

jreback reviewed Sep 16, 2013
View reviewed changes

jtratner reviewed Sep 20, 2013
View reviewed changes

ENH: Added xlsxwriter as an ExcelWriter option.

b0c290f

Added xlsxwriter as an optional writer engine. Issue #4542.

jtratner added a commit that referenced this pull request Sep 22, 2013

Merge pull request #4857 from jmcnamara/enh_xlsxwriter2

a6f814e

ENH: Added xlsxwriter as an ExcelWriter option.

jtratner merged commit a6f814e into pandas-dev:master Sep 22, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Added xlsxwriter as an ExcelWriter option. #4857

ENH: Added xlsxwriter as an ExcelWriter option. #4857

jmcnamara commented Sep 16, 2013

jreback Sep 16, 2013

jtratner Sep 16, 2013

jmcnamara Sep 16, 2013

jmcnamara Sep 17, 2013

jtratner Sep 17, 2013

jtratner Sep 20, 2013

jtratner commented Sep 17, 2013

jmcnamara commented Sep 20, 2013

jreback commented Sep 20, 2013

jtratner commented Sep 20, 2013

jtratner Sep 20, 2013

jtratner Sep 20, 2013

jtratner commented Sep 20, 2013

jmcnamara commented Sep 21, 2013

jmcnamara commented Sep 22, 2013

jreback commented Sep 22, 2013

jreback commented Sep 22, 2013

jtratner commented Sep 22, 2013

jreback commented Sep 22, 2013

jtratner commented Sep 22, 2013

jreback commented Sep 22, 2013

jtratner commented Sep 22, 2013

jtratner commented Sep 22, 2013

jtratner commented Sep 22, 2013

jmcnamara commented Sep 22, 2013

jtratner commented Sep 22, 2013

jtratner commented Sep 22, 2013

jmcnamara commented Sep 22, 2013

jtratner commented Sep 22, 2013

ENH: Added xlsxwriter as an ExcelWriter option. #4857

ENH: Added xlsxwriter as an ExcelWriter option. #4857

Conversation

jmcnamara commented Sep 16, 2013

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtratner commented Sep 17, 2013

jmcnamara commented Sep 20, 2013

jreback commented Sep 20, 2013

jtratner commented Sep 20, 2013

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtratner commented Sep 20, 2013

jmcnamara commented Sep 21, 2013

jmcnamara commented Sep 22, 2013

jreback commented Sep 22, 2013

jreback commented Sep 22, 2013

jtratner commented Sep 22, 2013

jreback commented Sep 22, 2013

jtratner commented Sep 22, 2013

jreback commented Sep 22, 2013

jtratner commented Sep 22, 2013

jtratner commented Sep 22, 2013

jtratner commented Sep 22, 2013

jmcnamara commented Sep 22, 2013

jtratner commented Sep 22, 2013

jtratner commented Sep 22, 2013

jmcnamara commented Sep 22, 2013

jtratner commented Sep 22, 2013