Skip to content

ENH: Added xlsxwriter as an ExcelWriter option. #4857

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 22, 2013
Merged

ENH: Added xlsxwriter as an ExcelWriter option. #4857

merged 1 commit into from
Sep 22, 2013

Conversation

jmcnamara
Copy link
Contributor

Added xlsxwriter as an optional writer engine. closes #4542.

``xlwt`` for ``.xls`` files. If you have multiple engines installed, you can choose the
engine to use by default via the options ``io.excel.xlsx.writer`` and
``io.excel.xls.writer``.
By default ``pandas`` only supports
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are all optional right? and aren't you making xlsxwriter the default now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to edit the default settings to check if xlsxwriter or openpyxl is installed. Not sure if we could neaten this up with some importlib magic or something...

i.e.:

xlsx_set = False
try:
    import xlsxwriter
    if not xlsx_set:
        set_option('io.excel.xlsx.writer', 'xlsxwriter')
        xlsx_set = True
except ImportError:
    pass
try:
    import openpyxl
    if not xlsx_set:
        set_option('io.excel.xlsx.writer', 'openpyxl')
        xlsx_set = True
except ImportError:
    pass
    # etc

And we can decide on order later. I think only openpyxl supports .xlsm files. It also may be the case that xlwt supports xlsx files. If so, it would be trivial to add it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are all optional but openpyxl is the default for xlsx and xlwt is the default for xls insofar as they are the default classes bound to the file extensions.

And it wasn't my intention to make xlsxwriter the default. It is probably best to see if people use it or prefer it as a default for a release or two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to reiterate, I don't think it is worth changing the current behaviour of the (optional) defaults. At least not in 0.13. If it proves to be popular and robust we can consider that for 0.14.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcnamara keep in mind that you have to explicitly choose to install xlsxwriter to have this work - so it's not that big of a deal. xlsxwriter isn't in the major prepackaged distributions (enthought, anaconda, python(x,y), winpython, etc), so there's a low probability for people to be surprised.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now.

@jtratner
Copy link
Contributor

@jmcnamara please tell me when you'd like me to look this over. Definitely can leave settings to a separate PR as you requested.

@jmcnamara
Copy link
Contributor Author

I've lost the thread here a little bit. Are there some issue here that I need to fix?

@jreback
Copy link
Contributor

jreback commented Sep 20, 2013

need a release notes entry (doc/source/releast.rst), reference the issue, maybe in improvements?

possibly add a pros/cons of different writers in docs (or in docstring)?

@jtratner
Copy link
Contributor

@jmcnamara I think I was just waiting for you to tell me that you were finished. Your call if you want to add comparisons, it's also fine to just link back to xlsxwriter docs and let people find out for themselves.

import xlsxwriter
from pandas.io.excel import ExcelFile
except ImportError:
raise nose.SkipTest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you'd like to be done with this, but can you do me a favor and add a message like

raise nose.SkipTest("Need xlrd and xlsxwriter installed for this test.")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: these tests can be moved on to the ExcelWriterBase class.

@jtratner
Copy link
Contributor

I think this looks good - unless anyone else has objections to it, I'm going to merge tonight. Thank you @jmcnamara for bearing through this and being flexible about accommodating a refactor of the Excel writer engine while you were working on this :)

@jmcnamara
Copy link
Contributor Author

Sorry for the laggy response. I'll make the requested additions today and send a message when they are good to merge. Thanks for your help and patience.

@jmcnamara
Copy link
Contributor Author

@jtratner @jreback

I made the changes above and I think that this is okay to merge.

If there are any other fixes required I send some more PRs before the 0.13 release. From here I would like to do the following:

  • Once merged, post an update to a question about the Excel writer speed on the PyData mailing list to see if anyone is interested in trying out xlsxwriter.
  • Add a similar sub-class for PyExcelerate and maybe do a speed comparison.
  • Look at tackling the row/column refactoring.

-- John

@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

@jmcnamara can you squash those last 2 commits?

otherwise look ok to me, @jtratner

@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

@jmcnamara maybe want to do show some perf numbers at the top of this PR?
(at some point can add a vbench if you'd like as well)

@jtratner
Copy link
Contributor

it would be great to come up with performance metrics. That said, memory could be an even better gain...I've found relatively small dataframes (maybe 100K cells) taking 4-6GB just to write.

@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

this is a nice little ipython feature: %memit, like timeit, https://gist.github.com/vene/3022718

@jtratner
Copy link
Contributor

This looks fine to me - definitely worth announcing to the PyData mailing list (and clearly it's great if you have metrics along with it). I'd be +1 on PyExcelerate too, that seems like an interesting alternative project.

@jreback
Copy link
Contributor

jreback commented Sep 22, 2013

is this tested/an issue with xlsxwriter? #3122

@jtratner
Copy link
Contributor

oh good point!

@jtratner
Copy link
Contributor

yep - xlsxwriter handles it fine. We can add those tests separately. Easy test case would be something like this like this:

def test_non_standard_dtypes(self):
      df = DataFrame(range(5))
      dtypes = ['float16', 'int16', 'int32', 'float32', 'uint64', 'uint16', 'uint32']
      with ensure_clean('_some_out.' + self.ext) as pth:
          writer = ExcelWriter(pth)
          for dtype in dtypes:
               df.astype(dtype).to_excel(writer, dtype)
          writer.save()
          reader = ExcelFile(pth)
          for dtype in dtypes:
              np.testing.assert_array_equal(reader.parse(dtype).values, df.astype(dtype).values)

Which xlsxwriter passes right now.

@jtratner
Copy link
Contributor

However, once those two commits are squashed, I want to merge this PR...that way it's in here and easier to reason about other changes to io/excel.

Added xlsxwriter as an optional writer engine. Issue #4542.
@jmcnamara
Copy link
Contributor Author

can you squash those last 2 commits?

@jreback I can do a git rebase -i HEAD~2 or HEAD~3 to squash the commits but the push will be rejected without a --force or a git pull with merge. Will I do the force or is there a better/cleaner way to squash the commits?

@jtratner
Copy link
Contributor

@jmcnamara yes, git push --force is what you want.

@jtratner
Copy link
Contributor

The goal is to rewrite the history to be cleaner (which is why you have to force the rewrite). Since it's is a topic branch, so it's okay to rebase like this.

@jmcnamara
Copy link
Contributor Author

I'm always slightly reticent to use --force but the cleaner history is definitely better. It is done now.

jtratner added a commit that referenced this pull request Sep 22, 2013
ENH: Added xlsxwriter as an ExcelWriter option.
@jtratner jtratner merged commit a6f814e into pandas-dev:master Sep 22, 2013
@jtratner
Copy link
Contributor

@jmcnamara Thanks! Building on this, it would be great to look at memory and performance for xlsxwriter as well as the row-oriented vs. column-oriented discussion (like you mentioned).

Btw - I'm planning to resolve #3122 by moving the formatting on to the ExcelWriter class, so that openpyxl/xlwt can convert themselves first to deal with the dtypes comparison, so I think it should be pretty easy to handle that change in that way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Adding XlsxWriter as an ExcelWriter() option
3 participants