-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Added xlsxwriter as an ExcelWriter option. #4857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
``xlwt`` for ``.xls`` files. If you have multiple engines installed, you can choose the | ||
engine to use by default via the options ``io.excel.xlsx.writer`` and | ||
``io.excel.xls.writer``. | ||
By default ``pandas`` only supports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they are all optional right? and aren't you making xlsxwriter the default now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need to edit the default settings to check if xlsxwriter or openpyxl is installed. Not sure if we could neaten this up with some importlib magic or something...
i.e.:
xlsx_set = False
try:
import xlsxwriter
if not xlsx_set:
set_option('io.excel.xlsx.writer', 'xlsxwriter')
xlsx_set = True
except ImportError:
pass
try:
import openpyxl
if not xlsx_set:
set_option('io.excel.xlsx.writer', 'openpyxl')
xlsx_set = True
except ImportError:
pass
# etc
And we can decide on order later. I think only openpyxl supports .xlsm
files. It also may be the case that xlwt supports xlsx files. If so, it would be trivial to add it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are all optional but openpyxl is the default for xlsx and xlwt is the default for xls insofar as they are the default classes bound to the file extensions.
And it wasn't my intention to make xlsxwriter the default. It is probably best to see if people use it or prefer it as a default for a release or two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to reiterate, I don't think it is worth changing the current behaviour of the (optional) defaults. At least not in 0.13. If it proves to be popular and robust we can consider that for 0.14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmcnamara keep in mind that you have to explicitly choose to install xlsxwriter to have this work - so it's not that big of a deal. xlsxwriter
isn't in the major prepackaged distributions (enthought, anaconda, python(x,y), winpython, etc), so there's a low probability for people to be surprised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine for now.
@jmcnamara please tell me when you'd like me to look this over. Definitely can leave settings to a separate PR as you requested. |
I've lost the thread here a little bit. Are there some issue here that I need to fix? |
need a release notes entry (doc/source/releast.rst), reference the issue, maybe in improvements? possibly add a pros/cons of different writers in docs (or in docstring)? |
@jmcnamara I think I was just waiting for you to tell me that you were finished. Your call if you want to add comparisons, it's also fine to just link back to xlsxwriter docs and let people find out for themselves. |
import xlsxwriter | ||
from pandas.io.excel import ExcelFile | ||
except ImportError: | ||
raise nose.SkipTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you'd like to be done with this, but can you do me a favor and add a message like
raise nose.SkipTest("Need xlrd and xlsxwriter installed for this test.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self: these tests can be moved on to the ExcelWriterBase class.
I think this looks good - unless anyone else has objections to it, I'm going to merge tonight. Thank you @jmcnamara for bearing through this and being flexible about accommodating a refactor of the Excel writer engine while you were working on this :) |
Sorry for the laggy response. I'll make the requested additions today and send a message when they are good to merge. Thanks for your help and patience. |
I made the changes above and I think that this is okay to merge. If there are any other fixes required I send some more PRs before the 0.13 release. From here I would like to do the following:
-- John |
@jmcnamara can you squash those last 2 commits? otherwise look ok to me, @jtratner |
@jmcnamara maybe want to do show some perf numbers at the top of this PR? |
it would be great to come up with performance metrics. That said, memory could be an even better gain...I've found relatively small dataframes (maybe 100K cells) taking 4-6GB just to write. |
this is a nice little ipython feature: |
This looks fine to me - definitely worth announcing to the PyData mailing list (and clearly it's great if you have metrics along with it). I'd be +1 on PyExcelerate too, that seems like an interesting alternative project. |
is this tested/an issue with xlsxwriter? #3122 |
oh good point! |
yep - xlsxwriter handles it fine. We can add those tests separately. Easy test case would be something like this like this: def test_non_standard_dtypes(self):
df = DataFrame(range(5))
dtypes = ['float16', 'int16', 'int32', 'float32', 'uint64', 'uint16', 'uint32']
with ensure_clean('_some_out.' + self.ext) as pth:
writer = ExcelWriter(pth)
for dtype in dtypes:
df.astype(dtype).to_excel(writer, dtype)
writer.save()
reader = ExcelFile(pth)
for dtype in dtypes:
np.testing.assert_array_equal(reader.parse(dtype).values, df.astype(dtype).values) Which xlsxwriter passes right now. |
However, once those two commits are squashed, I want to merge this PR...that way it's in here and easier to reason about other changes to io/excel. |
Added xlsxwriter as an optional writer engine. Issue #4542.
@jreback I can do a |
@jmcnamara yes, git push --force is what you want. |
The goal is to rewrite the history to be cleaner (which is why you have to force the rewrite). Since it's is a topic branch, so it's okay to rebase like this. |
I'm always slightly reticent to use |
ENH: Added xlsxwriter as an ExcelWriter option.
@jmcnamara Thanks! Building on this, it would be great to look at memory and performance for xlsxwriter as well as the row-oriented vs. column-oriented discussion (like you mentioned). Btw - I'm planning to resolve #3122 by moving the formatting on to the |
Added xlsxwriter as an optional writer engine. closes #4542.