ENH: Added xlsxwriter as an ExcelWriter option. #4857
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
python-dateutil==2.1 | ||
pytz==2013b | ||
openpyxl==1.6.2 | ||
xlsxwriter==0.4.3 | ||
xlrd==0.9.2 | ||
numpy==1.6.2 | ||
cython==0.19.1 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
python-dateutil==2.1 | ||
pytz==2013b | ||
openpyxl==1.6.2 | ||
xlsxwriter==0.4.3 | ||
xlrd==0.9.2 | ||
html5lib==1.0b2 | ||
numpy==1.7.1 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1356,7 +1356,7 @@ def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None, | |
tupleize_cols=tupleize_cols) | ||
formatter.save() | ||
|
||
- def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='', | ||
+ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', | ||
btw - 👍 for fixing the default setting here to be what Excel actually uses |
||
float_format=None, cols=None, header=True, index=True, | ||
index_label=None, startrow=0, startcol=0, engine=None): | ||
""" | ||
|
@@ -1366,7 +1366,7 @@ def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='', | |
---------- | ||
excel_writer : string or ExcelWriter object | ||
File path or existing ExcelWriter | ||
- sheet_name : string, default 'sheet1' | ||
+ sheet_name : string, default 'Sheet1' | ||
Name of sheet which will contain DataFrame | ||
na_rep : string, default '' | ||
Missing data representation | ||
|
@@ -1397,8 +1397,8 @@ def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='', | |
to the existing workbook. This can be used to save different | ||
DataFrames to one workbook | ||
>>> writer = ExcelWriter('output.xlsx') | ||
- >>> df1.to_excel(writer,'sheet1') | ||
- >>> df2.to_excel(writer,'sheet2') | ||
+ >>> df1.to_excel(writer,'Sheet1') | ||
+ >>> df2.to_excel(writer,'Sheet2') | ||
>>> writer.save() | ||
""" | ||
from pandas.io.excel import ExcelWriter | ||
|
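The docstring example above writes two DataFrames to one workbook. A minimal sketch of the same pattern with the engine chosen explicitly follows. The helper name `write_two_sheets` is hypothetical, and the sketch assumes a recent pandas in which `ExcelWriter` works as a context manager; the docstring in this diff calls `writer.save()` instead.

```python
def write_two_sheets(path, df1, df2, engine="xlsxwriter"):
    """Write two DataFrames to separate sheets of one workbook.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # Imported lazily so this module loads even if pandas is not installed.
    import pandas as pd

    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(path, engine=engine) as writer:
        df1.to_excel(writer, sheet_name="Sheet1")
        df2.to_excel(writer, sheet_name="Sheet2")
```

Calling `write_two_sheets("output.xlsx", df1, df2)` needs both pandas and xlsxwriter installed; passing `engine="openpyxl"` selects the other `.xlsx` writer.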
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -596,6 +596,7 @@ def _convert_to_style(cls, style_dict, num_format_str=None): | |
Parameters | ||
---------- | ||
style_dict: style dictionary to convert | ||
num_format_str: optional number format string | ||
""" | ||
import xlwt | ||
|
||
|
@@ -611,3 +612,95 @@ def _convert_to_style(cls, style_dict, num_format_str=None): | |
|
||
register_writer(_XlwtWriter) | ||
|
||
|
||
class _XlsxWriter(ExcelWriter): | ||
engine = 'xlsxwriter' | ||
supported_extensions = ('.xlsx',) | ||
|
||
def __init__(self, path, **engine_kwargs): | ||
# Use the xlsxwriter module as the Excel writer. | ||
import xlsxwriter | ||
|
||
super(_XlsxWriter, self).__init__(path, **engine_kwargs) | ||
|
||
self.book = xlsxwriter.Workbook(path, **engine_kwargs) | ||
|
||
def save(self): | ||
""" | ||
Save workbook to disk. | ||
""" | ||
return self.book.close() | ||
|
||
def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0): | ||
@jmcnamara just want to bring this up again: you had mentioned that you preferred to go by columns whereas the other writers went by rows (or maybe the reverse) in order to get better performance. Are you able to do that with the current setup?
@jtratner I think that I'll have to leave that for a separate PR. It is probably something that you or @jtratner could sort out more efficiently. Basically the However, I don't know if there would be an equivalent loss of performance due to using something like Also, the current xlsxwriter implementation is already 5 times faster than openpyxl and equivalent to xlwt, so maybe that is enough for now. Either way it probably merits a separate discussion/PR.
If Xlsxwriter is more performant, then it should be first in the rotation
Also, @jmcnamara I'm hoping you'll be around to help me tweak things to
@jtratner I'll definitely stick around and try to do some more work. :-) |
||
# Write the frame cells using xlsxwriter. | ||
|
||
sheet_name = self._get_sheet_name(sheet_name) | ||
|
||
if sheet_name in self.sheets: | ||
wks = self.sheets[sheet_name] | ||
else: | ||
wks = self.book.add_worksheet(sheet_name) | ||
self.sheets[sheet_name] = wks | ||
|
||
style_dict = {} | ||
|
||
for cell in cells: | ||
val = _conv_value(cell.val) | ||
|
||
num_format_str = None | ||
if isinstance(cell.val, datetime.datetime): | ||
num_format_str = "YYYY-MM-DD HH:MM:SS" | ||
elif isinstance(cell.val, datetime.date): | ||
num_format_str = "YYYY-MM-DD" | ||
|
||
stylekey = json.dumps(cell.style) | ||
if num_format_str: | ||
stylekey += num_format_str | ||
|
||
if stylekey in style_dict: | ||
style = style_dict[stylekey] | ||
else: | ||
style = self._convert_to_style(cell.style, num_format_str) | ||
style_dict[stylekey] = style | ||
|
||
if cell.mergestart is not None and cell.mergeend is not None: | ||
# xlsxwriter expects (first_row, first_col, last_row, last_col). | ||
wks.merge_range(startrow + cell.row, | ||
startcol + cell.col, | ||
startrow + cell.mergestart, | ||
startcol + cell.mergeend, | ||
val, style) | ||
else: | ||
wks.write(startrow + cell.row, | ||
startcol + cell.col, | ||
val, style) | ||
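The loop in `write_cells` does two things per cell: it picks a number format for date-like values, and it reuses an already built style when the same style and format combination has been seen before. A minimal standalone sketch of both steps follows, using only the standard library. `pick_num_format` and `StyleCache` are hypothetical names, and `build_style` stands in for `_convert_to_style`. Because `datetime.datetime` is a subclass of `datetime.date`, the more specific type has to be tested first and the branches kept exclusive.

```python
import datetime
import json


def pick_num_format(value):
    """Return a number format string for date-like values, else None."""
    # datetime.datetime is a subclass of datetime.date, so test it first.
    if isinstance(value, datetime.datetime):
        return "YYYY-MM-DD HH:MM:SS"
    if isinstance(value, datetime.date):
        return "YYYY-MM-DD"
    return None


class StyleCache:
    """Build each distinct (style, number format) combination only once."""

    def __init__(self, build_style):
        # build_style is a hypothetical factory: (style, num_format) -> object
        self._build_style = build_style
        self._cache = {}

    def get(self, style, num_format_str=None):
        # sort_keys makes equal dicts serialize to the same cache key.
        key = json.dumps(style, sort_keys=True)
        if num_format_str:
            key += num_format_str
        if key not in self._cache:
            self._cache[key] = self._build_style(style, num_format_str)
        return self._cache[key]
```

Caching matters here because each call to the factory would otherwise create a new format object in the workbook for every cell, even when thousands of cells share one style.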
|
||
def _convert_to_style(self, style_dict, num_format_str=None): | ||
""" | ||
converts a style_dict to an xlsxwriter format object | ||
Parameters | ||
---------- | ||
style_dict: style dictionary to convert | ||
num_format_str: optional number format string | ||
""" | ||
if style_dict is None: | ||
return None | ||
|
||
# Create a XlsxWriter format object. | ||
xl_format = self.book.add_format() | ||
|
||
# Map the cell font to XlsxWriter font properties. | ||
if style_dict.get('font'): | ||
font = style_dict['font'] | ||
if font.get('bold'): | ||
xl_format.set_bold() | ||
|
||
# Map the cell borders to XlsxWriter border properties. | ||
if style_dict.get('borders'): | ||
xl_format.set_border() | ||
|
||
if num_format_str is not None: | ||
xl_format.set_num_format(num_format_str) | ||
|
||
return xl_format | ||
|
||
register_writer(_XlsxWriter) |
I think they are all optional, right? And aren't you making xlsxwriter the default now?
I guess we need to edit the default settings to check if xlsxwriter or openpyxl is installed. Not sure if we could neaten this up with some importlib magic or something...
i.e.:
And we can decide on order later. I think only openpyxl supports .xlsm files. It also may be the case that xlwt supports xlsx files. If so, it would be trivial to add it here.
They are all optional but openpyxl is the default for xlsx and xlwt is the default for xls insofar as they are the default classes bound to the file extensions.
And it wasn't my intention to make xlsxwriter the default. It is probably best to see if people use it or prefer it as a default for a release or two.
Just to reiterate, I don't think it is worth changing the current behaviour of the (optional) defaults. At least not in 0.13. If it proves to be popular and robust we can consider that for 0.14.
@jmcnamara keep in mind that you have to explicitly choose to install xlsxwriter to have this work - so it's not that big of a deal. xlsxwriter isn't in the major prepackaged distributions (enthought, anaconda, python(x,y), winpython, etc), so there's a low probability for people to be surprised.
I think this is fine for now.