Commit a6f814e

Merge pull request #4857 from jmcnamara/enh_xlsxwriter2
ENH: Added xlsxwriter as an ExcelWriter option.
2 parents ebfb4c8 + b0c290f commit a6f814e

13 files changed, 232 additions and 22 deletions

ci/requirements-2.7.txt

+1
@@ -8,6 +8,7 @@ numexpr==2.1
 tables==2.3.1
 matplotlib==1.1.1
 openpyxl==1.6.2
+xlsxwriter==0.4.3
 xlrd==0.9.2
 patsy==0.1.0
 html5lib==1.0b2

ci/requirements-2.7_LOCALE.txt

+1
@@ -2,6 +2,7 @@ python-dateutil
 pytz==2013b
 xlwt==0.7.5
 openpyxl==1.6.2
+xlsxwriter==0.4.3
 xlrd==0.9.2
 numpy==1.6.1
 cython==0.19.1

ci/requirements-3.2.txt

+1
@@ -1,6 +1,7 @@
 python-dateutil==2.1
 pytz==2013b
 openpyxl==1.6.2
+xlsxwriter==0.4.3
 xlrd==0.9.2
 numpy==1.6.2
 cython==0.19.1

ci/requirements-3.3.txt

+1
@@ -1,6 +1,7 @@
 python-dateutil==2.1
 pytz==2013b
 openpyxl==1.6.2
+xlsxwriter==0.4.3
 xlrd==0.9.2
 html5lib==1.0b2
 numpy==1.7.1

doc/source/10min.rst

+2 -2
@@ -695,13 +695,13 @@ Writing to an excel file
 
 .. ipython:: python
 
-   df.to_excel('foo.xlsx', sheet_name='sheet1')
+   df.to_excel('foo.xlsx', sheet_name='Sheet1')
 
 Reading from an excel file
 
 .. ipython:: python
 
-   pd.read_excel('foo.xlsx', 'sheet1', index_col=None, na_values=['NA'])
+   pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])
 
 .. ipython:: python
    :suppress:

doc/source/install.rst

+2
@@ -106,6 +106,8 @@ Optional Dependencies
   * `openpyxl <http://packages.python.org/openpyxl/>`__, `xlrd/xlwt <http://www.python-excel.org/>`__
      * openpyxl version 1.6.1 or higher
      * Needed for Excel I/O
+  * `XlsxWriter <https://pypi.python.org/pypi/XlsxWriter>`__
+     * Alternative Excel writer.
   * `boto <https://pypi.python.org/pypi/boto>`__: necessary for Amazon S3
     access.
   * One of `PyQt4

doc/source/io.rst

+26 -8
@@ -1654,7 +1654,7 @@ indices to be parsed.
 
 .. code-block:: python
 
-   read_excel('path_to_file.xls', Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
+   read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
 
 To write a DataFrame object to a sheet of an Excel file, you can use the
 ``to_excel`` instance method. The arguments are largely the same as ``to_csv``
@@ -1664,7 +1664,7 @@ written. For example:
 
 .. code-block:: python
 
-   df.to_excel('path_to_file.xlsx', sheet_name='sheet1')
+   df.to_excel('path_to_file.xlsx', sheet_name='Sheet1')
 
 Files with a ``.xls`` extension will be written using ``xlwt`` and those with
 a ``.xlsx`` extension will be written using ``openpyxl``.
@@ -1677,8 +1677,8 @@ one can use the ExcelWriter class, as in the following example:
 .. code-block:: python
 
    writer = ExcelWriter('path_to_file.xlsx')
-   df1.to_excel(writer, sheet_name='sheet1')
-   df2.to_excel(writer, sheet_name='sheet2')
+   df1.to_excel(writer, sheet_name='Sheet1')
+   df2.to_excel(writer, sheet_name='Sheet2')
    writer.save()
 
 .. _io.excel.writers:
@@ -1693,11 +1693,29 @@ Excel writer engines
 1. the ``engine`` keyword argument
 2. the filename extension (via the default specified in config options)
 
-``pandas`` only supports ``openpyxl`` for ``.xlsx`` and ``.xlsm`` files and
-``xlwt`` for ``.xls`` files. If you have multiple engines installed, you can choose the
-engine to use by default via the options ``io.excel.xlsx.writer`` and
-``io.excel.xls.writer``.
+By default ``pandas`` only supports
+`openpyxl <http://packages.python.org/openpyxl/>`__ as a writer for ``.xlsx``
+and ``.xlsm`` files and `xlwt <http://www.python-excel.org/>`__ as a writer for
+``.xls`` files. If you have multiple engines installed, you can change the
+default engine via the ``io.excel.xlsx.writer`` and ``io.excel.xls.writer``
+options.
 
+For example if the optional `XlsxWriter <http://xlsxwriter.readthedocs.org>`__
+module is installed you can use it as a xlsx writer engine as follows:
+
+.. code-block:: python
+
+  # By setting the 'engine' in the DataFrame and Panel 'to_excel()' methods.
+  df.to_excel('path_to_file.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
+
+  # By setting the 'engine' in the ExcelWriter constructor.
+  writer = ExcelWriter('path_to_file.xlsx', engine='xlsxwriter')
+
+  # Or via pandas configuration.
+  from pandas import set_option
+  set_option('io.excel.xlsx.writer', 'xlsxwriter')
+
+  df.to_excel('path_to_file.xlsx', sheet_name='Sheet1')
 
 .. _io.hdf5:
 
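
Note on the engine lookup described in the io.rst hunk above: the docs list two sources for the writer engine, the ``engine`` keyword first and the filename extension second. A minimal sketch of that precedence, in plain Python, might look like the following. The names `DEFAULT_WRITERS` and `resolve_engine` are hypothetical and are not taken from the pandas source; the table only mirrors the defaults the docs describe (openpyxl for .xlsx/.xlsm, xlwt for .xls).

```python
import os

# Hypothetical defaults table. In pandas these defaults are held in the
# io.excel.xlsx.writer and io.excel.xls.writer options; the .xlsm entry
# is an assumption based on the docs text.
DEFAULT_WRITERS = {
    '.xlsx': 'openpyxl',
    '.xlsm': 'openpyxl',
    '.xls': 'xlwt',
}


def resolve_engine(path, engine=None, defaults=DEFAULT_WRITERS):
    """Pick a writer engine name: explicit keyword first, extension second."""
    # 1. An explicit ``engine`` argument always wins.
    if engine is not None:
        return engine
    # 2. Otherwise fall back to the default registered for the extension.
    ext = os.path.splitext(path)[1].lower()
    if ext not in defaults:
        raise ValueError("No default engine for extension %r" % ext)
    return defaults[ext]
```

With this sketch, `resolve_engine('out.xlsx')` falls back to the extension default, while `resolve_engine('out.xlsx', engine='xlsxwriter')` returns the explicit choice. Changing the default corresponds to replacing an entry in the table, which is roughly what `set_option('io.excel.xlsx.writer', 'xlsxwriter')` does in the docs example.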

doc/source/release.rst

+3
@@ -113,6 +113,9 @@ Improvements to existing features
     ``io.excel.xls.writer``. (:issue:`4745`, :issue:`4750`)
   - ``Panel.to_excel()`` now accepts keyword arguments that will be passed to
     its ``DataFrame``'s ``to_excel()`` methods. (:issue:`4750`)
+  - Added XlsxWriter as an optional ``ExcelWriter`` engine. This is about 5x
+    faster than the default openpyxl xlsx writer and is equivalent in speed
+    to the xlwt xls writer module. (:issue:`4542`)
   - allow DataFrame constructor to accept more list-like objects, e.g. list of
     ``collections.Sequence`` and ``array.Array`` objects (:issue:`3783`,:issue:`4297`, :issue:`4851`),
     thanks @lgautier

pandas/core/frame.py

+4 -4
@@ -1356,7 +1356,7 @@ def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
                                      tupleize_cols=tupleize_cols)
         formatter.save()
 
-    def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
+    def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
                  float_format=None, cols=None, header=True, index=True,
                  index_label=None, startrow=0, startcol=0, engine=None):
         """
@@ -1366,7 +1366,7 @@ def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
         ----------
         excel_writer : string or ExcelWriter object
             File path or existing ExcelWriter
-        sheet_name : string, default 'sheet1'
+        sheet_name : string, default 'Sheet1'
             Name of sheet which will contain DataFrame
         na_rep : string, default ''
             Missing data representation
@@ -1397,8 +1397,8 @@ def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
         to the existing workbook. This can be used to save different
         DataFrames to one workbook
         >>> writer = ExcelWriter('output.xlsx')
-        >>> df1.to_excel(writer,'sheet1')
-        >>> df2.to_excel(writer,'sheet2')
+        >>> df1.to_excel(writer,'Sheet1')
+        >>> df2.to_excel(writer,'Sheet2')
         >>> writer.save()
         """
         from pandas.io.excel import ExcelWriter

pandas/io/excel.py

+93
@@ -596,6 +596,7 @@ def _convert_to_style(cls, style_dict, num_format_str=None):
         Parameters
         ----------
         style_dict: style dictionary to convert
+        num_format_str: optional number format string
         """
         import xlwt
 
@@ -611,3 +612,95 @@ def _convert_to_style(cls, style_dict, num_format_str=None):
 
 register_writer(_XlwtWriter)
 
+
+class _XlsxWriter(ExcelWriter):
+    engine = 'xlsxwriter'
+    supported_extensions = ('.xlsx',)
+
+    def __init__(self, path, **engine_kwargs):
+        # Use the xlsxwriter module as the Excel writer.
+        import xlsxwriter
+
+        super(_XlsxWriter, self).__init__(path, **engine_kwargs)
+
+        self.book = xlsxwriter.Workbook(path, **engine_kwargs)
+
+    def save(self):
+        """
+        Save workbook to disk.
+        """
+        return self.book.close()
+
+    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
+        # Write the frame cells using xlsxwriter.
+
+        sheet_name = self._get_sheet_name(sheet_name)
+
+        if sheet_name in self.sheets:
+            wks = self.sheets[sheet_name]
+        else:
+            wks = self.book.add_worksheet(sheet_name)
+            self.sheets[sheet_name] = wks
+
+        style_dict = {}
+
+        for cell in cells:
+            val = _conv_value(cell.val)
+
+            num_format_str = None
+            if isinstance(cell.val, datetime.datetime):
+                num_format_str = "YYYY-MM-DD HH:MM:SS"
+            if isinstance(cell.val, datetime.date):
+                num_format_str = "YYYY-MM-DD"
+
+            stylekey = json.dumps(cell.style)
+            if num_format_str:
+                stylekey += num_format_str
+
+            if stylekey in style_dict:
+                style = style_dict[stylekey]
+            else:
+                style = self._convert_to_style(cell.style, num_format_str)
+                style_dict[stylekey] = style
+
+            if cell.mergestart is not None and cell.mergeend is not None:
+                wks.merge_range(startrow + cell.row,
+                                startrow + cell.mergestart,
+                                startcol + cell.col,
+                                startcol + cell.mergeend,
+                                val, style)
+            else:
+                wks.write(startrow + cell.row,
+                          startcol + cell.col,
+                          val, style)
+
+    def _convert_to_style(self, style_dict, num_format_str=None):
+        """
+        converts a style_dict to an xlsxwriter format object
+        Parameters
+        ----------
+        style_dict: style dictionary to convert
+        num_format_str: optional number format string
+        """
+        if style_dict is None:
+            return None
+
+        # Create a XlsxWriter format object.
+        xl_format = self.book.add_format()
+
+        # Map the cell font to XlsxWriter font properties.
+        if style_dict.get('font'):
+            font = style_dict['font']
+            if font.get('bold'):
+                xl_format.set_bold()
+
+        # Map the cell borders to XlsxWriter border properties.
+        if style_dict.get('borders'):
+            xl_format.set_border()
+
+        if num_format_str is not None:
+            xl_format.set_num_format(num_format_str)
+
+        return xl_format
+
+register_writer(_XlsxWriter)
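
Two details in ``write_cells`` above may be worth a second look. First, the date checks are two consecutive ``if`` statements, and ``datetime.datetime`` is a subclass of ``datetime.date``, so a datetime value matches both and ends up with the date-only format. Second, XlsxWriter documents ``merge_range`` as taking ``(first_row, first_col, last_row, last_col, data, cell_format)``, whereas the call above passes two row-based values followed by two column-based values. The sketch below shows one way to handle both; it is not the code in this commit. The helper names `pick_num_format` and `merge_range_args` are hypothetical, and the reading of ``mergestart`` as the last row and ``mergeend`` as the last column is an assumption that the diff does not confirm.

```python
import datetime


def pick_num_format(value):
    """Return an Excel number format for date-like values, else None."""
    # datetime.datetime subclasses datetime.date, so test it first and
    # use elif so the date branch cannot overwrite the datetime format.
    if isinstance(value, datetime.datetime):
        return "YYYY-MM-DD HH:MM:SS"
    elif isinstance(value, datetime.date):
        return "YYYY-MM-DD"
    return None


def merge_range_args(row, col, mergestart, mergeend, startrow=0, startcol=0):
    """Order coordinates as (first_row, first_col, last_row, last_col)."""
    # Assumption: mergestart is the last row of the merged block and
    # mergeend is its last column.
    return (startrow + row, startcol + col,
            startrow + mergestart, startcol + mergeend)
```

The tuple returned by `merge_range_args` could then be unpacked ahead of the value and style, for example ``wks.merge_range(*(coords + (val, style)))``.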
