Support more styles for xlsxwriter #16149

Merged
merged 16 commits into from
Oct 31, 2017
2 changes: 1 addition & 1 deletion doc/source/style.ipynb
@@ -918,7 +918,7 @@
"\n",
"<span style=\"color: red\">*Experimental: This is a new feature and still under development. We'll be adding features and possibly making breaking changes in future releases. We'd love to hear your feedback.*</span>\n",
"\n",
-"Some support is available for exporting styled `DataFrames` to Excel worksheets using the `OpenPyXL` engine. CSS2.2 properties handled include:\n",
+"Some support is available for exporting styled `DataFrames` to Excel worksheets using the `OpenPyXL` or `XlsxWriter` engines. CSS2.2 properties handled include:\n",
"\n",
"- `background-color`\n",
"- `border-style`, `border-width`, `border-color` and their {`top`, `right`, `bottom`, `left` variants}\n",
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.20.0.txt
@@ -368,7 +368,7 @@ To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you
Excel output for styled DataFrames
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-Experimental support has been added to export ``DataFrame.style`` formats to Excel using the ``openpyxl`` engine. (:issue:`15530`)
+Experimental support has been added to export ``DataFrame.style`` formats to Excel using the ``openpyxl`` or ``xlsxwriter`` engines. (:issue:`15530`, :issue:`16149`)

For example, after running the following, ``styled.xlsx`` renders as below:

134 changes: 109 additions & 25 deletions pandas/io/excel.py
@@ -1596,6 +1596,68 @@ def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0,
startcol + cell.col,
val, style)

# Map from openpyxl-oriented styles to flatter xlsxwriter representation
Contributor

I think the code would be simpler if this style formatting were a separate class (rather than functions sitting in the main excel code). Can you refactor to make this cleaner?

STYLE_MAPPING = [
(('font', 'name'), 'font_name'),
(('font', 'sz'), 'font_size'),
(('font', 'size'), 'font_size'),
(('font', 'color', 'rgb'), 'font_color'),
(('font', 'color'), 'font_color'),
(('font', 'b'), 'bold'),
(('font', 'bold'), 'bold'),
(('font', 'i'), 'italic'),
(('font', 'italic'), 'italic'),
(('font', 'u'), 'underline'),
(('font', 'underline'), 'underline'),
(('font', 'strike'), 'font_strikeout'),
(('font', 'vertAlign'), 'font_script'),
(('font', 'vertalign'), 'font_script'),
(('number_format', 'format_code'), 'num_format'),
(('number_format',), 'num_format'),
(('protection', 'locked'), 'locked'),
(('protection', 'hidden'), 'hidden'),
(('alignment', 'horizontal'), 'align'),
(('alignment', 'vertical'), 'valign'),
(('alignment', 'text_rotation'), 'rotation'),
(('alignment', 'wrap_text'), 'text_wrap'),
(('alignment', 'indent'), 'indent'),
(('alignment', 'shrink_to_fit'), 'shrink'),
(('fill', 'patternType'), 'pattern'),
(('fill', 'patterntype'), 'pattern'),
(('fill', 'fill_type'), 'pattern'),
(('fill', 'start_color', 'rgb'), 'fg_color'),
(('fill', 'fgColor', 'rgb'), 'fg_color'),
(('fill', 'fgcolor', 'rgb'), 'fg_color'),
(('fill', 'start_color'), 'fg_color'),
(('fill', 'fgColor'), 'fg_color'),
(('fill', 'fgcolor'), 'fg_color'),
(('fill', 'end_color', 'rgb'), 'bg_color'),
(('fill', 'bgColor', 'rgb'), 'bg_color'),
(('fill', 'bgcolor', 'rgb'), 'bg_color'),
(('fill', 'end_color'), 'bg_color'),
(('fill', 'bgColor'), 'bg_color'),
(('fill', 'bgcolor'), 'bg_color'),
(('border', 'color', 'rgb'), 'border_color'),
(('border', 'color'), 'border_color'),
(('border', 'style'), 'border'),
(('border', 'top', 'color', 'rgb'), 'top_color'),
(('border', 'top', 'color'), 'top_color'),
(('border', 'top', 'style'), 'top'),
(('border', 'top'), 'top'),
(('border', 'right', 'color', 'rgb'), 'right_color'),
(('border', 'right', 'color'), 'right_color'),
(('border', 'right', 'style'), 'right'),
(('border', 'right'), 'right'),
(('border', 'bottom', 'color', 'rgb'), 'bottom_color'),
(('border', 'bottom', 'color'), 'bottom_color'),
(('border', 'bottom', 'style'), 'bottom'),
(('border', 'bottom'), 'bottom'),
(('border', 'left', 'color', 'rgb'), 'left_color'),
(('border', 'left', 'color'), 'left_color'),
(('border', 'left', 'style'), 'left'),
(('border', 'left'), 'left'),
]

def _convert_to_style(self, style_dict, num_format_str=None):
"""
converts a style_dict to an xlsxwriter format object
@@ -1610,35 +1672,57 @@ def _convert_to_style(self, style_dict, num_format_str=None):
return None

# Create a XlsxWriter format object.
-xl_format = self.book.add_format()
+props = {}

if num_format_str is not None:
-xl_format.set_num_format(num_format_str)
+props['num_format'] = num_format_str

Contributor

Can we move more of this logic out of here and into the formats dir somewhere?

Contributor Author

Do you mean explicitly moving the number format logic out? Yes, perhaps that's a worthwhile refactoring. I think we should also be calculating the number format from the display.precision config. For which reason, I believe all of those changes belong in a different PR.

Or are you talking about moving this mapping logic out? Well currently we assume nested style dicts as an interchange format, which are well-suited to openpyxl but need conversion for all writers. The stuff in formats/ should remain relatively writer-agnostic.
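
To make the interchange format concrete, here is a minimal, self-contained sketch of how a nested style dict can be walked into the flat property dict that XlsxWriter's `add_format` accepts. It follows the first-match-wins loop in this PR's `_convert_to_style`, but `MAPPING` is a trimmed-down, hypothetical subset of `STYLE_MAPPING`, and `flatten_style` is an illustrative name, not a function in pandas.

```python
# Hypothetical subset of STYLE_MAPPING: (path into nested dict) -> flat key.
# Earlier entries take priority over later ones with the same flat key.
MAPPING = [
    (('font', 'bold'), 'bold'),
    (('font', 'color', 'rgb'), 'font_color'),
    (('font', 'color'), 'font_color'),
    (('alignment', 'horizontal'), 'align'),
]


def flatten_style(style_dict, mapping=MAPPING):
    """Return a flat props dict built from a nested style dict."""
    props = {}
    for src, dst in mapping:
        if dst in props:
            # A higher-priority path already supplied this property.
            continue
        value = style_dict
        for key in src:
            try:
                value = value[key]
            except (KeyError, TypeError):
                # Path is absent, or we tried to index into a non-dict leaf.
                break
        else:
            # Every key in the path was found.
            props[dst] = value
    return props


nested = {
    'font': {'bold': True, 'color': {'rgb': 'FF0000'}},
    'alignment': {'horizontal': 'center'},
}
flat = flatten_style(nested)

# A color given as a plain string falls through to the shorter path.
flat_plain_color = flatten_style({'font': {'color': '0000FF'}})
```

The two paths for `font_color` show why order matters: the more specific `('font', 'color', 'rgb')` is tried first, and the shorter `('font', 'color')` only applies when the color is not itself a dict.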

Contributor

ideally all of this logic would just be a single call here and the logic elsewhere.

Contributor Author

I don't think I've grokked your vision, given that this is writer specific. Do you mean that there should be more refactoring across writers? Except for this number formatting, it's already quite factored, as they each have different syntaxes for creating and formatting cells.

Contributor

Yes, I think excel should be refactored into a subdir of writer code, and style things should live there.

Maybe make an issue about this.
It's a bit of work to split it, but then adding things like style should be easy.

if style_dict is None:
-return xl_format
-
-# Map the cell font to XlsxWriter font properties.
-if style_dict.get('font'):
-font = style_dict['font']
-if font.get('bold'):
-xl_format.set_bold()
-
-# Map the alignment to XlsxWriter alignment properties.
-alignment = style_dict.get('alignment')
-if alignment:
-if (alignment.get('horizontal') and
-alignment['horizontal'] == 'center'):
-xl_format.set_align('center')
-if (alignment.get('vertical') and
-alignment['vertical'] == 'top'):
-xl_format.set_align('top')
-
-# Map the cell borders to XlsxWriter border properties.
-if style_dict.get('borders'):
-xl_format.set_border()
-
-return xl_format
+return self.book.add_format(props)

if 'borders' in style_dict:
style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:
Contributor

so this only is triggered if there is styling (IOW this won't cause a perf issue for 'regular' excel)?

Contributor Author

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

  • store STYLE_MAPPING as a trie and descend recursively only where a prefix is matched.
  • flatten style_dict and store STYLE_MAPPING as a dict so that their keys match. But to be deterministic in case of multiple competing styles, STYLE_MAPPING would need to store the matched index, and the results would need to be sorted.
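
The second option can be sketched as follows. This is only an illustration of the idea described in the comment, not the variant that was actually pushed: `MAPPING`, `LOOKUP`, `iter_paths` and `flatten_style` are hypothetical names, and the mapping is a small subset. Each path is stored with its position in the list, so that when several paths compete for the same flat key the lowest index wins, which keeps the result deterministic.

```python
# Hypothetical subset of the mapping, in priority order.
MAPPING = [
    (('font', 'color', 'rgb'), 'font_color'),
    (('font', 'color'), 'font_color'),
    (('font', 'b'), 'bold'),
    (('font', 'bold'), 'bold'),
]

# path -> (priority index, flat key), for O(1) lookup per path.
LOOKUP = {src: (i, dst) for i, (src, dst) in enumerate(MAPPING)}


def iter_paths(node, prefix=()):
    """Yield (path, value) for every key in a nested dict, at every depth."""
    for key, value in node.items():
        path = prefix + (key,)
        yield path, value
        if isinstance(value, dict):
            yield from iter_paths(value, path)


def flatten_style(style_dict):
    """Flatten a nested style dict, resolving conflicts by mapping order."""
    matches = []
    for path, value in iter_paths(style_dict):
        if path in LOOKUP:
            index, dst = LOOKUP[path]
            matches.append((index, dst, value))
    props = {}
    # Lowest index first, so the highest-priority match claims each key.
    for _, dst, value in sorted(matches, key=lambda m: m[0]):
        props.setdefault(dst, value)
    return props


result = flatten_style({'font': {'color': {'rgb': 'FF0000'}, 'b': True}})
```

The work here scales with the number of keys in the style dict rather than with the length of the mapping, at the cost of the extra sort.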

Contributor Author

I'm pushing a faster variant.

# src is a sequence of keys into a nested dict
# dst is a flat key
if dst in props:
continue
v = style_dict
for k in src:
try:
v = v[k]
except (KeyError, TypeError):
break
else:
props[dst] = v

if isinstance(props.get('pattern'), string_types):
# TODO: support other fill patterns
props['pattern'] = 0 if props['pattern'] == 'none' else 1

for k in ['border', 'top', 'right', 'bottom', 'left']:
if isinstance(props.get(k), string_types):
try:
props[k] = ['none', 'thin', 'medium', 'dashed', 'dotted',
'thick', 'double', 'hair', 'mediumDashed',
'dashDot', 'mediumDashDot', 'dashDotDot',
'mediumDashDotDot', 'slantDashDot'].\
index(props[k])
except ValueError:
props[k] = 2

if isinstance(props.get('font_script'), string_types):
props['font_script'] = ['baseline', 'superscript', 'subscript'].\
index(props['font_script'])

if isinstance(props.get('underline'), string_types):
props['underline'] = {'none': 0, 'single': 1, 'double': 2,
'singleAccounting': 33,
'doubleAccounting': 34}[props['underline']]

return self.book.add_format(props)
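
The name-to-number conversions above can be read in isolation. The sketch below restates the border-style lookup as a standalone helper, using the same list of names as the diff; `BORDER_STYLES` and `border_index` are illustrative names, not part of pandas or XlsxWriter. Note that in the diff only the border lookup has a fallback (index 2, i.e. `'medium'`); the `font_script` and `underline` lookups would raise on an unrecognised name.

```python
# Border style names in the order used by the diff; the list position is
# the integer passed on to XlsxWriter.
BORDER_STYLES = [
    'none', 'thin', 'medium', 'dashed', 'dotted', 'thick', 'double',
    'hair', 'mediumDashed', 'dashDot', 'mediumDashDot', 'dashDotDot',
    'mediumDashDotDot', 'slantDashDot',
]


def border_index(name, default=2):
    """Map a border style name to its index, falling back to `default`."""
    try:
        return BORDER_STYLES.index(name)
    except ValueError:
        # Unknown style name: use the same fallback as the diff (medium).
        return default
```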


register_writer(_XlsxWriter)
173 changes: 94 additions & 79 deletions pandas/tests/io/test_excel.py
@@ -2408,85 +2408,100 @@ def custom_converter(css):
styled.to_excel(writer, sheet_name='styled')
ExcelFormatter(styled, style_converter=custom_converter).write(
writer, sheet_name='custom')
writer.save()

# For engines other than openpyxl 2, we only smoke test
if engine != 'openpyxl':
return
if not openpyxl_compat.is_compat(major_ver=2):
pytest.skip('incompatible openpyxl version')

# (1) compare DataFrame.to_excel and Styler.to_excel when unstyled
n_cells = 0
for col1, col2 in zip(writer.sheets['frame'].columns,
writer.sheets['unstyled'].columns):
assert len(col1) == len(col2)
for cell1, cell2 in zip(col1, col2):
assert cell1.value == cell2.value
assert_equal_style(cell1, cell2)
n_cells += 1

# ensure iteration actually happened:
assert n_cells == (10 + 1) * (3 + 1)

# (2) check styling with default converter
n_cells = 0
for col1, col2 in zip(writer.sheets['frame'].columns,
writer.sheets['styled'].columns):
assert len(col1) == len(col2)
for cell1, cell2 in zip(col1, col2):
ref = '%s%d' % (cell2.column, cell2.row)
# XXX: this isn't as strong a test as ideal; we should
# differences are exclusive
if ref == 'B2':
assert not cell1.font.bold
assert cell2.font.bold
elif ref == 'C3':
assert cell1.font.color.rgb != cell2.font.color.rgb
assert cell2.font.color.rgb == '000000FF'
elif ref == 'D4':
assert cell1.font.underline != cell2.font.underline
assert cell2.font.underline == 'single'
elif ref == 'B5':
assert not cell1.border.left.style
assert (cell2.border.top.style ==
cell2.border.right.style ==
cell2.border.bottom.style ==
cell2.border.left.style ==
'medium')
elif ref == 'C6':
assert not cell1.font.italic
assert cell2.font.italic
elif ref == 'D7':
assert (cell1.alignment.horizontal !=
cell2.alignment.horizontal)
assert cell2.alignment.horizontal == 'right'
elif ref == 'B8':
assert cell1.fill.fgColor.rgb != cell2.fill.fgColor.rgb
assert cell1.fill.patternType != cell2.fill.patternType
assert cell2.fill.fgColor.rgb == '00FF0000'
assert cell2.fill.patternType == 'solid'
else:
assert_equal_style(cell1, cell2)

assert cell1.value == cell2.value
n_cells += 1

assert n_cells == (10 + 1) * (3 + 1)

# (3) check styling with custom converter
n_cells = 0
for col1, col2 in zip(writer.sheets['frame'].columns,
writer.sheets['custom'].columns):
assert len(col1) == len(col2)
for cell1, cell2 in zip(col1, col2):
ref = '%s%d' % (cell2.column, cell2.row)
if ref in ('B2', 'C3', 'D4', 'B5', 'C6', 'D7', 'B8'):
assert not cell1.font.bold
assert cell2.font.bold
else:
assert_equal_style(cell1, cell2)
if engine not in ('openpyxl', 'xlsxwriter'):
# For other engines, we only smoke test
return
openpyxl = pytest.importorskip('openpyxl')
if not openpyxl_compat.is_compat(major_ver=2):
pytest.skip('incompatible openpyxl version')

assert cell1.value == cell2.value
n_cells += 1
wb = openpyxl.load_workbook(path)

assert n_cells == (10 + 1) * (3 + 1)
# (1) compare DataFrame.to_excel and Styler.to_excel when unstyled
n_cells = 0
for col1, col2 in zip(wb['frame'].columns,
wb['unstyled'].columns):
assert len(col1) == len(col2)
for cell1, cell2 in zip(col1, col2):
assert cell1.value == cell2.value
assert_equal_style(cell1, cell2)
n_cells += 1

# ensure iteration actually happened:
assert n_cells == (10 + 1) * (3 + 1)

# (2) check styling with default converter

# XXX: openpyxl (as at 2.4) prefixes colors with 00, xlsxwriter with FF
alpha = '00' if engine == 'openpyxl' else 'FF'

n_cells = 0
for col1, col2 in zip(wb['frame'].columns,
wb['styled'].columns):
assert len(col1) == len(col2)
for cell1, cell2 in zip(col1, col2):
ref = '%s%d' % (cell2.column, cell2.row)
# XXX: this isn't as strong a test as ideal; we should
# confirm that differences are exclusive
if ref == 'B2':
assert not cell1.font.bold
assert cell2.font.bold
elif ref == 'C3':
assert cell1.font.color.rgb != cell2.font.color.rgb
assert cell2.font.color.rgb == alpha + '0000FF'
elif ref == 'D4':
# This fails with engine=xlsxwriter due to
# https://bitbucket.org/openpyxl/openpyxl/issues/800
if engine == 'xlsxwriter' \
and (LooseVersion(openpyxl.__version__) <
LooseVersion('2.4.6')):
pass
else:
assert cell1.font.underline != cell2.font.underline
assert cell2.font.underline == 'single'
elif ref == 'B5':
assert not cell1.border.left.style
assert (cell2.border.top.style ==
cell2.border.right.style ==
cell2.border.bottom.style ==
cell2.border.left.style ==
'medium')
elif ref == 'C6':
assert not cell1.font.italic
assert cell2.font.italic
elif ref == 'D7':
assert (cell1.alignment.horizontal !=
cell2.alignment.horizontal)
assert cell2.alignment.horizontal == 'right'
elif ref == 'B8':
assert cell1.fill.fgColor.rgb != cell2.fill.fgColor.rgb
assert cell1.fill.patternType != cell2.fill.patternType
assert cell2.fill.fgColor.rgb == alpha + 'FF0000'
assert cell2.fill.patternType == 'solid'
else:
assert_equal_style(cell1, cell2)

assert cell1.value == cell2.value
n_cells += 1

assert n_cells == (10 + 1) * (3 + 1)

# (3) check styling with custom converter
n_cells = 0
for col1, col2 in zip(wb['frame'].columns,
wb['custom'].columns):
assert len(col1) == len(col2)
for cell1, cell2 in zip(col1, col2):
ref = '%s%d' % (cell2.column, cell2.row)
if ref in ('B2', 'C3', 'D4', 'B5', 'C6', 'D7', 'B8'):
assert not cell1.font.bold
assert cell2.font.bold
else:
assert_equal_style(cell1, cell2)

assert cell1.value == cell2.value
n_cells += 1

assert n_cells == (10 + 1) * (3 + 1)
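
The test handles the differing alpha prefix by building the expected string per engine (`alpha + '0000FF'`). An alternative, shown here only as a sketch, is to compare the trailing `RRGGBB` digits and ignore the alpha byte altogether. `rgb_part` and `same_color` are hypothetical helpers, not functions in pandas or openpyxl, and the sketch assumes colors arrive as 6- or 8-digit hex strings.

```python
def rgb_part(color):
    """Return the last six hex digits (RRGGBB) of an RGB or ARGB string."""
    return color[-6:].upper()


def same_color(a, b):
    """Compare two hex colors while ignoring any leading alpha byte."""
    return rgb_part(a) == rgb_part(b)
```

This is weaker than the check in the test, since it would not notice an unexpected alpha value, so it trades strictness for engine independence.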