Skip to content

Support more styles for xlsxwriter #16149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Oct 31, 2017
Merged

Conversation

jnothman
Copy link
Contributor

@jnothman jnothman commented Apr 26, 2017

I was surprised to find that despite the interchangeable representation of Excel styles, xlsxwriter did not have good style support.

I've not added direct tests for this functionality, but test some of it through test_styler_to_excel.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
  • whatsnew entry

@jnothman jnothman force-pushed the xlsxwriter-styles branch 2 times, most recently from a3b2a87 to 3f687c4 Compare April 26, 2017 16:47
@jnothman
Copy link
Contributor Author

There is a consistent test failure, but not one I've managed to replicate locally.

@codecov
Copy link

codecov bot commented Apr 27, 2017

Codecov Report

Merging #16149 into master will decrease coverage by <.01%.
The diff coverage is 89.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16149      +/-   ##
==========================================
- Coverage   90.83%   90.83%   -0.01%     
==========================================
  Files         159      159              
  Lines       50796    50809      +13     
==========================================
+ Hits        46143    46153      +10     
- Misses       4653     4656       +3
Flag Coverage Δ
#multiple 88.61% <89.65%> (-0.01%) ⬇️
#single 40.3% <3.44%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/excel.py 80.55% <89.65%> (-0.07%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3b80ed3...c935f5d. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 27, 2017

Codecov Report

Merging #16149 into master will decrease coverage by 0.02%.
The diff coverage is 92.1%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16149      +/-   ##
==========================================
- Coverage   91.24%   91.21%   -0.03%     
==========================================
  Files         163      163              
  Lines       50091    50106      +15     
==========================================
+ Hits        45704    45706       +2     
- Misses       4387     4400      +13
Flag Coverage Δ
#multiple 89.02% <92.1%> (-0.01%) ⬇️
#single 40.23% <10.52%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/excel.py 80.39% <92.1%> (-0.01%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.75% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.41% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5959ee3...80ed56a. Read the comment docs.

@jnothman
Copy link
Contributor Author

Finally passing tests after identifying a recently fixed openpyxl bug that got in the way

@jnothman jnothman force-pushed the xlsxwriter-styles branch from fd4050e to 96ab259 Compare April 27, 2017 07:20
@jreback jreback added the IO Excel read_excel, to_excel label Apr 27, 2017

if num_format_str is not None:
xl_format.set_num_format(num_format_str)
props['num_format'] = num_format_str

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move more of this logic out of here an into the formats dir somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean explicitly moving the number format logic out? Yes, perhaps that's a worthwhile refactoring. I think we should also be calculating the number format from the display.precision config. For which reason, I believe all of those changes belong in a different PR.

Or are you talking about moving this mapping logic out? Well currently we assume nested style dicts as an interchange format, which are well-suited to openpyxl but need conversion for all writers. The stuff in formats/ should remain relatively writer-agnostic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally all of this logic would just be a single call here and the logic elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've grokked your vision, given that this is writer specific. Do you mean that there should be more refactoring across writers? Except for this number formatting, it's already quite factored, as they each have different syntaxes for creating and formatting cells.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i think think excel should be refactored into a subdir of writer code and style things should live there

maybe make an issue about this
it's a bit of work to split it then adding things like style should be easy

@jnothman
Copy link
Contributor Author

I'm happy to make an issue aiming to refactor excel writing code. But how do you feel about this PR?

@jreback
Copy link
Contributor

jreback commented Jun 10, 2017

can you rebase.

@jnothman
Copy link
Contributor Author

I've moved the what's new to 0.20.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this only is triggered if there is styling (IOW this won't cause a perf issue for 'regular' excel)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

  • store STYLE_MAPPING as a trie and descend recursively only where a prefix is matched.
  • flatten style_dict and store STYLE_MAPPING as a dict so that their keys match. But to be deterministic in case of multiple competing styles, STYLE_MAPPING would need to store the matched index, and the results would need to be sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing a faster variant.

@@ -1609,6 +1609,68 @@ def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0,
startcol + cell.col,
val, style)

# Map from openpyxl-oriented styles to flatter xlsxwriter representation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code would be simpler to make this style formatting into a separate class (rather than have it live in functions sitting in the main excel code). can you refactor to make this cleaner.

@jreback
Copy link
Contributor

jreback commented Aug 17, 2017

this looks reasonable, can you rebase

@jnothman
Copy link
Contributor Author

Yes, sorry I've not managed to do the refactoring you'd like to see. I've been unsure what you would like, and have had my attentions elsewhere.

Copy link
Contributor Author

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppVeyor failure looks like someone else's problem.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few lines above we return if style_dict is None; a few lines above that we return if num_format_str is None and style_dict is None. I think that is sufficient.

Btw, I think even the default to_excel has some styling of headers, so this function will always be called, but will be returned early where possible.

There are ways to make this faster, though:

  • store STYLE_MAPPING as a trie and descend recursively only where a prefix is matched.
  • flatten style_dict and store STYLE_MAPPING as a dict so that their keys match. But to be deterministic in case of multiple competing styles, STYLE_MAPPING would need to store the matched index, and the results would need to be sorted.

style_dict = style_dict.copy()
style_dict['border'] = style_dict.pop('borders')

for src, dst in self.STYLE_MAPPING:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing a faster variant.

@jreback
Copy link
Contributor

jreback commented Aug 18, 2017

cc @chris-b1 @TomAugspurger

@jnothman
Copy link
Contributor Author

Merge into 0.21 and avoid future what's new conflicts?

@jreback
Copy link
Contributor

jreback commented Oct 16, 2017

you need to rebase and some comments to respond

@jnothman
Copy link
Contributor Author

Well, your comments suggested a refactor, and I've previously commented that I do not understand your vision of a refactor in this space, and certainly have not found capacity within my other FOSS commitments to to do a general code quality improvement here.

Since those comments, you announced "this looks reasonable". I'll have another very brief look at refactoring.

@jnothman
Copy link
Contributor Author

I made the requested change as I interpret it. I do not find it improves code quality.

@jnothman
Copy link
Contributor Author

I haven't understood the cause of the travis failure, apparently in lint.sh but without error message that I can see.

I've merged in master and moved what's new to the next version.

@jreback
Copy link
Contributor

jreback commented Oct 29, 2017

Linting *.py
pandas/io/excel.py:1794:1: E305 expected 2 blank lines after class or function definition, found 1

you can check this locally via:

make lint-diff (or just directly run flake8 on that file)

@jnothman
Copy link
Contributor Author

jnothman commented Oct 29, 2017 via email

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clipboard failure on travis looks unrelated.

All good @jreback?

@jreback jreback added this to the 0.22.0 milestone Oct 31, 2017
@@ -22,7 +22,7 @@ New features
Other Enhancements
^^^^^^^^^^^^^^^^^^

-
- Better support for ``Dataframe.style.to_excel()`` output with the ``xlsxwriter`` engine. (:issue:`16149`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now. it might make sense to enhance the excel docs with what is better now? (or maybe a listing of styles that work). but for a followup.

@jreback jreback merged commit 5d096f7 into pandas-dev:master Oct 31, 2017
@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

thanks @jnothman

peterpanmj pushed a commit to peterpanmj/pandas that referenced this pull request Oct 31, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants