Skip to content

BUG: XlsxWriter ignoring formats on numpy types if merged cells #27006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 28, 2019

Conversation

riptusk331
Copy link
Contributor

@riptusk331 riptusk331 commented Jun 23, 2019

The XlsxWriter write_cells method takes a cells parameter that is itself derived from a generator of the various cells in the DataFrame. It calls the _value_with_format() to convert any numpy types to Python types for the Excel writer.

In the section of code that deals with writing merged cells, the original passed cell.val parameter was being passed into the writer, rather than the val returned from the _value_with_format() function. This
caused incompatible numpy or Pandas types to get passed into the writer when the condition of being in a 'merged' (grouped) DataFrame cell was met.

In my case, I had a Pandas Period object in the MultiIndex of a grouped DataFrame . All that needed to be done to fix this was simply removing the cell. from cell.val that was being passed into wks.merge_range() function, and replacing it with the correct val, which is 'type safe' for the Excel writer.

The write_cells method takes a 'cells' parameter that is itself derived
from a generator of the various cells in the dataframe. It calls
_value_with_format to convert any numpy types to Python types for the
Excel writer. In the section of code that deals with writing merged
cells, the original 'cell.val' parameter was being passed into the
writer, rather than the 'val' returned from the format function. This
caused incompatible numpy or Pandas formats to get passed into the
writer when they were in a 'merged' (grouped) DataFrame cell. In my
case I had a Period object in the DataFrame index. All that needed to
be done was simply removing 'cell.'
@simonjayhawkins simonjayhawkins requested a review from WillAyd June 23, 2019 08:55
@simonjayhawkins simonjayhawkins added IO Excel read_excel, to_excel Period Period data type labels Jun 23, 2019
@simonjayhawkins
Copy link
Member

@riptusk331 Thanks for the PR.

can you add tests and a whatsnew.

@WillAyd
Copy link
Member

WillAyd commented Jun 23, 2019

Nice catch! Yea as mentioned I think just need a test and a whatsnew for v0.25 and can get this in

@WillAyd WillAyd added this to the 0.25.0 milestone Jun 23, 2019
@riptusk331
Copy link
Contributor Author

Never written a proper test before, so giving the docs a read and will try to push something this week.

@jreback
Copy link
Contributor

jreback commented Jun 27, 2019

can you add a test here

@riptusk331
Copy link
Contributor Author

yes trying to get to this tomorrow

@pep8speaks
Copy link

pep8speaks commented Jun 27, 2019

Hello @riptusk331! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-28 00:43:56 UTC

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

Merging #27006 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #27006      +/-   ##
==========================================
- Coverage   91.99%   91.98%   -0.01%     
==========================================
  Files         180      180              
  Lines       50774    50774              
==========================================
- Hits        46712    46707       -5     
- Misses       4062     4067       +5
Flag Coverage Δ
#multiple 90.63% <ø> (ø) ⬆️
#single 41.83% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/excel/_xlsxwriter.py 94.36% <ø> (ø) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.84% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b4d4ec5...4d5eab1. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 27, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@08a599b). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #27006   +/-   ##
=========================================
  Coverage          ?   92.04%           
=========================================
  Files             ?      180           
  Lines             ?    50714           
  Branches          ?        0           
=========================================
  Hits              ?    46679           
  Misses            ?     4035           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.68% <ø> (?)
#single 41.87% <ø> (?)
Impacted Files Coverage Δ
pandas/io/excel/_xlsxwriter.py 94.36% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08a599b...b16397d. Read the comment docs.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a whatsnew note for v0.25?

@riptusk331
Copy link
Contributor Author

whatsnew added

expected.to_excel(self.path)
result = pd.read_excel(self.path, header=[0, 1], index_col=0)
# need to convert PeriodIndexes to standard Indexes for assert equal
expected.columns.set_levels([[str(i) for i in mi.levels[0]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just simplify this as expected.columns = expected.columns.astype(object) - Period won't be lossless to / from Excel so makes sense

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify this as stated

Copy link
Contributor Author

@riptusk331 riptusk331 Jun 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This simplication doesn't seem to work. Comparison fails - the Index read back in isn't preserved as a Period.

E  MultiIndex level [0] classes are not equivalent
E  [left]:  PeriodIndex(['2018', '2018'], dtype='period[A-DEC]', freq='A-DEC')
E  [right]: Index(['2018', '2018'], dtype='object')

@jreback
Copy link
Contributor

jreback commented Jun 27, 2019

@riptusk331 looks like a linting error, can you fixup and repush.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor comments to improve coverage and readability but lgtm once resolved

mi = MultiIndex.from_tuples([(pd.Period('2018'), pd.Period('2018Q1')),
(pd.Period('2018'), pd.Period('2018Q2'))])
expected = DataFrame(np.random.rand(2, len(mi)), columns=mi)
expected.to_excel(self.path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you add the merge_cells fixture do merge_cells=merge_cells as part of the call here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I do this, won't this always fail the assert comparison? When this is False, it creates a single cell that contains both levels in the excel file (for instance '2018.2018Q2'). When I read this back in, it's read in as a single column Index and fails comparing against the original MultiIndex.

expected.to_excel(self.path)
result = pd.read_excel(self.path, header=[0, 1], index_col=0)
# need to convert PeriodIndexes to standard Indexes for assert equal
expected.columns.set_levels([[str(i) for i in mi.levels[0]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify this as stated

@riptusk331
Copy link
Contributor Author

@jreback fixed linting and repushed.

@WillAyd This passes all pytest tests as I pushed it, but I was not able to implement all your suggestions without failing tests for the reasons mentioned in the comments above. Namely using merge_cells in the call and the column type setting simplification. Thinking of how to best resolve, but wanted to get most current working iteration pushed.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @jreback

@jreback jreback merged commit c3133db into pandas-dev:master Jun 28, 2019
@jreback
Copy link
Contributor

jreback commented Jun 28, 2019

thanks @riptusk331

@riptusk331
Copy link
Contributor Author

@jreback glad to contribute and look forward to helping out more. will delete this branch shortly.

@WillAyd thanks for guiding me! when you have time, would love to get input on how my test code could be further improved over what i pushed, as i was unable to figure out how to properly implement 2 of your suggestions (or was possibly not fully understanding what you were looking for / what's proper in a unit test). cheers.

@riptusk331 riptusk331 deleted the period-excel-fix branch June 28, 2019 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

to_excel() TyperError: Unsupported type <class 'pandas._libs.tslibs.period.Period'> in write()
5 participants