-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: XlsxWriter ignoring formats on numpy types if merged cells #27006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The write_cells method takes a 'cells' parameter that is itself derived from a generator of the various cells in the dataframe. It calls _value_with_format to convert any numpy types to Python types for the Excel writer. In the section of code that deals with writing merged cells, the original 'cell.val' parameter was being passed into the writer, rather than the 'val' returned from the format function. This caused incompatible numpy or Pandas formats to get passed into the writer when they were in a 'merged' (grouped) DataFrame cell. In my case I had a Period object in the DataFrame index. All that needed to be done was simply removing 'cell.'
@riptusk331 Thanks for the PR. can you add tests and a whatsnew. |
Nice catch! Yea as mentioned I think just need a test and a whatsnew for v0.25 and can get this in |
Never written a proper test before, so giving the docs a read and will try to push something this week. |
can you add a test here |
yes trying to get to this tomorrow |
Hello @riptusk331! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-06-28 00:43:56 UTC |
Codecov Report
@@ Coverage Diff @@
## master #27006 +/- ##
==========================================
- Coverage 91.99% 91.98% -0.01%
==========================================
Files 180 180
Lines 50774 50774
==========================================
- Hits 46712 46707 -5
- Misses 4062 4067 +5
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #27006 +/- ##
=========================================
Coverage ? 92.04%
=========================================
Files ? 180
Lines ? 50714
Branches ? 0
=========================================
Hits ? 46679
Misses ? 4035
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a whatsnew note for v0.25?
whatsnew added |
expected.to_excel(self.path) | ||
result = pd.read_excel(self.path, header=[0, 1], index_col=0) | ||
# need to convert PeriodIndexes to standard Indexes for assert equal | ||
expected.columns.set_levels([[str(i) for i in mi.levels[0]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just simplify this as expected.columns = expected.columns.astype(object)
- Period won't be lossless to / from Excel so makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify this as stated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This simplication doesn't seem to work. Comparison fails - the Index read back in isn't preserved as a Period.
E MultiIndex level [0] classes are not equivalent
E [left]: PeriodIndex(['2018', '2018'], dtype='period[A-DEC]', freq='A-DEC')
E [right]: Index(['2018', '2018'], dtype='object')
@riptusk331 looks like a linting error, can you fixup and repush. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor comments to improve coverage and readability but lgtm once resolved
mi = MultiIndex.from_tuples([(pd.Period('2018'), pd.Period('2018Q1')), | ||
(pd.Period('2018'), pd.Period('2018Q2'))]) | ||
expected = DataFrame(np.random.rand(2, len(mi)), columns=mi) | ||
expected.to_excel(self.path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you add the merge_cells
fixture do merge_cells=merge_cells
as part of the call here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I do this, won't this always fail the assert comparison? When this is False, it creates a single cell that contains both levels in the excel file (for instance '2018.2018Q2'). When I read this back in, it's read in as a single column Index and fails comparing against the original MultiIndex.
expected.to_excel(self.path) | ||
result = pd.read_excel(self.path, header=[0, 1], index_col=0) | ||
# need to convert PeriodIndexes to standard Indexes for assert equal | ||
expected.columns.set_levels([[str(i) for i in mi.levels[0]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify this as stated
@jreback fixed linting and repushed. @WillAyd This passes all pytest tests as I pushed it, but I was not able to implement all your suggestions without failing tests for the reasons mentioned in the comments above. Namely using merge_cells in the call and the column type setting simplification. Thinking of how to best resolve, but wanted to get most current working iteration pushed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
thanks @riptusk331 |
@jreback glad to contribute and look forward to helping out more. will delete this branch shortly. @WillAyd thanks for guiding me! when you have time, would love to get input on how my test code could be further improved over what i pushed, as i was unable to figure out how to properly implement 2 of your suggestions (or was possibly not fully understanding what you were looking for / what's proper in a unit test). cheers. |
The XlsxWriter
write_cells
method takes acells
parameter that is itself derived from a generator of the various cells in the DataFrame. It calls the_value_with_format()
to convert any numpy types to Python types for the Excel writer.In the section of code that deals with writing merged cells, the original passed
cell.val
parameter was being passed into the writer, rather than theval
returned from the_value_with_format()
function. Thiscaused incompatible numpy or Pandas types to get passed into the writer when the condition of being in a 'merged' (grouped) DataFrame cell was met.
In my case, I had a Pandas
Period
object in the MultiIndex of a grouped DataFrame . All that needed to be done to fix this was simply removing thecell.
fromcell.val
that was being passed intowks.merge_range()
function, and replacing it with the correctval
, which is 'type safe' for the Excel writer.