BUG: Fix reading of multi_index dataframe with NaN #57070
Closed
See #56929 for some of the discussion around this bug. Instead of removing the additional header row as suggested in the issue, I fixed the bug while still reading all three rows as a header. The code that errors produces these headers: header: ['0', 'Unnamed: 1_level_2']. The code where we explicitly set empty values to NaN before writing produces these: header: ['0', 'NaN']. The problem with the empty value stems from these lines:
pandas/pandas/_libs/parsers.pyx
Lines 694 to 698 in 7368686
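To make the two header variants above concrete, here is a minimal sketch. It assumes a hypothetical two-row header rather than the three-row one from the issue, and the CSV strings are made up for illustration. It only shows how an empty header cell is named differently from a cell containing the text NaN. It does not reproduce the error itself.

```python
import io

import pandas as pd

# Hypothetical two-row header; the second cell of the second row is empty.
csv_empty = "a,a\nx,\n1,2\n"
# Same layout, but the empty cell was written out as the literal text "NaN".
csv_nan = "a,a\nx,NaN\n1,2\n"

cols_empty = pd.read_csv(io.StringIO(csv_empty), header=[0, 1]).columns.tolist()
cols_nan = pd.read_csv(io.StringIO(csv_nan), header=[0, 1]).columns.tolist()

print(cols_empty)  # last column's second level is an "Unnamed: ..." placeholder
print(cols_nan)    # last column's second level is the string "NaN"
```

The empty cell is replaced by a generated placeholder name, while the NaN cell is kept as an ordinary string, which matches the difference between the two header lists quoted above.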
I think there are several ways of fixing this. In this PR I tried to make sure that the first header is treated like the second by simply changing the comparison in this line:
pandas/pandas/_libs/parsers.pyx
Line 749 in 7368686
No other tests fail after recompiling the code, so I am fairly confident that this fixes the issue without introducing others. However, it would still be good if someone with more parsing experience looked it over.
Edit: It seems some tests are breaking because of this change. I will look into it now.