BUG: Fix empty Data frames to JSON round-trippable back to data frames #21318
Merged
Commits (15):
- 6dfd976 BUG: Fix empty Data frames to JSON round-trippable back to data frame…
- 466e5a6 Add test and whatsnew
- db3a738 Empty line between test classes
- 5844301 Changes based on review comments
- fd8fa93 Fix whatsnew + PEP
- 2f347c0 Prevent empty data from being coerced to float64
- 28d6e05 Remove debugging messages
- 743c08f Remove obsolete imports from tests
- 833afea Merge remote-tracking branch 'upstream/master' into empty-json-empty-…
- 2461b90 Loosen test type checks, remove length check from JSON parser
- 03a2b8a Add GH issue number to TODO comment
- fc15ba0 Parametrize JSON roundtrip test with xfail mark
- 0a26bf8 Merge remote-tracking branch 'upstream/master' into empty-json-empty-…
- ecc631a Merge branch 'master' into PR_TOOL_MERGE_PR_21318 (jreback)
- 8d5f127 fix whatsnew (jreback)
This raises an assertion error:
That's something I need to dig into further. If there is something obvious that I'm missing, any pointers would be appreciated.
Thanks for doing this PR! Beat me to it :)
A bit weak, but what do we think of just using
pd.testing.assert_frame_equal(expected, actual, check_dtype=False)
instead? Otherwise, I would guess we'd have to go down the road of including the dtypes in the JSON representation.
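To illustrate how much check_dtype=False lets through, here is a minimal sketch in plain pandas (no JSON involved). The two frames are built by hand and differ only in that the column is object dtype in one and float64 in the other; that coercion is assumed here for illustration.

```python
import pandas as pd

# An empty frame whose column is object dtype, as it was before writing.
expected = pd.DataFrame({"a": pd.Series([], dtype="object")})
# The same frame with the column coerced to float64, as a reader might return it.
result = expected.astype("float64")

# The loosened check accepts the pair even though the column dtypes differ...
pd.testing.assert_frame_equal(expected, result, check_dtype=False)

# ...while the default check reports the dtype mismatch.
try:
    pd.testing.assert_frame_equal(expected, result)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(strict_passed)  # False
```

Because both frames are empty, there are no values to compare, so with check_dtype=False nothing distinguishes them at all.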
Good point! Actually, it only works if I assert it like this:
pd.testing.assert_frame_equal(expected, result, check_dtype=False, check_index_type=False)
So both check_dtype and check_index_type have to be set to False for the assertion to pass. Thoughts on this?
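A sketch of why both flags might be needed, assuming (this is not confirmed in the thread) that the re-read frame differed from the original in both the column dtype and the index type. The frames are constructed by hand rather than produced by read_json, and passes is a hypothetical helper.

```python
import pandas as pd

# Hypothetical pair: an empty object column on an empty object index, versus
# an empty float64 column on an empty RangeIndex.
expected = pd.DataFrame(columns=["a"], index=pd.Index([], dtype="object"))
result = pd.DataFrame({"a": pd.Series([], dtype="float64")}, index=pd.RangeIndex(0))


def passes(**flags) -> bool:
    """Return True if assert_frame_equal accepts the pair with these flags."""
    try:
        pd.testing.assert_frame_equal(expected, result, **flags)
        return True
    except AssertionError:
        return False


print(passes())                                           # strict comparison
print(passes(check_dtype=False))                          # index types still compared
print(passes(check_dtype=False, check_index_type=False))  # both differences ignored
```

Relaxing only check_dtype still leaves the index class and dtype check in place, which matches the observation that both flags had to be turned off.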
Ignoring the dtype difference is not the solution. The point of this format is to persist that metadata.
What I would do is check that the proper type information for the index is being written out (you can use an io.StringIO instance instead of writing to None). If that appears correct, then the issue is in the reader, which is ignoring or casting the type of the index after the fact.
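The check suggested above can be sketched as follows, assuming the table orient (the to_json orient that embeds a Table Schema describing the index and columns alongside the data). written_payload is a hypothetical helper; it writes into an io.StringIO buffer and parses the JSON back so the recorded field types can be inspected before any reader logic runs.

```python
import io
import json

import pandas as pd


def written_payload(df: pd.DataFrame) -> dict:
    """Hypothetical helper: serialize df with orient="table" into an in-memory
    buffer and return the parsed JSON so the schema can be inspected."""
    buf = io.StringIO()
    df.to_json(buf, orient="table")  # writes into the buffer, returns None
    return json.loads(buf.getvalue())


payload = written_payload(pd.DataFrame(columns=["a", "b", "c"]))
fields = payload["schema"]["fields"]

# The unnamed index is written as a field called "index" and used as the key.
print(payload["schema"]["primaryKey"])  # ['index']
for field in fields:
    print(field["name"], field["type"])
print(payload["data"])  # []
```

If the index and column types recorded here look right, any dtype change seen after read_json must be introduced on the reading side.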
Yeah, you're right: ignoring the dtype and the index type will just hide the problem. I did some initial testing, and it seems that on the reading side, empty data with data.dtype == 'object' gets coerced to float64 for no clear reason. I'll push a commit with a proposed fix for comments.
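A minimal sketch of the kind of guard such a fix could add, assuming the reader tries a numeric cast on object data. try_convert_numeric is a hypothetical helper written for illustration, not pandas' actual reader code.

```python
import pandas as pd


def try_convert_numeric(data: pd.Series) -> pd.Series:
    """Hypothetical helper: try to cast object data to float64, but leave
    empty data alone so its written dtype survives the round trip."""
    if len(data) == 0:
        # Nothing to infer from, so keep the dtype we were given.
        return data
    if data.dtype == object:
        try:
            return data.astype("float64")
        except (TypeError, ValueError):
            # Non-numeric values: keep the column as object.
            return data
    return data


empty = pd.Series([], dtype=object)
# An unconditional cast turns an empty object column into float64.
print(empty.astype("float64").dtype)     # float64
print(try_convert_numeric(empty).dtype)  # object
```

The length check is the key design choice: with no values, there is no evidence that the column is numeric, so the dtype from the written metadata is the only reliable information.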