-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix empty Data frames to JSON round-trippable back to data frames #21318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
6dfd976
466e5a6
db3a738
5844301
fd8fa93
2f347c0
28d6e05
743c08f
833afea
2461b90
03a2b8a
fc15ba0
0a26bf8
ecc631a
8d5f127
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -686,7 +686,7 @@ def _try_convert_data(self, name, data, use_dtypes=True, | |
|
||
result = False | ||
|
||
if data.dtype == 'object': | ||
if len(data) and data.dtype == 'object': | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the fix that seems to solve the error. Any thoughts on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The point of the JSON table schema is that we can be explicit about the types of the columns, so we shouldn't need to infer anything. Any way to avoid a call to this method altogether? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@WillAyd : I'm confused how this is relevant to the patch. It looks fine to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My point is that since the datatypes are explicitly defined in the schema that we shouldn't need any type of inference, which I get the impression this method is doing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sure, but I'm still not seeing relevance to this particular PR. The patch is pretty straightforward from what I see. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My personal opinion, I don't think option 2 makes much sense. As a user, I would rather have consistent, though undesirable behavior (and leave the test a bit weak, ignoring dtypes, or perhaps marking as expected failure?) than have empty and non empty data frame behave differently. Option 3 is obviously the best choice, though spontaneously seems a bit overkill to me? But, given #21140 I would be glad to take a look this week end and circle back with my best shot. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I swiftly went thru the past commits related to the method and that way tried to find out the reasoning behind the implementation. First of all it's quite old method (5 years) and it has seen only a few modifications during it's lifetime. On top of that, the functionality of it does seem to be quite complexly tied to multiple use cases. In that light, knowing its quite subtle look on the matter, it sure looks like quite fundamental task to re-think the purpose (or existence) of that method within this PR. I completely get the point that the fix with the Clearly, my opinion of the complexity is affected by the fact that this is my first contribution to this library :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @ludaavics that 2 is the least desirable. I would say if you don't see an apparent solution to number 3 above then create a separate issue about the coercion of object types to float with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @WillAyd : Yes, that sounds like a good plan. Thanks for bearing with me. 😄 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great, sounds like a plan! I'll make the new issue and the changes to the code accordingly. Thanks guys! |
||
|
||
# try float | ||
try: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
"""Tests for Table Schema integration.""" | ||
import json | ||
import io | ||
from collections import OrderedDict | ||
|
||
import numpy as np | ||
|
@@ -12,6 +13,7 @@ | |
from pandas.io.json.table_schema import ( | ||
as_json_table_type, | ||
build_table_schema, | ||
parse_table_schema, | ||
convert_pandas_type_to_json_field, | ||
convert_json_field_to_pandas_type, | ||
set_default_names) | ||
|
@@ -560,3 +562,11 @@ def test_multiindex(self, index_names): | |
out = df.to_json(orient="table") | ||
result = pd.read_json(out, orient="table") | ||
tm.assert_frame_equal(df, result) | ||
|
||
def test_empty_frame_roundtrip(self): | ||
# GH 21287 | ||
df = pd.DataFrame([], columns=['a', 'b', 'c']) | ||
expected = df.copy() | ||
out = df.to_json(orient='table') | ||
result = pd.read_json(out, orient='table') | ||
tm.assert_frame_equal(expected, result) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This raises an assertion error:
That's something I need to dig deeper. If there is something obvious, that I'm missing, any pointers would be appreciated in such case. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for doing this PR! Beat me to it :) Otherwise I would guess we have to go down the road of including the dtypes in the JSON representation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point! And actually it only works if I assert it like this: Thoughts on this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ignoring the dtype difference is not the solution. The point of this format is to persist that metadata. What I would do is check that the proper type information for the index is being written out (you can use a io.StringIO instance instead of writing to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, you're right, ignoring the Did some initial testings and it seems that on the reading side empty I'll push a commit with fix proposal for comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pyryjook : If you could address the merge conflicts, that would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I’ll certainly do that when I’m on my laptop again!