Round trip json serialization issues #22525
Having just noticed the comment in io.json.json.read_json:
From that I can see that this has been noted during development.
A smattering of issues here, but generally I don't think any of these can be supported due to the numeric column names (see #19129). Can you try assigning a non-numeric column name and see if that resolves it? The error messages can certainly be improved; the aforementioned issue is still open if you want to take a look.
I agree that this looks like a duplicate of #19129. If not, we can reopen this or create a new issue.
Problem description
I would hope there would be a way for each of the serialization methods to store and retrieve enough metadata to reconstruct the original object.
Reading through #4889, it seems that most of the problems associated should have been resolved with 0.20 with the introduction of orient='table'.
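The expectation can be sketched as a simple invariant: serializing with orient='table' and reading back should reproduce the original frame exactly. This is a minimal sketch, not code from the report; the column name "value" and index name "idx" are illustrative.

```python
from io import StringIO

import pandas as pd

# Hypothetical frame with a non-numeric column name and a named index.
df = pd.DataFrame({"value": [1.5, 2.5]}, index=pd.Index([0, 1], name="idx"))

# orient='table' embeds a Table Schema (dtypes, index, names) alongside the data.
payload = df.to_json(orient="table")
restored = pd.read_json(StringIO(payload), orient="table")

# The round trip should be lossless: same values, dtypes, and index.
pd.testing.assert_frame_equal(df, restored)
```

When the schema written by to_json disagrees with what read_json reconstructs, this assertion is exactly what fails in the examples below.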
When writing tests for this, it became apparent that several circumstances could produce issues.
tl;dr The schema as dumped and read is inconsistent, producing problematic behavior
Example 1: Empty DataFrame
Inspecting the to_json output, we get:
According to http://pandas-docs.github.io/pandas-docs-travis/io.html#io-table-schema,
the "type": "string" should indicate the pandas type object. DataFrame.to_json is not respecting the in-memory type and instead converts it from float64 to object.
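The dtype-to-schema mapping can be inspected directly. A sketch (the column name "A" is illustrative, not from the report):

```python
import json

import pandas as pd

# An empty column explicitly stored with object dtype.
df = pd.DataFrame({"A": pd.Series([], dtype="object")})

schema = json.loads(df.to_json(orient="table"))["schema"]
field = next(f for f in schema["fields"] if f["name"] == "A")

# Table Schema records the object dtype as "string".
print(field["type"])
```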
Example 2: Int DataFrame
I'm not sure exactly what is going on here.
If I drop orient='table', a valid DataFrame with equal values is created, but the assertion still fails because of the inferred index data types.
Inspecting the to_json output we get:
So the type is now correctly encoded as integer, but it cannot be decoded.
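As the maintainer comment above suggests, the decode failure is tied to numeric column names. With a non-numeric name the integer schema does round-trip; this is a sketch under that assumption, with an illustrative column name:

```python
from io import StringIO

import pandas as pd

# A non-numeric column name avoids the failure described in #19129.
df = pd.DataFrame({"ints": [1, 2, 3]})
restored = pd.read_json(StringIO(df.to_json(orient="table")), orient="table")

# The integer dtype survives the round trip.
print(restored["ints"].dtype)
```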
Example 3: Float DataFrame
Now the float value 1.0 is silently converted to nan. This is probably related to the issue in Example 2, but here the index can hold nan because it is a float index rather than an integer index.
Inspecting the JSON output, we get:
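A direct way to detect the silent 1.0-to-nan conversion is to assert on missing values after the round trip. A sketch with an illustrative non-numeric column name, under which no value should go missing:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"floats": [1.0, 2.5, 3.5]})
restored = pd.read_json(StringIO(df.to_json(orient="table")), orient="table")

# No value should silently become nan during the round trip.
print(restored["floats"].isna().any())
```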
Code to replicate most cases that could fail
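The replication code is not shown here; a harness in its spirit might look like the following. This is a sketch, with case names and frames as stand-ins for the originals:

```python
from io import StringIO

import pandas as pd


def roundtrip(df):
    """Serialize with orient='table' and read straight back."""
    return pd.read_json(StringIO(df.to_json(orient="table")), orient="table")


# Illustrative stand-ins for the three examples above.
cases = {
    "empty": pd.DataFrame({"A": pd.Series([], dtype="object")}),
    "ints": pd.DataFrame({"A": [1, 2, 3]}),
    "floats": pd.DataFrame({"A": [1.0, 2.5]}),
}

results = {}
for name, frame in cases.items():
    try:
        pd.testing.assert_frame_equal(frame, roundtrip(frame))
        results[name] = "ok"
    except Exception as exc:  # schema mismatch or decode error
        results[name] = "FAIL: " + type(exc).__name__

print(results)
```

Any case that prints a FAIL entry is a candidate for a dedicated test in pandas itself.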
Output of pd.show_versions():
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 3.7.2
pip: 18.0
setuptools: 40.2.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.7
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None