BUG: pandas to_json with orient "table" returns wrong schema & data string #38256
Comments
Show the code for reading. The pandas_version is correct; it's the last time this format changed (this is documented). |
This will work (all strings):
This won't work (all ints; I get ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.):
(Note that I made a mistake: this last code was the one I posted before as "expected output", but there are obviously some things about the 'mixing dicts' exception which are beyond my level.)
This won't work either (with the string produced by pandas from the to_json call as posted before):
The last one raises an exception (this was intended to be a minimal reproducible example); the one I tested at first (see this post on Stack Overflow) returned a dataframe with empty values. |
Pls edit the top post and show only a complete round trip, and where it fails. This is not designed to be hand edited. Integer keys are also not allowed (so puzzled where that is coming from). |
Just edited the post. Note that the error here is not the exception raised when reading the JSON; it is the way pandas writes it (so the exception is perfectly right and not the actual problem). The int keys (which are not usual in pandas either, I know) came from a 'melt' method; if integer keys are strictly forbidden by the JSON format, maybe we should insert something like this in the to_json method:
|
Is this the minimal reproducer that's supposed to roundtrip?
In [10]: import pandas as pd
    ...: import pandas._testing as tm
    ...:
    ...: df = pd.DataFrame([[1,2,3],[4,5,6]], columns=[1, 2, 3])
    ...: json = df.to_json(orient="table")
    ...: res = pd.read_json(json, orient="table")
    ...: tm.assert_frame_equal(res, df)
Currently it throws:
|
@arw2019 EDIT: I can't figure out how the objToJSON function from pandas._libs.json works, as I don't know C (the best update of the code might be to mimic it, as it seems to be what is used to serialize the data). |
Is there any update on this? I ran into the same issue; here's a simple round trip to replicate it.
Output
For anyone else with the same issue: as a workaround, I am casting the column names in the schema output to strings.
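A minimal sketch of that workaround, not the commenter's original code: it parses the `to_json` output with the standard `json` module, casts each schema field name to `str`, and reads the patched payload back. The variable names are illustrative, and the `StringIO` wrapper is an assumption made to stay compatible with recent pandas versions.

```python
import json
from io import StringIO

import pandas as pd

# Frame with integer column labels, as in the report.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[1, 2, 3])

payload = json.loads(df.to_json(orient="table"))

# Cast every schema field name to str so it matches the keys used in "data".
for field in payload["schema"]["fields"]:
    field["name"] = str(field["name"])

# Wrapped in StringIO because recent pandas versions deprecate literal JSON strings.
res = pd.read_json(StringIO(json.dumps(payload)), orient="table")
print(res)
```

The frame reads back with its values intact, but the column labels come back as strings rather than integers, so this is not a true round trip.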
|
@Wolf-Byte updates happen when community folks push PRs; the core team can provide review. |
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
When the initial columns are integers, the schema dict returns the correct names (that is, unquoted integers), but the data dict identifies the columns by strings (quoted integers). Therefore, any dataframe loaded from this JSON will either be full of empty (NaN) values or fail with an exception. I don't know which triggers which: the minimal example here triggers an exception, while my original dataset with a MultiIndex (posted on Stack Overflow) returned an empty dataframe.
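One way to inspect the mismatch is to parse the output with the standard `json` module and compare the schema names against the keys of a data record. This is a minimal sketch rather than the code from the original report; the variable names are illustrative, and the type of the schema names may differ between pandas versions.

```python
import json

import pandas as pd

# Frame with integer column labels, as in the report.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[1, 2, 3])

payload = json.loads(df.to_json(orient="table"))

# Column names as declared in the schema section.
schema_names = [field["name"] for field in payload["schema"]["fields"]]
# Keys used in the first data record; JSON object keys are always strings.
data_keys = list(payload["data"][0])

print("schema names:", schema_names)
print("data keys:   ", data_keys)
# Names the schema declares that no data record uses.
print("unmatched:   ", [name for name in schema_names if name not in data_keys])
```

Any name listed as unmatched is a column the reader will look for in the data section and not find.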
Expected Output
This output for DataFrame.to_json(orient="table") could be read back (though it loses the integer type of the column labels and turns them into strings):
'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"1","type":"integer"},{"name":"2","type":"integer"},{"name":"3","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.0.20"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d42
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : fr
LOCALE : None.None
pandas : 1.1.4
numpy : 1.18.4
pytz : 2017.2
dateutil : 2.8.1
pip : 20.2.4
setuptools : 36.6.0
Cython : 0.27.2
pytest : 3.2.3
hypothesis : None
sphinx : 1.6.5
blosc : 1.5.1
feather : 0.4.0
xlsxwriter : 1.0.2
lxml.etree : 4.1.0
html5lib : 0.9999999
pymysql : None
psycopg2 : None
jinja2 : 2.9.6
IPython : 6.2.1
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 2.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.7.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.1.14
tables : None
tabulate : 0.8.5
xarray : 0.9.6
xlrd : 1.1.0
xlwt : None
numba : 0.35.0