Series.to_json produces invalid JSON with orient="index" if index is multilevel
#31028
Comments
Thanks for the report :) I could not reproduce this on master, I got the output
so it looks like this has been fixed already and will behave correctly in the next release.
This was most likely fixed by #27618, so be sure to try that out in 1.0. With that said, I don't think there is an explicit test for a MultiIndex, so we would welcome a PR to add one.
Ah -- I was expecting the keys to remain valid JSON, so my deserialization code still blew up. I assumed that was because the JSON was still malformed.
My one gripe with the fix is that the keys are no longer valid JSON, as in previous releases (with the above exception). This makes it difficult to reconstruct into a nested format, e.g.
{
    0: {
        'x': 1,
        'y': 'a'
    },
    1: {
        'x': 2,
        'y': 'b',
    },
    # ...
}
This is particularly problematic when communicating with non-Python code. Loading the keys would require the transformation to happen on the Python side with
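As a rough sketch of the reconstruction described above: the snippet below assumes the newer output uses stringified Python tuples such as "(0, 'x')" as keys. That key format is an assumption, since the thread does not show the exact output, and `nest` is a hypothetical helper name. It uses `ast.literal_eval`, which is why the step is tied to Python.

```python
import ast
import json

# Assumed shape of the newer output: each key is the str() of an index tuple.
flat = {"(0, 'x')": 1, "(0, 'y')": "a", "(1, 'x')": 2, "(1, 'y')": "b"}
flat_json = json.dumps(flat)


def nest(flat_mapping):
    """Turn {"(outer, inner)": value} into {outer: {inner: value}}."""
    nested = {}
    for key, value in flat_mapping.items():
        # literal_eval safely parses the tuple text back into a real tuple.
        outer, inner = ast.literal_eval(key)
        nested.setdefault(outer, {})[inner] = value
    return nested


nested = nest(json.loads(flat_json))
print(nested)
```

A consumer written in another language would need its own parser for the tuple syntax, which is the portability concern raised in the comment.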
@dargueta I'm not clear on what version of pandas you are referring to, what output you are getting, and what you expect. If you can clarify, that would be helpful.
Could you please let us know which release gives you your expected output?
From what I understood, having the following output should not be allowed:
{
    "[0,"x"]":1,
    "[0,"y"]":"a",
    "[1,"x"]":2,
    "[1,"y"]":"b",
    "[2,"x"]":3,
    "[2,"y"]":"c"
}
I tried the example on 0.25.3 and indeed this is the output I got.
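A quick way to confirm that the output quoted above is malformed is to run it through a strict parser. This sketch uses only Python's standard `json` module; `is_valid_json` is a hypothetical helper name, and the string is the quoted output with whitespace removed.

```python
import json

# The 0.25.3 output quoted in the comment, whitespace removed.
broken = (
    '{"[0,"x"]":1,"[0,"y"]":"a","[1,"x"]":2,'
    '"[1,"y"]":"b","[2,"x"]":3,"[2,"y"]":"c"}'
)


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(broken))
```

The parser stops at the first unescaped quote inside a key, which matches the failure described in the report.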
0.25.3 does as long as the keys aren't strings, which is kind of a stupid restriction, but for CSVs with no column headers it's what I end up having to work with. (By the way, I also tested this on 0.24.2 and got the same result. I'm using Python 3.7 for both tests.)
Closed by #31307
Code Sample, a copy-pastable example if possible
The keys in the output have unescaped quotes in them, resulting in invalid JSON. Output here is indented for clarity:
Problem description
to_json() should always either 1) produce valid JSON, or 2) throw an exception if it cannot.
Expected Output
The multilevel index in the output should be properly escaped to produce valid JSON, like so:
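As an illustration of what escaped keys could look like, here is a minimal standard-library sketch. It assumes each index tuple is serialized as a JSON array and then embedded as a string key. The `data` dict and the helper `dumps_with_tuple_keys` are hypothetical stand-ins for the Series in the report, not pandas internals.

```python
import json

# Hypothetical stand-in for the Series: (row, column) -> value.
data = {(0, "x"): 1, (0, "y"): "a", (1, "x"): 2, (1, "y"): "b"}


def dumps_with_tuple_keys(mapping):
    """Serialize a dict with tuple keys, encoding each key as a JSON array string."""
    # The inner dumps turns (0, "x") into the text [0,"x"]; the outer dumps
    # then escapes the embedded quotes so the document as a whole stays valid.
    keyed = {
        json.dumps(list(key), separators=(",", ":")): value
        for key, value in mapping.items()
    }
    return json.dumps(keyed)


encoded = dumps_with_tuple_keys(data)
print(encoded)

# Round trip: each key is itself valid JSON, so any JSON parser can recover it.
decoded = {tuple(json.loads(key)): value for key, value in json.loads(encoded).items()}
```

Because the keys are JSON arrays rather than Python tuple syntax, non-Python consumers can decode them with an ordinary JSON parser.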
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 45.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
s3fs : 0.4.0
scipy : None
sqlalchemy : 1.3.12
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None