Empty data frames not round-trippable to JSON #21287

Closed
ludaavics opened this issue Jun 1, 2018 · 5 comments · Fixed by #21318
Labels
Bug good first issue IO JSON read_json, to_json, json_normalize

@ludaavics

Code Sample

import pandas as pd
df = pd.DataFrame([], columns=['a', 'b', 'c'])
df.to_json('tmp.json', orient='table')
pd.read_json('tmp.json', orient='table')

>> KeyError: "['index' 'a' 'b' 'c'] not in index"

Problem description

Empty data frames saved as JSON with orient='table' fail to load back into data frames.
Quick fix: replace this line with

df = DataFrame(table['data'], columns=col_order)[col_order]
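A minimal sketch of why passing `columns=` helps, assuming the failing code selects `col_order` from a frame built from an empty `data` list without column names (the KeyError above is consistent with that, but the exact original line is not shown here). The variable names `data` and `col_order` are illustrative stand-ins for the parsed table-orient payload.

```python
import pandas as pd

# Stand-ins for the parsed JSON: the schema's field names and an empty data list.
col_order = ["index", "a", "b", "c"]
data = []

# Without column names, an empty list yields a frame with no columns,
# so selecting col_order raises KeyError.
try:
    pd.DataFrame(data)[col_order]
    selection_failed = False
except KeyError:
    selection_failed = True

# Passing columns= creates the (empty) columns up front, so the selection works.
fixed = pd.DataFrame(data, columns=col_order)[col_order]

print(selection_failed)
print(list(fixed.columns), len(fixed))
```

With non-empty data the two constructions select the same columns; the difference only shows up when there are no rows to infer the column names from.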

Expected Output

print(df)
Empty DataFrame
Columns: [a, b, c]
Index: []

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0
pytest: 3.2.3
pip: 10.0.1
setuptools: 36.6.0
Cython: 0.27.2
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.7.1
xarray: 0.9.6
IPython: 6.2.1
sphinx: 1.6.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.2
lxml: None
bs4: 4.6.0
html5lib: 1.0b10
sqlalchemy: 1.2.3
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: None
pip: 9.0.3
setuptools: 39.1.0
Cython: None
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.4
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@WillAyd
Member

WillAyd commented Jun 1, 2018

Thanks for the report and investigation - care to make a PR?

@ludaavics
Author

ludaavics commented Jun 5, 2018 via email

@pyryjook
Contributor

pyryjook commented Jun 5, 2018

Hi!

I actually ended up diving into this with my PR #21318. I made the changes the way @ludaavics initially suggested (BTW, thanks for the heads-up with the example!). After writing a unit test for it, I realised that, now that the original error is fixed, the DataFrame read back from the JSON gets a different index type than it originally had:

Using tm.assert_frame_equal(expected, result), this is the result:

E       AssertionError: DataFrame.index are different
E
E       DataFrame.index classes are not equivalent
E       [left]:  Index([], dtype='object')
E       [right]: Float64Index([], dtype='float64')

I need to dig into this a bit deeper now. Any initial thoughts on why this might happen, or am I missing something?

(This is my first contribution to this project, so it might be something obvious that I have not (yet) noticed.)
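The index comparison described in this comment can be explored with a sketch like the following. It assumes a pandas version that already includes the fix (on 0.23.0 the `read_json` call raises the KeyError from the report), and it round-trips through an in-memory string rather than `tmp.json` purely for convenience. The index class and dtype on each side depend on the pandas version, so the sketch prints them instead of asserting a particular result.

```python
import io

import pandas as pd

expected = pd.DataFrame([], columns=["a", "b", "c"])

# Serialize to a string and read it back (in-memory stand-in for tmp.json).
payload = expected.to_json(orient="table")
result = pd.read_json(io.StringIO(payload), orient="table")

# The column names survive the round trip.
print(list(result.columns))

# The index class and dtype are what tm.assert_frame_equal compares by default;
# print both sides to see whether they match on the installed version.
print(type(expected.index).__name__, expected.index.dtype)
print(type(result.index).__name__, result.index.dtype)
```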

@WillAyd
Member

WillAyd commented Jun 5, 2018

@pyryjook for questions specific to your commits, it is easier to help if you push the commit to the PR and ask the question there.

@pyryjook
Contributor

pyryjook commented Jun 5, 2018

Yeah, I'll do that, and then let's continue there.

@gfyoung gfyoung added the Bug label Jun 6, 2018
@jreback jreback added this to the 0.23.1 milestone Jun 8, 2018