Skip to content

read_json can't import own exported data with orient=table #18912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
floli opened this issue Dec 22, 2017 · 7 comments · Fixed by #19039
Closed

read_json can't import own exported data with orient=table #18912

floli opened this issue Dec 22, 2017 · 7 comments · Fixed by #19039
Labels
IO JSON read_json, to_json, json_normalize Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@floli
Copy link

floli commented Dec 22, 2017

import pandas
pd = pandas.DataFrame([["Arr","alpha", [1,2,3,4]],["Bee", "Beta", [10,20,30,40]]], index = [["A", "B"], ["Null", "Eins"]], columns = ["Aussprache", "Griechisch", "Args"])

: print(pd)
       Aussprache Griechisch              Args
A Null        Arr      alpha      [1, 2, 3, 4]
B Eins        Bee       Beta  [10, 20, 30, 40]

pd.to_json("test.json", orient = "table")
pandas.read_json("test.json", orient="table")

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-a171b349d78d> in <module>()
----> 1 pandas.read_json("test.json", orient="table")

/usr/lib/python3.6/site-packages/pandas/io/json/json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    364         return json_reader
    365
--> 366     return json_reader.read()
    367
    368

/usr/lib/python3.6/site-packages/pandas/io/json/json.py in read(self)
    463             )
    464         else:
--> 465             obj = self._get_object_parser(self.data)
    466         self.close()
    467         return obj

/usr/lib/python3.6/site-packages/pandas/io/json/json.py in _get_object_parser(self, json)
    480         obj = None
    481         if typ == 'frame':
--> 482             obj = FrameParser(json, **kwargs).parse()
    483
    484         if typ == 'series' or obj is None:

/usr/lib/python3.6/site-packages/pandas/io/json/json.py in parse(self)
    572
    573         else:
--> 574             self._parse_no_numpy()
    575
    576         if self.obj is None:

/usr/lib/python3.6/site-packages/pandas/io/json/json.py in _parse_no_numpy(self)
    802         else:
    803             self.obj = DataFrame(
--> 804                 loads(json, precise_float=self.precise_float), dtype=None)
    805
    806     def _process_converter(self, f, filt=None):

/usr/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    328                                  dtype=dtype, copy=copy)
    329         elif isinstance(data, dict):
--> 330             mgr = self._init_dict(data, index, columns, dtype=dtype)
    331         elif isinstance(data, ma.MaskedArray):
    332             import numpy.ma.mrecords as mrecords

/usr/lib/python3.6/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    459             arrays = [data[k] for k in keys]
    460
--> 461         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    462
    463     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

/usr/lib/python3.6/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   6128     # figure out the index, if necessary
   6129     if index is None:
-> 6130         index = extract_index(arrays)
   6131     else:
   6132         index = _ensure_index(index)

/usr/lib/python3.6/site-packages/pandas/core/frame.py in extract_index(data)
   6179
   6180             if have_dicts:
-> 6181                 raise ValueError('Mixing dicts with non-Series may lead to '
   6182                                  'ambiguous ordering.')
   6183

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

Problem description

test.json looks like:

{"schema": {"fields":[{"name":"level_0","type":"string"},{"name":"level_1","type":"string"},{"name":"Aussprache","type":"string"},{"name":"Griechisch","type":"string"},{"name":"Args","type":"string"}],"primaryKey":["level_0","level_1"],"pandas_version":"0.20.0"}, "data": [{"level_0":"A","level_1":"Null","Aussprache":"Arr","Griechisch":"alpha","Args":[1,2,3,4]},{"level_0":"B","level_1":"Eins","Aussprache":"Bee","Griechisch":"Beta","Args":[10,20,30,40]}]}

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.7-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8

pandas: 0.21.0
pytest: None
pip: 9.0.1
setuptools: 38.2.4
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.5
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: None
sqlalchemy: 1.1.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
@WillAyd
Copy link
Member

WillAyd commented Dec 23, 2017

Although you can write json using orient=records it is not a documented value for read_json (see the docs) and I don't see anywhere in the code that currently allows for this.

Will let others chime in but right now I think this would need to be an enhancement

@jreback
Copy link
Contributor

jreback commented Dec 23, 2017

FYI don't use pd as a variable name, when the common import for pandas is pd

In [26]: df = pd.DataFrame([["Arr","alpha", [1,2,3,4]],["Bee", "Beta", [10,20,30,40]]], index = [["A", "B"], ["Null", "Eins"]], columns = ["Aussprache", "Griechisch", "Args"])

In [27]: df
Out[27]: 
       Aussprache Griechisch              Args
A Null        Arr      alpha      [1, 2, 3, 4]
B Eins        Bee       Beta  [10, 20, 30, 40]

I don't we have any guarantees when you embed lists inside a field and try to export to json. Further I don't think orient='table' is actually reversible.

@jreback jreback added IO JSON read_json, to_json, json_normalize Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 23, 2017
@jreback
Copy link
Contributor

jreback commented Dec 23, 2017

records works just fine here.

In [34]: pd.read_json(df.to_json(orient='records'), orient='records')
Out[34]: 
               Args Aussprache Griechisch
0      [1, 2, 3, 4]        Arr      alpha
1  [10, 20, 30, 40]        Bee       Beta

@jreback
Copy link
Contributor

jreback commented Dec 23, 2017

cc @TomAugspurger

@TomAugspurger
Copy link
Contributor

orient='table' isn't a valid reader. That should raise for now, until it's implemented.

@floli
Copy link
Author

floli commented Dec 29, 2017

I tried all the documented values for orient that are supported by both read_json and to_json using:

import pandas as pd

df = pd.DataFrame([["Arr","alpha", [1,2,3,4]],["Bee", "Beta", [10,20,30,40]]], index = [["A", "B"], ["Null", "Eins"]], columns = ["Aussprache", "Griechisch", "Args"])
print(df)

orient = "split"
df.to_json("test.json", orient = orient)
df = pd.read_json("test.json", orient = orient)
print(df)

orient = "split"

           Aussprache Griechisch              Args
A Null        Arr      alpha      [1, 2, 3, 4]
B Eins        Bee       Beta  [10, 20, 30, 40]

          Aussprache Griechisch              Args
A    B           Arr      alpha      [1, 2, 3, 4]
Null Eins        Bee       Beta  [10, 20, 30, 40]

That even seems to change the semantics and mixes up the index.

orient = "records"

           Aussprache Griechisch              Args
A Null        Arr      alpha      [1, 2, 3, 4]
B Eins        Bee       Beta  [10, 20, 30, 40]

               Args Aussprache Griechisch
0      [1, 2, 3, 4]        Arr      alpha
1  [10, 20, 30, 40]        Bee       Beta

That completely looses the index.

orient = "index"

Produces that traceback for me:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/florian/scratch/pandas-import.py", line 8, in <module>
    df = pd.read_json("test.json", orient = orient)
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 465, in read
    obj = self._get_object_parser(self.data)
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 482, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 574, in parse
    self._parse_no_numpy()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 801, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None).T
ValueError: No ':' found when decoding object value

orient = "columns"

Also produces that traceback for me:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/florian/scratch/pandas-import.py", line 8, in <module>
    df = pd.read_json("test.json", orient = orient)
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 366, in read_json
    return json_reader.read()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 465, in read
    obj = self._get_object_parser(self.data)
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 482, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 574, in parse
    self._parse_no_numpy()
  File "/usr/lib/python3.6/site-packages/pandas/io/json/json.py", line 791, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
ValueError: No ':' found when decoding object value

orient = "values"

>>>        Aussprache Griechisch              Args
A Null        Arr      alpha      [1, 2, 3, 4]
B Eins        Bee       Beta  [10, 20, 30, 40]
     0      1                 2
0  Arr  alpha      [1, 2, 3, 4]
1  Bee   Beta  [10, 20, 30, 40]

Doesn't loose information, but you have to manually set the index columns.

So it seems there is no possibility to write and read from json and completely recover the data frame.

@TomAugspurger
Copy link
Contributor

@floli thanks for the summary.

The goal is for orient='table' to be the roundtripable format (#9146). You need metadata like the dtypes, names, set of categories, etc. when writing. But no one has implemented the reader yet. Are you interested in working on that?

@jreback jreback added this to the 0.23.0 milestone Jan 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants