Raise ValueError for read_json and orient='table' With Numeric Column Names #19129

WillAyd · 2018-01-08T04:23:13Z

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([[1,2,3,4]], columns=[5,6,7,8])
df.to_json('test.json', orient='table')
pd.read_json('test.json', orient='table')

KeyError: '[5 6 7 8] not in index'

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: 16d0262
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+79.g16d026212
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.0b1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-01-08T12:04:35Z

I thought that https://frictionlessdata.io/specs/table-schema/ required the field names to be strings, but that may not be the case.

robmarkcole · 2018-03-10T11:58:07Z

There is an error with string column/index names too:

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

json_str = df.to_json(orient='table')
pd.read_json(json_str, orient='table')
...
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

This issue should be renamed to read_json and orient='table' Fails

WillAyd · 2018-03-10T16:55:29Z

@robmarkcole what version are you using? Your example ran fine for me on master.

INSTALLED VERSIONS

commit: 28e7a9498457e8cccc105a6261958197889325fa
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+487.g28e7a9498
pytest: 3.4.1
pip: 9.0.1
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.1
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.7.0
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.3
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

robmarkcole · 2018-03-10T21:03:53Z

@WillAyd I get the error on 0.22.0, but not on 0.23.0.dev0.

WillAyd · 2018-03-11T17:06:37Z

General support for read_json with orient='table' was only just added for the v0.23 release, so it makes sense that it doesn't work on 0.22. See #19039

MichaMucha · 2018-05-17T13:45:37Z

Hi!
Captain Obvious here, just wanted to say I'm getting this in 0.23.0 as well.

import pandas as pd
breaking_case = pd.DataFrame({
    1: [1,2], 
    2: [3,4]}
)
pd.read_json(breaking_case.to_json(orient='table'), orient='table')

KeyError: '[1 2] not in index'

WillAyd · 2018-05-17T15:02:57Z

@MichaMucha that's right, this was never implemented, though I'm not sure it's valid JSON either. If you're interested, an investigation into the schema linked above to confirm or deny this is welcome!

MichaMucha · 2018-05-19T21:45:39Z

Thanks for the quick reply!

I read through the page and it seems that:

a descriptor specifying columns is an "ordered dict" equivalent
order of column descriptions in that dict implies column order
each column is described by a few properties, one of those is name.
technically speaking, { "name" : 1 } is valid JSON

I would reason that you can have numerical column names.

Direct quotes from the spec (a "field" is a column in our case):
A Table Schema is represented by a descriptor. The descriptor MUST be a JSON object (JSON is defined in RFC 4627).
It MUST contain a property fields. fields MUST be an array where each entry in the array is a field descriptor (as defined below). The order of elements in fields array MUST be the order of fields in the CSV file.
A field descriptor MUST be a JSON object that describes a single field.
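To illustrate the narrower point that a numeric "name" is itself valid JSON, here is a minimal sketch using only Python's standard json module. The descriptor is a hand-written example, not pandas output, and it only shows that JSON permits an integer value here; it does not settle whether the Table Schema spec intends "name" to be a string.

```python
import json

# Hand-written Table Schema-style descriptor whose single field has a numeric name.
descriptor = {"fields": [{"name": 1, "type": "integer"}]}

# JSON allows any value type for an object *value*, so the integer
# survives a dumps/loads round trip unchanged.
encoded = json.dumps(descriptor)
decoded = json.loads(encoded)

print(encoded)                             # {"fields": [{"name": 1, "type": "integer"}]}
print(type(decoded["fields"][0]["name"]))  # <class 'int'>
```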

Let me know if there's anything more I can do to help.

WillAyd · 2018-05-20T01:37:40Z

Thanks for the review! The table implementation is located in pandas/pandas/io/json/table_schema.py so if you poke around there you should see where this could be implemented. Assuming this is your first time, also be sure to read through the contributing guide:

https://pandas.pydata.org/pandas-docs/stable/contributing.html

Hope that helps, but let me know if you have any questions.

MichaMucha · 2018-05-31T22:32:25Z

Thank you! Sorry it took me a while to find time.
Thanks for the links! I looked at the table implementation. Turns out table_schema.py is not to blame!
The assignment happens at line 96 and leaves the type intact.

I dug a little deeper, and it seems the culprit is
pandas/pandas/io/json/json.py:214 - JSONTableWriter._write,
which writes the data in a row-oriented fashion.
Every row becomes a dict, which makes every column label a key, and JSON requires keys to be strings.
So when you read it back, you get strings.
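The key coercion described here can be reproduced with the standard json module alone. This is a minimal sketch that mimics one row-oriented record with integer column labels; it is not the pandas writer itself.

```python
import json

# One row of a frame with integer column labels 5 and 6, written the way a
# row-oriented writer would: as a dict keyed by column label.
row = {5: 1, 6: 2}

encoded = json.dumps(row)      # JSON object keys must be strings
decoded = json.loads(encoded)

print(encoded)  # {"5": 1, "6": 2}
print(decoded)  # {'5': 1, '6': 2}  (the integer labels came back as strings)

# Looking up the original integer labels now fails, much like the
# KeyError reported at the top of this issue.
missing = [col for col in row if col not in decoded]
print(missing)  # [5, 6]
```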

One workaround I can think of is to match field['name'] to the stringified columns found in data here, and cast them back to the type of field['name'].
This will cause ambiguity trouble if you have a data frame with both a column "1" and a column 1, for example.
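A rough sketch of the workaround described above, operating on an already-parsed payload. restore_column_labels is a hypothetical helper, not part of pandas; it assumes "fields" is the schema's list of field descriptors and "records" is the list of row dicts from the "data" section. It also shows where the ambiguity bites: 1 and "1" collapse to the same JSON key.

```python
def restore_column_labels(fields, records):
    """Hypothetical helper: map stringified JSON keys back to the original
    field names recorded in the schema, raising if two names collide."""
    lookup = {}
    for field in fields:
        name = field["name"]
        key = str(name)
        # Distinct names such as 1 and "1" serialize to the same JSON key,
        # so the reverse mapping would be ambiguous.
        if key in lookup and lookup[key] != name:
            raise ValueError(
                f"ambiguous column names {lookup[key]!r} and {name!r} "
                f"both serialize to the JSON key {key!r}"
            )
        lookup[key] = name
    # Keys without a matching field (if any) are left unchanged.
    return [{lookup.get(k, k): v for k, v in row.items()} for row in records]


fields = [{"name": 5, "type": "integer"}, {"name": 6, "type": "integer"}]
records = [{"5": 1, "6": 2}]
print(restore_column_labels(fields, records))  # [{5: 1, 6: 2}]

# A frame with both 1 and "1" as column labels cannot be restored unambiguously.
try:
    restore_column_labels([{"name": 1}, {"name": "1"}], [])
except ValueError as exc:
    print("ValueError:", exc)
```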

Another solution I can imagine is an error asking the user to provide string columns.

Let me know what you think.
Thanks again for the contributing guide!

WillAyd · 2018-06-01T04:14:09Z

@MichaMucha IIUC that supports the argument that numeric column names should not be allowed, given the column names are the keys in the JSON table schema and JSON keys need to be strings.

If that's the case and you are looking to contribute, then I'd suggest raising a more descriptive ValueError when trying to write to the JSON table schema using non-string column names.
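A minimal sketch of the kind of pre-write check suggested here. validate_table_column_names is a hypothetical name, not an existing pandas function, and the wording of the message is only illustrative. In a pandas context it would be called as validate_table_column_names(df.columns) before df.to_json(orient='table').

```python
def validate_table_column_names(columns):
    """Hypothetical pre-write check: raise a descriptive ValueError if any
    column label is not a string, because the labels become JSON object
    keys in the row-oriented "data" section of orient='table' output."""
    bad = [col for col in columns if not isinstance(col, str)]
    if bad:
        raise ValueError(
            "orient='table' requires string column names; "
            f"got non-string labels {bad!r}. "
            "Consider converting them first, e.g. df.columns = df.columns.astype(str)."
        )


validate_table_column_names(["col 1", "col 2"])  # passes silently

try:
    validate_table_column_names([5, 6, 7, 8])
except ValueError as exc:
    print(exc)
```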

albertvillanova · 2019-02-20T06:42:39Z

@WillAyd I think the problem is with the JSON serialization of the DataFrame (the 'data' for orient='table') and not with 'schema' (TableSchema spec).

IIUC, the aim of orient='table' is to make round-trip JSON serialization-deserialization of pandas objects. As pandas DataFrame can have non-string column names (indeed, that is the default if column names are not passed explicitly at instantiation), then the column names SHOULD NOT be used as keys in the JSON object (JSON spec imposes that keys must be strings).

It is also the case for index names: they can be non-strings. Therefore, index names should not be used as keys in the JSON object.
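One way to read this proposal is to keep column labels only in the schema, where they are JSON values and keep their type, and write each data row as an array ordered like the schema's fields. This is similar in spirit to how orient='split' stores "columns" as a list. The payload layout and both function names below are hypothetical and only sketch the idea with the standard json module; they are not the pandas format.

```python
import json


def to_positional_table_json(columns, rows):
    """Hypothetical encoder: labels live in schema.fields[*].name (JSON
    values, so ints stay ints); rows are arrays in field order."""
    payload = {
        "schema": {"fields": [{"name": col} for col in columns]},
        "data": [list(row) for row in rows],
    }
    return json.dumps(payload)


def from_positional_table_json(text):
    """Hypothetical decoder: rebuild the labels from the schema and keep
    the rows as positional lists."""
    payload = json.loads(text)
    columns = [field["name"] for field in payload["schema"]["fields"]]
    return columns, payload["data"]


# Integer labels, and even 5 next to "5", survive the round trip because
# no label is ever used as a JSON object key.
text = to_positional_table_json([5, 6, "5"], [(1, 2, 3)])
print(from_positional_table_json(text))  # ([5, 6, '5'], [[1, 2, 3]])
```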

WillAyd · 2019-02-20T06:46:13Z

#19129 should already cover that; it just needs a community PR.

albertvillanova · 2019-02-20T06:56:29Z

Yes, but I was suggesting not just "raising a more descriptive ValueError" (sic), but changing the implementation of the JSON serialization for orient='table'.


IsaacG · 2023-07-12T17:07:37Z

take

chandra-teajunkie · 2025-02-16T22:46:15Z

Hi,
take
I would like to try working on this if it's still open.

IsaacG · 2025-02-17T00:47:36Z

release

chandra-teajunkie · 2025-02-17T08:58:46Z

Hi Isaac,
I am unsure what the comment means.
Sorry if it's something obvious.

IsaacG · 2025-02-20T06:43:36Z

I am not working on this issue and happy to no longer be assigned to this issue.

I cannot comment on the importance/relevance/value of this issue at this time.

chandra-teajunkie · 2025-02-20T12:28:09Z

Got it.
Thank you for the response.

WillAyd changed the title ~~read_json and table='orient' Fails With Numeric Column Names~~ read_json and table='orient' Fails With Numeric Column Names Jan 8, 2018

TomAugspurger changed the title ~~read_json and table='orient' Fails With Numeric Column Names~~ read_json and orient='table' Fails With Numeric Column Names Jan 8, 2018

TomAugspurger added the IO JSON read_json, to_json, json_normalize label Jan 8, 2018

TomAugspurger added this to the Next Major Release milestone Jan 8, 2018

jreback added Bug Difficulty Intermediate labels Jan 9, 2018

WillAyd changed the title ~~read_json and orient='table' Fails With Numeric Column Names~~ Raise ValueError for read_json and orient='table' With Numeric Column Names Jun 18, 2018

WillAyd added good first issue and removed Bug Difficulty Intermediate labels Jun 18, 2018

WillAyd mentioned this issue Aug 28, 2018

Round trip json serialization issues #22525

Closed

albertvillanova mentioned this issue Feb 20, 2019

BUG: Fix type coercion in read_json orient='table' (#21345) #25219

Merged


albertvillanova pushed a commit to albertvillanova/pandas that referenced this issue Feb 28, 2019

Fix JSON orient='table' issues for numeric column names (pandas-dev#19129) (pandas-dev#22525)

c88affc

albertvillanova mentioned this issue Feb 28, 2019

Fix JSON orient='table' issues with numeric column names #25488

Closed


jbrockmendel removed the Effort Low label Oct 21, 2019

mroeschke added Enhancement Error Reporting Incorrect or improved errors from pandas labels May 8, 2020

ChiQiao mentioned this issue Mar 29, 2021

BUG: pd.read_json sets wrong value for numeric column names #40674

Open


mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

github-actions bot assigned IsaacG Jul 12, 2023

chandra-teajunkie mentioned this issue Feb 16, 2025

My feature branch to issue #19129 (read_json and orient='table' With Numeric Column) #60945

Open

