Raise ValueError for read_json and orient='table' With Numeric Column Names #19129
Comments
table='orient'
Fails With Numeric Column Names
I thought that https://frictionlessdata.io/specs/table-schema/ required the field names to be strings, but that may not be the case.
There is an error with string column/index names too:
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])
df.to_json(orient='table')
pd.read_json(_, orient='table')
...
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
This issue should be renamed to
@robmarkcole what version are you using? Your example ran fine for me on master.
INSTALLED VERSIONS
commit: 28e7a9498457e8cccc105a6261958197889325fa
pandas: 0.23.0.dev0+487.g28e7a9498
@WillAyd error on |
General support for read_json with |
Hi!
@MichaMucha that's right, this was never implemented, though I'm not sure it's valid JSON either. If you're interested, an investigation into the schema linked above to confirm or deny that is welcome!
Thanks for the quick reply! I read through the page and it seems that:
I would reason that you can have numerical column names. Direct quotes from the spec (a "field" is a column in our case):
Let me know what I can do if I can help more.
Thanks for the review! The table implementation is located in https://pandas.pydata.org/pandas-docs/stable/contributing.html Hope that helps, but let me know if you have any questions.
Thank you! Sorry it took me a while to find time. I dug a little deeper and it seems that this guy over here -
One workaround I can think about is to match
Another solution I can imagine is an error asking the user to provide string columns. Let me know what you think.
@MichaMucha IIUC that supports the argument that numeric column names should not be allowed, given the column names are the keys in the JSON table schema and JSON keys need to be strings. If that's the case and you are looking to contribute then I'd suggest perhaps raising a more descriptive |
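A minimal sketch of what a more descriptive error could look like, assuming the check runs on the column labels before serialization. The helper name `check_string_labels` and the message wording are hypothetical and are not part of pandas:

```python
def check_string_labels(labels, kind="column"):
    """Hypothetical helper: reject labels that cannot be JSON object keys."""
    bad = [label for label in labels if not isinstance(label, str)]
    if bad:
        raise ValueError(
            f"orient='table' requires string {kind} names; "
            f"got non-string {kind} names: {bad!r}"
        )


# String labels pass silently.
check_string_labels(["col 1", "col 2"])

# Integer labels (the default when no column names are given) are rejected.
try:
    check_string_labels([0, 1])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Whether such a check belongs on the write side, the read side, or both is left open here.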
@WillAyd I think the problem is with the JSON serialization of the DataFrame (the 'data' for orient='table') and not with the 'schema' (TableSchema spec). IIUC, the aim of orient='table' is to allow round-trip JSON serialization and deserialization of pandas objects. Since a pandas DataFrame can have non-string column names (indeed, that is the default if column names are not passed explicitly at instantiation), the column names SHOULD NOT be used as keys in the JSON object (the JSON spec requires that keys be strings). The same holds for index names: they can be non-strings, so index names should not be used as keys in the JSON object either.
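The key restriction described above can be seen with the standard library alone. This sketch uses a plain dict with integer keys as a stand-in for one row of a DataFrame with default column labels; it does not call pandas:

```python
import json

# Integer keys, like the default column labels 0 and 1.
record = {0: "a", 1: "b"}

# json.dumps coerces the integer keys to strings, because JSON object
# keys must be strings.
encoded = json.dumps(record)

# After decoding, the keys are strings, so the round trip is lossy.
decoded = json.loads(encoded)

print(encoded)
print(decoded == record)
```

The decoded keys are `"0"` and `"1"`, so the original integer labels cannot be told apart from string labels without extra type information.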
#19129 should already cover that; it just needs a community PR.
Yes, but I was suggesting not just "raising a more descriptive ValueError" (sic), but changing the implementation of the JSON serialization for orient='table'. |
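As an illustration of the kind of change being suggested, here is a sketch of a layout that keeps labels out of the key position by storing them as array values. The layout is an assumption for illustration (it resembles the existing orient='split' shape) and is not a proposal taken from the thread or from pandas:

```python
import json

# Labels are stored as JSON array values instead of object keys,
# so their integer type survives serialization.
columns = [0, 1]
rows = [["a", "b"], ["c", "d"]]

payload = json.dumps({"columns": columns, "data": rows})
restored = json.loads(payload)

print(restored["columns"] == columns)
print(restored["data"] == rows)
```

Because JSON arrays can hold numbers, the integer labels come back as integers, which is what a lossless round trip needs.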
take |
Hi, |
release |
Hi Isaac, |
I am not working on this issue and happy to no longer be assigned to this issue. I cannot comment on the importance/relevance/value of this issue at this time. |
got it. |
Code Sample, a copy-pastable example if possible
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 16d0262
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0.dev0+79.g16d026212
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.0
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.0b1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.13
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.2
fastparquet: 0.1.3
pandas_gbq: None
pandas_datareader: None