
Improve docs for pandas_version in Dataframe to_json(orient='table') #26637


Closed
shsnyder opened this issue Jun 3, 2019 · 7 comments · Fixed by #37025
Labels
Docs good first issue IO JSON read_json, to_json, json_normalize

Comments

@shsnyder

shsnyder commented Jun 3, 2019

Code Sample, a copy-pastable example if possible

# Your code here
import pandas as pd
df = pd.DataFrame({'A': [1,2,3], 'B': [4,5,6], 'C': [7,8,9] })
df.to_json(orient='table', index='False')

Problem description

I was attempting to serialize a dataframe and omit the indexes.
In the generated JSON the indexes remained, and the pandas_version value in the JSON was "0.20.0".

The results I obtained on Windows 10 and MacOS 10.14.5 were:
{"schema": {"fields":[{"name":"values","type":"integer"},{"name":"A","type":"integer"},{"name":"B","type":"integer"},{"name":"C","type":"integer"}],"primaryKey":[null],"pandas_version":"0.20.0"}, "data": [{"index":0,"A":1,"B":4,"C":7},{"index":1,"A":2,"B":5,"C":8},{"index":2,"A":3,"B":6,"C":9}]}'


Expected Output

{"schema": {"fields":[{"name":"values","type":"integer"},{"name":"A","type":"integer"},{"name":"B","type":"integer"},{"name":"C","type":"integer"}],"primaryKey":[null],"pandas_version":"0.24.2"}, "data": [{"A":1,"B":4,"C":7},{"A":2,"B":5,"C":8},{"A":3,"B":6,"C":9}]}'

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.6
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 1.8.5
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.5
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@shsnyder
Author

shsnyder commented Jun 3, 2019

I now realize that the index option should be a Boolean and not a string. Changing it to a Boolean does remove the index values, but pandas_version is still "0.20.0".
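
For readers following along, the corrected call can be sketched as below. This is an illustrative example, not part of the original report; it assumes a pandas version where index=False is accepted together with orient='table', as it was for the reporter on 0.24.2. The names payload, rows and schema_version are chosen here for illustration only. The printed pandas_version depends on the installed pandas release, so no particular value is shown.

```python
import json

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# index must be the boolean False, not the string 'False'.
payload = json.loads(df.to_json(orient="table", index=False))

# The records no longer carry an "index" entry.
rows = payload["data"]

# pandas_version is part of the schema block, not a report of pd.__version__.
schema_version = payload["schema"]["pandas_version"]

print(rows)
print(schema_version)
```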

@TomAugspurger
Contributor

This is deliberate. See the discussion at #24509 (comment).

There are a few places in the docs (e.g. http://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.io.json.build_table_schema.html#pandas.io.json.build_table_schema) that could use clarification if you're interested.

@WillAyd changed the title from Dataframe to_json(orient='table') always returns "pandas_version":"0.20.0" to Improve docs for pandas_version in Dataframe to_json(orient='table') Jun 3, 2019
@WillAyd WillAyd added Docs good first issue IO JSON read_json, to_json, json_normalize labels Jun 3, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Jun 3, 2019
@davefol

davefol commented Jul 29, 2019

I'm willing to update the docs with clarification.

@sameshl
Contributor

sameshl commented Aug 11, 2019

@TomAugspurger I would love to take this up. Could you explain what exactly I would need to do? In the docs it seems clear that the index parameter is a bool, right?

@TomAugspurger
Contributor

Yeah. I think the confusion was about pandas_version. That’s supposed to represent the version of the schema, not the installed pandas version. When we change the schema, we’ll bump the version.
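
A small sketch of that distinction, using pandas.io.json.build_table_schema (the function linked earlier in this thread). It is an illustration rather than an official example: the variable names are hypothetical, and the two printed values depend on the installed pandas release, so they are not shown here.

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame({"A": [1, 2, 3]})

# By default the schema carries a "pandas_version" entry.
schema = build_table_schema(df)
schema_version = schema["pandas_version"]

# version=False leaves the entry out entirely.
schema_without_version = build_table_schema(df, version=False)

# The two values answer different questions: which pandas is installed,
# and which revision of the table schema was written.
print("installed pandas:", pd.__version__)
print("schema version:", schema_version)
```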

@ShaharNaveh
Member

Does #31472 close this issue?

@tnwei
Contributor

tnwei commented Oct 10, 2020

@MomIsBestFriend I don't think so, since the linked PR does not clarify the meaning of pandas_version. Linking a separate PR specific to this open issue.

@jreback jreback modified the milestones: Contributions Welcome, 1.2 Oct 10, 2020

8 participants