build_table_schema has hardcoded Pandas version (0.20.0) #28455

Closed
slazicoicr opened this issue Sep 16, 2019 · 5 comments · Fixed by #45074
Labels
Docs IO JSON read_json, to_json, json_normalize Needs Discussion Requires discussion from core team before further action
Comments

@slazicoicr
Code Sample, a copy-pastable example if possible

import pandas
from pandas.io.json import build_table_schema
df = pandas.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
build_table_schema(df)

produces

{'fields': [{'name': 'index', 'type': 'integer'},
  {'name': 'col1', 'type': 'integer'},
  {'name': 'col2', 'type': 'integer'}],
 'primaryKey': ['index'],
 'pandas_version': '0.20.0'}

Problem description

The version 0.20.0 is hard coded in the function.

Expected Output

pandas.__version__ is '0.25.0+349.g9dc6de3.dirty', and build_table_schema should report the same value in pandas_version.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : ac6bbaf
python : 3.6.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-55-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 0.25.0+350.gac6bbaf
numpy : 1.17.2
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.2.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.8.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : 2.7.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : 3.5.1
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None

@TomAugspurger
Contributor

This is deliberate. That's the pandas version in which the schema was last revised. I think we have an open issue about better documenting this.
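A minimal sketch of the behavior described here, assuming a pandas install where build_table_schema accepts the documented version keyword. The exact string stored in pandas_version depends on the pandas release, so the example prints it rather than assuming a value:

```python
import pandas as pd
from pandas.io.json import build_table_schema

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# Default call: the schema carries a "pandas_version" field. Its value is a
# constant set inside pandas, not a copy of pd.__version__.
schema = build_table_schema(df)
print("schema says:", schema["pandas_version"], "| installed:", pd.__version__)

# version=False leaves the field out of the schema entirely.
schema_no_version = build_table_schema(df, version=False)
print("pandas_version" in schema_no_version)
```

Callers who find the field misleading can pass version=False and record their own version information separately.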

@slazicoicr
Author

Ok. The current documentation ("Whether to include a field pandas_version with the version of pandas that generated the schema") does read to me as saying that pandas_version should reflect the pandas version the user is running. Please close this ticket/reject the pull request if this is a documentation issue.

@WillAyd
Member

WillAyd commented Sep 16, 2019

#26637 is probably the one being referred to. Would we consider deprecating or removing this altogether, though? I personally also find it confusing, without a lot of benefit.

@slazicoicr
Author

I personally like versioning schemas (I came across build_table_schema because I want to expose a promise about what the DataFrame will look like). If the format of this promise changes without me being able to detect it, things will break. I think the versioning can be kept by:

  • Change pandas_version to schema_version
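To illustrate the kind of detection meant here, a consumer could refuse schemas whose version it has not been tested against. This is only a sketch: check_schema_version and SUPPORTED_SCHEMA_VERSIONS are hypothetical names, not part of pandas, and the sample dict is the output shown at the top of this issue.

```python
# Hypothetical guard: the helper name and the accepted set are assumptions.
SUPPORTED_SCHEMA_VERSIONS = {"0.20.0"}


def check_schema_version(schema: dict) -> str:
    """Return the schema's version, or raise if it is not one we support."""
    version = schema.get("pandas_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported table schema version: {version!r}")
    return version


# Schema as produced by build_table_schema in the report above.
schema = {
    "fields": [
        {"name": "index", "type": "integer"},
        {"name": "col1", "type": "integer"},
        {"name": "col2", "type": "integer"},
    ],
    "primaryKey": ["index"],
    "pandas_version": "0.20.0",
}

checked = check_schema_version(schema)
```

A guard like this only works if the field tracks the schema format, which is the case for a schema version but not for the running pandas version.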

@TomAugspurger
Contributor

TomAugspurger commented Sep 16, 2019

I'm not entirely sure how to deprecate it. In theory, we can

  1. Add an api_version / schema_version field, that contains what pandas_version currently contains.
  2. Bump the schema version currently in pandas_version.
  3. State that sometime in the future, pandas_version will be removed.

Not sure if that's worth the churn for people currently reading these schemas. Perhaps reach out to the people involved in earlier issues to get their thoughts.
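For illustration only, the transition in the numbered steps above might look like this from the outside. Nothing here is pandas API: add_schema_version and read_schema_version are hypothetical helpers, and the FutureWarning is an assumption about how the eventual removal could be signalled.

```python
import warnings


def add_schema_version(schema: dict) -> dict:
    """Writer side (step 1): copy pandas_version into a new schema_version field."""
    out = dict(schema)
    out["schema_version"] = schema["pandas_version"]
    return out


def read_schema_version(schema: dict) -> str:
    """Reader side (step 3): prefer schema_version, warn when falling back."""
    if "schema_version" in schema:
        return schema["schema_version"]
    if "pandas_version" in schema:
        warnings.warn(
            "pandas_version is deprecated; read schema_version instead",
            FutureWarning,
            stacklevel=2,
        )
        return schema["pandas_version"]
    raise KeyError("schema has no version field")


old_schema = {"fields": [], "pandas_version": "0.20.0"}
new_schema = add_schema_version(old_schema)
```

During the overlap both fields are present, so existing readers keep working while new readers move to schema_version.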

@mroeschke mroeschke added the IO JSON read_json, to_json, json_normalize label Nov 3, 2019
@mroeschke mroeschke added Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action labels May 8, 2020
@mroeschke mroeschke added Docs and removed Deprecate Functionality to remove in pandas labels Dec 26, 2021
@jreback jreback added this to the 1.4 milestone Dec 27, 2021