pd.read_json May Not Maintain Numeric String Index #28556

Closed
WillAyd opened this issue Sep 21, 2019 · 7 comments · Fixed by #38727
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions IO JSON read_json, to_json, json_normalize

Comments

@WillAyd
Member

WillAyd commented Sep 21, 2019

```python
>>> df = pd.DataFrame(range(3), index=list("123"))
>>> df.to_json(orient="split")
'{"columns":[0],"index":["1","2","3"],"data":[[0],[1],[2]]}'
>>> pd.read_json(df.to_json(orient="split"), orient="split").index
Int64Index([1, 2, 3], dtype='int64')
```

Note that the string values of the index should be preserved through the round trip here, but they come back as integers instead, so the round trip is lossy. Noted during the refactor in #28510.
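A quick check with the standard library `json` module (a sketch independent of pandas) shows that the payload produced by `to_json` in the example above still carries the index labels as strings, so the type information is lost on the read side, not during serialization:

```python
import json

# The payload printed by df.to_json(orient="split") in the example above.
payload = '{"columns":[0],"index":["1","2","3"],"data":[[0],[1],[2]]}'

decoded = json.loads(payload)
# JSON keeps the distinction between "1" (a string) and 1 (a number).
print(decoded["index"])    # ['1', '2', '3']
print(decoded["columns"])  # [0]
```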

@WillAyd WillAyd added Dtype Conversions Unexpected or buggy dtype conversions IO JSON read_json, to_json, json_normalize labels Sep 21, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Sep 21, 2019
@jmg7173
Contributor

jmg7173 commented Sep 21, 2019

Can I work on this issue?

@jmg7173
Contributor

jmg7173 commented Sep 22, 2019

The reason pd.read_json converts the object-dtype index to int64 is this block:

```python
if data.dtype == "object":
    # try float
    try:
        data = data.astype("float64")
        result = True
    except (TypeError, ValueError):
        pass
```

Do we need to coerce object-typed data to numeric types such as float64 and int64?

If I comment out this block, many tests fail.
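The effect of that coercion can be sketched in plain Python. The helper `coerce_labels` below is hypothetical and simplified, not pandas' actual code: it tries a float conversion as in the quoted block, then assumes a downcast to integers when every value is integral (the quoted block does not show that step, but the `Int64Index` in the original report implies one). The last call shows why such coercion is lossy in general: distinct labels like `"01"` and `"1"` collapse to the same number.

```python
def coerce_labels(labels):
    """Hypothetical, simplified stand-in for read_json's axis coercion."""
    # Step 1: try float, mirroring `data.astype("float64")` in the snippet.
    try:
        as_float = [float(x) for x in labels]
    except (TypeError, ValueError):
        return list(labels)  # leave non-numeric labels untouched
    # Step 2 (assumed): downcast to int when every value is integral.
    if all(f.is_integer() for f in as_float):
        return [int(f) for f in as_float]
    return as_float


print(coerce_labels(["1", "2", "3"]))  # [1, 2, 3]
print(coerce_labels(["a", "b"]))       # ['a', 'b']
print(coerce_labels(["01", "1"]))      # [1, 1]  (two labels become one)
```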

@jmg7173
Contributor

jmg7173 commented Sep 24, 2019

How about changing the default value of convert_axes to False in this condition of read_json?

```python
if convert_axes is None and orient != "table":
    convert_axes = True
```
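Independently of changing the default, a caller can already pass `convert_axes=False` explicitly, which skips axis dtype inference so the index labels stay as the strings written by `to_json`. A minimal sketch, assuming pandas is installed; the payload is wrapped in `io.StringIO` because recent pandas versions warn when a literal JSON string is passed to `read_json`:

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame(range(3), index=list("123"))
payload = df.to_json(orient="split")

# convert_axes=False disables dtype inference on the index and columns,
# so "1", "2", "3" are not coerced to numbers.
restored = pd.read_json(StringIO(payload), orient="split", convert_axes=False)
print(restored.index.tolist())  # ['1', '2', '3']
```

The trade-off is that this turns off conversion for every axis, so labels that should be inferred (for example date-like strings) also stay as plain strings.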

@sathyz
Contributor

sathyz commented Mar 22, 2020

@jmg7173 are you still looking at this?

@scratchmex

Sorry, but why should the string values be preserved by default? Isn't it more convenient to parse numeric strings as integers?

@jmg7173
Contributor

jmg7173 commented Mar 31, 2020

@sathyz No, you can take this issue.

@theoniko
Contributor

theoniko commented Dec 29, 2020

Hello @WillAyd,
Would it be possible to review my PR and give me feedback?
Thanks in advance.

@jreback jreback modified the milestones: Contributions Welcome, 1.3 Jan 4, 2021