ENH: json_normalize() avoid loss of precision for int64 with missing values #16918
Comments
@jzwinck : Sounds pretty reasonable, given that it is consistent with our API for other functions. I would suggest you try implementing the
@gfyoung Thanks. I just realized that this also behaves badly:
Can you suggest a way to make that work? I would leverage that solution to improve If there is no good way to do this with the simple case of constructing a DataFrame, then we could either implement one, or we could make
Type conversions from JSON are not unexpected; JSON is not a typed format.
This is a somewhat unusual way of initializing a
What are your thoughts regarding this proposal to add
@gfyoung I made it work well enough for my purposes by adding
This simple solution has three significant problems:
It's unlikely that I'll contribute a patch for this one, but I do think it is important to address.
@jzwinck : Okay, sounds good! This work will be useful for anyone who takes up this issue later.
Fixed by #27335.
This code:
Gives 21, when reasonable users might expect it to give 0.

This inaccuracy occurs when one field has an int64 value that is not present in all records, triggering a conversion of the Series dtype to float64. Of course, Pandas does this conversion so that it can put NaN where no value exists.
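The rounding itself can be reproduced in plain Python, without pandas, because a Python float is the same IEEE 754 double as float64. This is only a sketch of the mechanism: the value below is chosen for illustration and is not the one from the report above.

```python
# Illustrative value (an assumption, not taken from the report): the first
# integer above 2**53, beyond which float64 cannot represent every integer.
original = 2**53 + 1  # 9007199254740993

# Converting to float mimics what happens when an int64 column is upcast
# to float64 so that it can hold NaN for the missing records.
as_float = float(original)

# Converting back shows that the float no longer holds the original value.
round_tripped = int(as_float)

print(original - round_tripped)  # 1
```

Any int64 whose magnitude exceeds 2**53 is at risk of this kind of rounding once the column becomes float64.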
One solution could be to add a fill_value parameter, as seen in add(), unstack(), and other Pandas functions. It would be good for this to support a dict as well as a single value, in case different fill values are required for different columns.

The usage might be like this:
Then instead of the current result:
The result would be:
I'm using Pandas 0.20.1.
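Until something like fill_value exists, one possible workaround is to fill in the missing keys before the records reach Pandas. The sketch below is only an illustration: fill_missing_keys is a hypothetical helper, not part of Pandas, it handles flat records only, and the sample records are made up.

```python
def fill_missing_keys(records, fill_value):
    """Return copies of flat dict records with absent keys filled in.

    fill_value is either a single value used for every missing key, or a
    dict mapping key -> default; keys not named in the dict stay absent.
    Hypothetical helper for illustration; not a Pandas API.
    """
    # Collect every key seen in any record, preserving first-seen order.
    all_keys = []
    for record in records:
        for key in record:
            if key not in all_keys:
                all_keys.append(key)

    filled = []
    for record in records:
        row = dict(record)  # copy so the caller's records are not mutated
        for key in all_keys:
            if key in row:
                continue
            if isinstance(fill_value, dict):
                if key in fill_value:
                    row[key] = fill_value[key]
            else:
                row[key] = fill_value
        filled.append(row)
    return filled


# Made-up records: the second one lacks the large integer field.
records = [{"id": 1, "big": 2**62 + 1}, {"id": 2}]
filled = fill_missing_keys(records, {"big": 0})
print(filled)
```

Because every record in filled now carries an integer for each key, building a DataFrame from it should not need NaN, so the column should be able to stay int64. Nested records would need a recursive variant of the same idea.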