-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Json fill_value for missing fields #27073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
798da62
to
7c2b71c
Compare
Codecov Report
@@ Coverage Diff @@
## master #27073 +/- ##
==========================================
- Coverage 92.04% 92.03% -0.01%
==========================================
Files 180 180
Lines 50714 50716 +2
==========================================
- Hits 46679 46676 -3
- Misses 4035 4040 +5
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #27073 +/- ##
==========================================
- Coverage 92.04% 92.03% -0.01%
==========================================
Files 180 180
Lines 50714 50716 +2
==========================================
- Hits 46679 46676 -3
- Misses 4035 4040 +5
Continue to review full report at Codecov.
|
7c2b71c
to
41baaa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read through the original issue so it seems like this is a way around conversions from int to float with missing data. A lot has changed since this came up and we now have a nullable integer type, so I'm not sure we actually want to add this parameter to the API or if we just need to figure out how to return an Int64 here
cc @gfyoung
|
||
k = len(columns) | ||
n = len(dicts) | ||
|
||
result = np.empty((n, k), dtype='O') | ||
if fill_value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than do this as a (n x k) array can't you just assign the appropriate value down on line 336 below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it can be done that way. I wasn't very sure about how good dictionary value retrieval compares to index retrieval from a list. I was afraid that (k x n) times of dictionary access is much slower than k x n times of index retrieval from a list. Also, this is a (1 x k) array instead of (n x k) array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not in favor of adding this keyword because we already have .fillna()
once this is called; or are you suggesting something that cannot already be done? note - all in favor of solving the referenced issue though
@jreback : By the time
|
@jiangyue12392 : Sorry for the silence! We're actually in the midst of pushing a release candidate for @WillAyd @jreback : If either of you have bandwidth to give feedback on the two directions I gave above, that would be great here. |
Yea I don’t think adding this to the API is the best approach. Certainly more complicated but would rather see if there’s a way to return Int64 with missing data
…Sent from my iPhone
On Jul 3, 2019, at 11:31 PM, gfyoung ***@***.***> wrote:
@jiangyue12392 : Sorry for the silence! We're actually in the midst of pushing a release candidate for 0.25.0. I would merge / rebase master for the time being.
@WillAyd @jreback : If either of you have bandwidth to give feedback on the two directions I gave above, that would be great here.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
@WillAyd Can you point me to more information about the nullable integer type? I may be able to come up with some other solutions with that. |
@jiangyue12392 there is a dedicated section in the docs to it: https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html It's a relatively new feature still so anything you can do to improve integration with that would be awesome! |
superseded by #27335 |
git diff upstream/master -u -- "*.py" | flake8 --diff
A new argument
fill_value
is added to allow users to supply default values for missing fields for each column. This enhancement also addresses the undesirable type conversion fromint
tofloat
due to missing fields as raised in #16918