-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix nested_to_record with None values in nested levels #21164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix nested_to_record with None values in nested levels #21164
Conversation
|
Hello @ssikdar1! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 07, 2018 at 12:16 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #21164 +/- ##
==========================================
+ Coverage 91.84% 91.84% +<.01%
==========================================
Files 153 153
Lines 49505 49506 +1
==========================================
+ Hits 45466 45467 +1
Misses 4039 4039
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change - you can add a whatsnew for v0.23.1
pandas/io/json/normalize.py
Outdated
@@ -80,6 +80,7 @@ def nested_to_record(ds, prefix="", sep=".", level=0): | |||
if level != 0: # so we skip copying for top level, common case | |||
v = new_d.pop(k) | |||
new_d[newkey] = v | |||
continue | |||
if v is None: # pop the key if the value is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps instead of continue this should just be elif
?
Can you check my latest commits? I changed it to not use a continue. Also changed the whatsnew as well. |
Codecov Report
@@ Coverage Diff @@
## master #21164 +/- ##
=======================================
Coverage 91.84% 91.84%
=======================================
Files 153 153
Lines 49505 49505
=======================================
Hits 45466 45466
Misses 4039 4039
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #21164 +/- ##
=======================================
Coverage 91.85% 91.85%
=======================================
Files 153 153
Lines 49555 49555
=======================================
Hits 45518 45518
Misses 4037 4037
Continue to review full report at Codecov.
|
pandas/io/json/normalize.py
Outdated
@@ -80,8 +80,9 @@ def nested_to_record(ds, prefix="", sep=".", level=0): | |||
if level != 0: # so we skip copying for top level, common case | |||
v = new_d.pop(k) | |||
new_d[newkey] = v | |||
if v is None: # pop the key if the value is None | |||
new_d.pop(k) | |||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use elif instead of a separate else and if
kk, changed to elif |
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -45,6 +45,7 @@ Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`) | |||
- fix normalize to not except when json_normalize is fed a key with null value on inner most key (:issue:`21158`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to IO section, use a ref to json_normalize
. see if you can describe this is more plain english.
Moved into the I/O section in whatsnew. Changed the message hopefully its clearer. |
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -81,7 +81,7 @@ Indexing | |||
I/O | |||
^^^ | |||
|
|||
- | |||
- bug in :meth:`nested_to_record` when :meth:`json_normalize` was called with certain nested jsons (:issue:`21158`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps better said "Bug in ... with None values in nested levels"
@@ -375,3 +375,27 @@ def test_nonetype_dropping(self): | |||
'info.last_updated': '26/05/2012'}] | |||
|
|||
assert result == expected | |||
|
|||
def test_nonetype_inner_most_level(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if we have a test case to handle None values that show up in the initial level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking that we had a test to cover None at the top-most level. I suppose it makes sense to test None at the deepest level (which you have) and also somewhere in the middle, to make sure it doesn't mess up subsequent un-nesting operations.
Can you at least add a test with None in one of the middle layers? You can probably simplify the example below (doesn't need to be a direct copy/paste from the issue) and use parametrization to re-use the logic for different inputs (i.e. one with None at the deepest level, one with None somewhere in the middle). Depending on how that looks maybe you want to even move the example with None at the top level here as well
These test case below have a json list with a key as None in the initial level of a json. data = [ [ |
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -81,7 +81,7 @@ Indexing | |||
I/O | |||
^^^ | |||
|
|||
- | |||
- bug in :meth:`nested_to_record` when :meth:`json_normalize` was called with None values in nested levels (:issue:`21158`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nested_to_record is not public, don't use it here
use double backticks around None
say nested levels in JSON
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -81,7 +81,7 @@ Indexing | |||
I/O | |||
^^^ | |||
|
|||
- | |||
- bug when :meth:`json_normalize` was called with `None` values in nested levels in JSON (:issue:`21158`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize "Bug" and as Jeff mentioned use double backticks around None, not just single
@@ -375,3 +375,27 @@ def test_nonetype_dropping(self): | |||
'info.last_updated': '26/05/2012'}] | |||
|
|||
assert result == expected | |||
|
|||
def test_nonetype_inner_most_level(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking that we had a test to cover None at the top-most level. I suppose it makes sense to test None at the deepest level (which you have) and also somewhere in the middle, to make sure it doesn't mess up subsequent un-nesting operations.
Can you at least add a test with None in one of the middle layers? You can probably simplify the example below (doesn't need to be a direct copy/paste from the issue) and use parametrization to re-use the logic for different inputs (i.e. one with None at the deepest level, one with None somewhere in the middle). Depending on how that looks maybe you want to even move the example with None at the top level here as well
Can you at least add a test with None in one of the middle layers?
Whatsnew: |
} | ||
result = nested_to_record(data) | ||
expected = { | ||
'location.id': None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was reading through the commentary of the issue and noticed there was some confusion on the topic, but I don't understand why we would want to drop the 'id': None
record here - is that solely driven by the elif
statement? If so, perhaps we don't need that at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I'm late replying to this, but I agree. I don't understand the intention of dropping the top level 'id': None
. I would much rather know explicitly that the value for a field is None than to be guessing or writing extra checks for whether the field exists.
So I took an example from here #20030
With the my current pull request as is it is, it prints:
Now commenting out the elif, it prints:
The issue here being the added column info. The change just didnt consider that a pop would occur twice in a row if the level > 0 and was also value was also None. If we get rid of the elif we would undo issue 20030 from that perspective? |
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -81,7 +81,7 @@ Indexing | |||
I/O | |||
^^^ | |||
|
|||
- | |||
- Bug when :meth:`json_normalize` was called with `None` values in nested levels in JSON (:issue:`21158`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this wont' render, you need pandas.io.json.json_normalize I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double back ticks around None
# GH21158: If inner level json has a key with a null value | ||
# make sure it doesnt do a new_d.pop twice and except | ||
data = { | ||
"id": None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this have the same result if id is NOT repeated (with None), just missing at various levels? e.g. try with a single id with None at the top and bottom levels (in another case, leave this one as well)
pls rebase as well |
f3b35a7
to
fa9ecd5
Compare
fa9ecd5
to
ab24d02
Compare
|
looks ok to me. @WillAyd pls merge when you are satisifed. |
@jreback somewhat conflicted on this. This PR certainly fixes the issue at hand, but what is the point of us dropping Could probably go forward with this for 0.23.1 as the bug fix but then revisit top level |
@WillAyd hmm, that's a good point. shouldn't |
As @ssikdar1 points out it does stem from #20030, though in reading through that and the subsequent PR I think it is confusing what was actually being addressed. The PR would imply that automatically dropping the top-level |
…on_normalize_KeyError_21158
Yep we can go with this for now. I'll open a separate issue for the top-level None parsing |
Thanks @ssikdar1 ! |
No problem! |
) (cherry picked from commit ab6aaf7)
(cherry picked from commit ab6aaf7)
git diff upstream/master -u -- "*.py" | flake8 --diff
Continue after pop so you dont pop with the same key twice in a row