BUG: Fix nested_to_record with None values in nested levels #21164

ssikdar1 · 2018-05-22T04:39:53Z

closes json_normalize gives KeyError in 0.23 #21158
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Continue after the pop so you don't pop with the same key twice in a row.
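As a minimal illustration of the failure described above (not the pandas source; the key names are borrowed from the test data later in this thread), renaming a key with `pop` and then popping the old key a second time raises `KeyError`:

```python
# A nested-level record whose value happens to be None.
record = {"region": None, "x": 49.15}
key, new_key = "region", "town.info.region"

# First pop: the key is renamed because we are below the top level.
value = record.pop(key)
record[new_key] = value

# Second pop with the same old key: it is already gone from the dict.
try:
    if value is None:
        record.pop(key)
    raised = False
except KeyError:
    raised = True

print(raised)
```

Skipping the second pop once the rename has happened (via `continue`, or via `elif` as discussed below) avoids the error.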

ssikdar1 · 2018-05-22T13:11:53Z

git diff upstream/master -u -- "*.py" | flake8 --diff
$ git diff upstream/master -u -- "*.py" | flake8 --diff
$

pep8speaks · 2018-05-22T13:30:58Z

Hello @ssikdar1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 07, 2018 at 12:16 Hours UTC

codecov · 2018-05-22T13:31:09Z

Codecov Report

Merging #21164 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21164      +/-   ##
==========================================
+ Coverage   91.84%   91.84%   +<.01%     
==========================================
  Files         153      153              
  Lines       49505    49506       +1     
==========================================
+ Hits        45466    45467       +1     
  Misses       4039     4039

Flag	Coverage Δ
#multiple	`90.23% <100%> (ø)`	⬆️
#single	`41.88% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/json/normalize.py	`96.96% <100%> (+0.03%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 172ab7a...34c745e.

WillAyd

Thanks for the change - you can add a whatsnew for v0.23.1

WillAyd · 2018-05-22T16:25:52Z

pandas/io/json/normalize.py

@@ -80,6 +80,7 @@ def nested_to_record(ds, prefix="", sep=".", level=0):
                if level != 0:  # so we skip copying for top level, common case
                    v = new_d.pop(k)
                    new_d[newkey] = v
+                    continue
                if v is None:  # pop the key if the value is None


Perhaps instead of continue this should just be elif?

ssikdar1 · 2018-05-23T01:15:57Z

Can you check my latest commits? I changed it to not use a continue.

Also changed the whatsnew as well.

codecov · 2018-05-23T02:06:23Z

Codecov Report

Merging #21164 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #21164   +/-   ##
=======================================
  Coverage   91.84%   91.84%           
=======================================
  Files         153      153           
  Lines       49505    49505           
=======================================
  Hits        45466    45466           
  Misses       4039     4039

Flag	Coverage Δ
#multiple	`90.23% <100%> (ø)`	⬆️
#single	`41.88% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/json/normalize.py	`96.93% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 172ab7a...34c745e.

codecov-io · 2018-05-23T02:07:34Z

Codecov Report

Merging #21164 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #21164   +/-   ##
=======================================
  Coverage   91.85%   91.85%           
=======================================
  Files         153      153           
  Lines       49555    49555           
=======================================
  Hits        45518    45518           
  Misses       4037     4037

Flag	Coverage Δ
#multiple	`90.25% <100%> (ø)`	⬆️
#single	`41.87% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/json/normalize.py	`96.93% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cea0a81...92a6263.

WillAyd · 2018-05-23T02:55:04Z

pandas/io/json/normalize.py

@@ -80,8 +80,9 @@ def nested_to_record(ds, prefix="", sep=".", level=0):
                if level != 0:  # so we skip copying for top level, common case
                    v = new_d.pop(k)
                    new_d[newkey] = v
-                if v is None:  # pop the key if the value is None
-                    new_d.pop(k)
+                else:


Use elif instead of a separate else and if

ssikdar1 · 2018-05-23T10:35:15Z

kk, changed to elif
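For readers following the hunks above, here is a simplified sketch of the `elif` shape the reviewers asked for. It is modeled on the diff context shown in this thread, handles a single dict only, and is not the actual pandas implementation; the name `flatten_record` is hypothetical.

```python
import copy


def flatten_record(d, prefix="", sep=".", level=0):
    """Flatten one nested dict; a simplified stand-in for nested_to_record."""
    new_d = copy.deepcopy(d)
    for k, v in d.items():
        newkey = k if level == 0 else prefix + sep + k
        if not isinstance(v, dict):
            if level != 0:
                # Nested level: rename the key and keep the value, even None.
                new_d[newkey] = new_d.pop(k)
            elif v is None:
                # Top level only: drop keys whose value is None.
                new_d.pop(k)
        else:
            # Replace the nested dict with its flattened, prefixed entries.
            new_d.pop(k)
            new_d.update(flatten_record(v, newkey, sep, level + 1))
    return new_d


data = {"id": None, "location": {"id": None, "name": "x"}}
print(flatten_record(data))
```

Because the two branches are mutually exclusive, a given key is popped at most once per iteration.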

jreback · 2018-05-23T10:45:47Z

doc/source/whatsnew/v0.23.1.txt

@@ -45,6 +45,7 @@ Bug Fixes
 ~~~~~~~~~

 - tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`)
+- fix normalize to not except when json_normalize is fed a key with null value on inner most key (:issue:`21158`)


move to IO section, use a ref to json_normalize. see if you can describe this in more plain English.

ssikdar1 · 2018-05-24T02:40:58Z

Moved into the I/O section in whatsnew. Changed the message; hopefully it's clearer.

WillAyd · 2018-05-24T08:06:00Z

doc/source/whatsnew/v0.23.1.txt

@@ -81,7 +81,7 @@ Indexing
 I/O
 ^^^

-
+- bug in :meth:`nested_to_record` when :meth:`json_normalize` was called with certain nested jsons  (:issue:`21158`)


Perhaps better said "Bug in ... with None values in nested levels"

WillAyd · 2018-05-24T08:06:36Z

pandas/tests/io/json/test_normalize.py

@@ -375,3 +375,27 @@ def test_nonetype_dropping(self):
             'info.last_updated': '26/05/2012'}]

        assert result == expected
+
+    def test_nonetype_inner_most_level(self):


Can you check if we have a test case to handle None values that show up in the initial level?

Thanks for checking that we had a test to cover None at the top-most level. I suppose it makes sense to test None at the deepest level (which you have) and also somewhere in the middle, to make sure it doesn't mess up subsequent un-nesting operations.

Can you at least add a test with None in one of the middle layers? You can probably simplify the example below (doesn't need to be a direct copy/paste from the issue) and use parametrization to re-use the logic for different inputs (i.e. one with None at the deepest level, one with None somewhere in the middle). Depending on how that looks maybe you want to even move the example with None at the top level here as well

ssikdar1 · 2018-05-24T11:15:20Z

@WillAyd

Can you check if we have a test case to handle None values that show up in the initial level?

The test cases below have a JSON list with a key whose value is None at the initial level.

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/json/test_normalize.py#L354

data = [
    {'info': None,
     'author_name':
     {'first': 'Smith', 'last_name': 'Appleseed'}
     },
    {'info':
     {'created_at': '11/08/1993', 'last_updated': '26/05/2012'},
     'author_name':
     {'first': 'Jane', 'last_name': 'Doe'}
     }
]

https://github.com/pandas-dev/pandas/blob/master/pandas/tests/io/json/test_normalize.py#L58

[
{'info': None},
{'info':
{'created ...

jreback · 2018-05-24T11:56:17Z

doc/source/whatsnew/v0.23.1.txt

@@ -81,7 +81,7 @@ Indexing
 I/O
 ^^^

-
+- bug in :meth:`nested_to_record` when :meth:`json_normalize` was called with None values in nested levels  (:issue:`21158`)


nested_to_record is not public, don't use it here

use double backticks around None

say nested levels in JSON

WillAyd · 2018-05-24T15:28:32Z

doc/source/whatsnew/v0.23.1.txt

@@ -81,7 +81,7 @@ Indexing
 I/O
 ^^^

-
+- bug when :meth:`json_normalize` was called with `None` values in nested levels in JSON  (:issue:`21158`)


Capitalize "Bug" and as Jeff mentioned use double backticks around None, not just single

WillAyd · 2018-05-24T15:31:02Z

pandas/tests/io/json/test_normalize.py

@@ -375,3 +375,27 @@ def test_nonetype_dropping(self):
             'info.last_updated': '26/05/2012'}]

        assert result == expected
+
+    def test_nonetype_inner_most_level(self):


Thanks for checking that we had a test to cover None at the top-most level. I suppose it makes sense to test None at the deepest level (which you have) and also somewhere in the middle, to make sure it doesn't mess up subsequent un-nesting operations.

Can you at least add a test with None in one of the middle layers? You can probably simplify the example below (doesn't need to be a direct copy/paste from the issue) and use parametrization to re-use the logic for different inputs (i.e. one with None at the deepest level, one with None somewhere in the middle). Depending on how that looks maybe you want to even move the example with None at the top level here as well
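One way the suggested parametrization could look is a table of cases run through the same assertion. This sketch uses a compact, hypothetical `flatten` helper (not `nested_to_record` itself) that mirrors the behavior discussed in this PR: None is kept at nested levels and dropped at the top level. The same table could be handed to `pytest.mark.parametrize`.

```python
def flatten(d, prefix="", sep="."):
    """Hypothetical helper: flatten nested dicts into dotted keys."""
    out = {}
    for k, v in d.items():
        key = prefix + sep + k if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key, sep))
        elif v is None and not prefix:
            continue  # top-level None is dropped
        else:
            out[key] = v
    return out


# (case name, input, expected flattened record)
CASES = [
    ("none_at_deepest_level",
     {"a": {"b": {"c": None, "x": 1}}},
     {"a.b.c": None, "a.b.x": 1}),
    ("none_in_middle_level",
     {"a": {"id": None, "b": {"x": 1}}},
     {"a.id": None, "a.b.x": 1}),
    ("none_at_top_level",
     {"id": None, "a": {"x": 1}},
     {"a.x": 1}),
]

for name, data, expected in CASES:
    assert flatten(data) == expected, name
```

The middle-level case is the one that checks that a None value does not disturb un-nesting of the dicts that follow it.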

ssikdar1 · 2018-05-26T13:42:14Z

Can you at least add a test with None in one of the middle layers?

I changed the test to be a test with None at multiple levels.

+        data = {
+            "id": None,
+            "location": {
+                "id": None,
+                "country": {
+                    "id": None,
+                    "state": {
+                        "id": None,
+                        "town.info": {
+                            "region": None,
+                            "x": 49.151580810546875,
+                            "y": -33.148521423339844,
+                            "z": 27.572303771972656}}}
+            }
+        }

Whatsnew:
Capitalized the b in Bug and put None in ticks

WillAyd · 2018-05-26T23:22:31Z

pandas/tests/io/json/test_normalize.py

+        }
+        result = nested_to_record(data)
+        expected = {
+            'location.id': None,


I was reading through the commentary of the issue and noticed there was some confusion on the topic, but I don't understand why we would want to drop the 'id': None record here - is that solely driven by the elif statement? If so, perhaps we don't need that at all?

I know I'm late replying to this, but I agree. I don't understand the intention of dropping the top level 'id': None. I would much rather know explicitly that the value for a field is None than to be guessing or writing extra checks for whether the field exists.

ssikdar1 · 2018-05-28T12:25:11Z

@WillAyd

I was reading through the commentary of the issue and noticed there was some confusion on the topic, but I don't understand why we would want to drop the 'id': None record here - is that solely driven by the elif statement? If so, perhaps we don't need that at all?

So I took an example from here #20030

import pandas as pd

data_partial_fail = [
    {'info': None,
     'author_name': {'first': 'Smith', 'last_name': 'Appleseed'}},
    {'info': {'created_at': '11/08/1993', 'last_updated': '26/05/2012'},
     'author_name': {'first': 'Jane', 'last_name': 'Doe'}},
]
p = pd.io.json.json_normalize(data_partial_fail)
print(p.columns)

With my current pull request as it is, it prints:

Index(['author_name.first', 'author_name.last_name', 'info.created_at',
       'info.last_updated'],
      dtype='object')

Now commenting out the elif, it prints:

(test) shan@shan-ThinkPad-T530:~/test$ python foo.py 
Index(['author_name.first', 'author_name.last_name', 'info', 'info.created_at',
       'info.last_updated'],
      dtype='object')

The issue here is the added info column.
The fix for #20030 introduced the original change in an attempt to get rid of the extra info column:
https://github.com/pandas-dev/pandas/pull/20399/files#diff-9c654764f5f21c8e9d58d9ebf14de86dR83

That change just didn't consider that a pop would occur twice in a row if the level > 0 and the value was also None.

If we get rid of the elif we would undo issue 20030 from that perspective?
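The trade-off can be reproduced without pandas. The sketch below uses a hypothetical `flatten` helper with a `drop_top_level_none` flag (both are illustrative assumptions, not pandas API) and takes the union of flattened keys as a stand-in for the resulting column index:

```python
def flatten(d, prefix="", sep=".", drop_top_level_none=True):
    """Hypothetical helper: flatten nested dicts into dotted keys."""
    out = {}
    for k, v in d.items():
        key = prefix + sep + k if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, key, sep, drop_top_level_none))
        elif v is None and not prefix and drop_top_level_none:
            continue  # mimic dropping None at the top level
        else:
            out[key] = v
    return out


def columns(records, **kwargs):
    """Sorted union of flattened keys across all records."""
    return sorted({key for r in records for key in flatten(r, **kwargs)})


data = [
    {"info": None,
     "author_name": {"first": "Smith", "last_name": "Appleseed"}},
    {"info": {"created_at": "11/08/1993", "last_updated": "26/05/2012"},
     "author_name": {"first": "Jane", "last_name": "Doe"}},
]

print(columns(data))
print(columns(data, drop_top_level_none=False))
```

With the flag off, the first record contributes a bare `info` key alongside `info.created_at` and `info.last_updated` from the second, which is the extra column shown in the output above.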

jreback · 2018-05-29T00:43:44Z

doc/source/whatsnew/v0.23.1.txt

@@ -81,7 +81,7 @@ Indexing
 I/O
 ^^^

-
+- Bug when :meth:`json_normalize` was called with `None` values in nested levels in JSON  (:issue:`21158`)


this won't render, you need pandas.io.json.json_normalize I think

use double back ticks around None

jreback · 2018-05-29T00:45:16Z

pandas/tests/io/json/test_normalize.py

+        # GH21158: If inner level json has a key with a null value
+        # make sure it doesnt do a new_d.pop twice and except
+        data = {
+            "id": None,


does this have the same result if id is NOT repeated (with None), just missing at various levels? e.g. try with a single id with None at the top and bottom levels (in another case, leave this one as well)

jreback · 2018-05-29T00:45:36Z

pls rebase as well

ssikdar1 · 2018-05-29T11:53:00Z

Whatsnew changed to have pandas.io.json.json_normalize and None
another testcase added to test the id: None at various levels
rebased w/ master

jreback · 2018-06-04T21:33:03Z

looks ok to me. @WillAyd pls merge when you are satisfied.

WillAyd · 2018-06-04T21:45:31Z

@jreback somewhat conflicted on this. This PR certainly fixes the issue at hand, but what is the point of us dropping None values at the top level of the JSON yet maintaining it at deeper levels? Seems like it would be more consistent from an end user perspective to maintain at all levels and leave it to them to explicitly drop if they wanted.

Could probably go forward with this for 0.23.1 as the bug fix but then revisit top level None-handling for 0.24.0?
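To make the "leave it to the user" option concrete: if None values were kept at every level, a caller who did not want them could remove them from a flattened record explicitly. A minimal sketch, where `drop_none` is a hypothetical helper and not part of pandas:

```python
def drop_none(record):
    """Return a copy of a flat record without keys whose value is None."""
    return {k: v for k, v in record.items() if v is not None}


flat = {"id": None, "location.id": None, "location.x": 49.15}
print(drop_none(flat))
```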

jreback · 2018-06-04T21:52:38Z

@WillAyd hmm, that's a good point. shouldn't id be in the expected? why are we dropping it at all? yeah that seems wrong

WillAyd · 2018-06-04T22:03:23Z

As @ssikdar1 points out it does stem from #20030, though in reading through that and the subsequent PR I think it is confusing what was actually being addressed. The PR would imply that automatically dropping the top-level None value is a feature, though I would argue that's not desirable and is in fact a bug.

jreback · 2018-06-07T11:02:27Z

@ssikdar1 can you rebase

@WillAyd ok with this?

WillAyd · 2018-06-07T15:58:01Z

Yep we can go with this for now. I'll open a separate issue for the top-level None parsing

WillAyd · 2018-06-07T15:59:00Z

Thanks @ssikdar1 !

ssikdar1 · 2018-06-07T15:59:51Z

No problem!

WillAyd requested changes May 22, 2018

WillAyd added the IO JSON read_json, to_json, json_normalize label May 22, 2018

WillAyd added this to the 0.23.1 milestone May 22, 2018

WillAyd requested changes May 23, 2018

jreback requested changes May 23, 2018

WillAyd requested changes May 24, 2018

WillAyd changed the title ~~Change nested_to_record to have a continue~~ BUG: Fix nested_to_record with None values in nested levels May 24, 2018

jreback requested changes May 24, 2018

WillAyd requested changes May 24, 2018

WillAyd reviewed May 26, 2018

jreback requested changes May 29, 2018

ssikdar1 force-pushed the json_normalize_KeyError_21158 branch from f3b35a7 to fa9ecd5 Compare May 29, 2018 03:01

Fix json_normalize to not except on certain inputs (pandas-dev#21164)

ab24d02

ssikdar1 force-pushed the json_normalize_KeyError_21158 branch from fa9ecd5 to ab24d02 Compare May 29, 2018 03:09

jreback approved these changes Jun 4, 2018

jreback added the Needs Backport label Jun 4, 2018

Merge branch 'master' of https://github.com/pandas-dev/pandas into js…

92a6263

…on_normalize_KeyError_21158

WillAyd merged commit ab6aaf7 into pandas-dev:master Jun 7, 2018

ssikdar1 deleted the json_normalize_KeyError_21158 branch June 7, 2018 15:59

This was referenced Jun 7, 2018

JSON nested_to_record Silently Drops Top-Level None Values #21356

Closed

Inexplicable KeyError in pd.io.json.json_normalize(dic) #21352

Closed

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 12, 2018

Fix nested_to_record with None values in nested levels (pandas-dev#21164

5359aea

) (cherry picked from commit ab6aaf7)

TomAugspurger pushed a commit that referenced this pull request Jun 12, 2018

Fix nested_to_record with None values in nested levels (#21164)

3723e80

(cherry picked from commit ab6aaf7)

TomAugspurger removed the Needs Backport label Jun 12, 2018

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018

Fix nested_to_record with None values in nested levels (pandas-dev#21164

ae0c50f

)
