BUG: non-iterable value in meta raise error in json_normalize #31524

charlesdong1991 · 2020-01-31T23:14:26Z

closes json_normalize in 1.0.0 with meta path specified - expects iterable #31507
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2020-01-31T23:14:30Z

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-11 07:54:57 UTC

WillAyd · 2020-01-31T23:17:06Z

Can you add a whatsnew for v1.0.1?

charlesdong1991 · 2020-01-31T23:19:40Z

oops, i was editing whatsnew just now 😅

charlesdong1991 · 2020-01-31T23:36:16Z

emm,

self = <pandas.tests.io.json.test_normalize.TestNestedToRecord object at 0x14ab9d208>

    def test_meta_non_iterable(self):
        # GH 31507
        data = """[{"id": 99, "data": [{"one": 1, "two": 2}]}]"""
    
        result = json_normalize(json.loads(data), record_path=["data"], meta=["id"])
        expected_values = [[1, 2, "99"]]
        columns = ["one", "two", "id"]
        expected = DataFrame(expected_values, columns=columns)
>       tm.assert_frame_equal(result, expected)

pandas/tests/io/json/test_normalize.py:761: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/_libs/testing.pyx:65: in pandas._libs.testing.assert_almost_equal
    cpdef assert_almost_equal(a, b,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise_assert_detail(obj, msg, lobj, robj)
E   AssertionError: DataFrame.iloc[:, 2] (column name="id") are different
E   
E   DataFrame.iloc[:, 2] (column name="id") values are different (100.0 %)
E   [left]:  [99]
E   [right]: [99]

is there a bug in assert_frame_equal? 🤔 will check it out tomorrow

WillAyd · 2020-01-31T23:57:20Z

pandas/tests/io/json/test_normalize.py

+        data = """[{"id": 99, "data": [{"one": 1, "two": 2}]}]"""
+
+        result = json_normalize(json.loads(data), record_path=["data"], meta=["id"])
+        expected_values = [[1, 2, "99"]]


The expected value here should be 99 not "99" (probably cause of CI error)
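
The identical-looking `[left]: [99]` / `[right]: [99]` lines in the failure above can be illustrated without pandas: an integer and its string form print the same once formatted, yet they compare unequal. A minimal sketch in plain Python, where `actual_id`, `expected_id`, `same_display` and `same_value` are illustrative names rather than anything from the test suite:

```python
import json

# Parse the same payload used in the test above.
data = json.loads('[{"id": 99, "data": [{"one": 1, "two": 2}]}]')

actual_id = data[0]["id"]  # JSON number becomes a Python int
expected_id = "99"         # what the test originally expected (a str)

# Both values render the same way when formatted without quotes...
same_display = f"[{actual_id}]" == f"[{expected_id}]"
# ...but they are values of different types and do not compare equal.
same_value = actual_id == expected_id

print(same_display, same_value)
```

This is only meant to show why the report looks confusing; it does not reproduce how `assert_frame_equal` builds its message.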

jreback · 2020-02-01T22:39:39Z

pandas/tests/io/json/test_normalize.py

@@ -749,3 +749,13 @@ def test_series_non_zero_index(self):
            }
        )
        tm.assert_frame_equal(result, expected)
+
+    def test_meta_non_iterable(self):


can you move this near the other test

Moved! I will wait for @WillAyd's comment on the other one.

jreback · 2020-02-01T23:08:04Z

pandas/io/json/_normalize.py

        result = js  # type: ignore
        if isinstance(spec, list):
            for field in spec:
                result = result[field]
        else:
            result = result[spec]

-        if not isinstance(result, Iterable):
+        # GH 31507 iterable limit should only be used on record, not meta


this is a code smell to add this. so it seems that result can be null, iterable, or a scalar? if it's a scalar, what is the return value here?

you mean [99]? then it will still return 99

I think the Iterable check should only apply to values pulled via record_path, not to meta. Am I right about the behaviour here @WillAyd ?

I do agree that this is getting a little strange, especially since there is inspection of meta on line 273 of the same module and we are essentially repeating that here with a boolean indicator being manually supplied

@charlesdong1991 do you see a way to more logically order this function so we don't have to use this bool indicator?

thanks for your reply! @WillAyd

I will think about it a bit. I feel the Iterable patch in _pull_field is a bit of a code smell, because the function is used for two cases that have different requirements.

"especially since there is inspection of meta on line 273 of the same module and we are essentially repeating that here with a boolean indicator being manually supplied"

I just took another look at the current codebase. It seems that line 273 onwards, which you referred to, validates/transforms the key assigned to meta, not the value associated with that key. In this case the key is id (we could specify it as [id] or id, or in more complex cases as a nested list, and that part of the code deals with it). The code in _pull_field, however, pulls the value out, and in this case the value is 99. For record_path the value should be an Iterable, while for meta that is not necessarily the case. Therefore I think the type check should only apply to values that the elements of record_path point to, not to meta. Maybe I am missing some functionality of either of them? @WillAyd @jreback

@charlesdong1991 does splitting this into separate functions for record_path vs meta and just using this as a base for those functions make things cleaner?
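
The split suggested here can be sketched roughly as follows. This is a simplified illustration, not the merged pandas code: the names `pull_field` and `pull_records` (without the leading underscore) are hypothetical, and the null check uses a plain `is None` as an assumption in place of whatever null test the library applies.

```python
from typing import Any, Dict, List, Union


def pull_field(js: Dict[str, Any], spec: Union[List[str], str]) -> Any:
    """Follow `spec` (one key or a list of nested keys) and return the value found."""
    result: Any = js
    if isinstance(spec, list):
        for field in spec:
            result = result[field]
    else:
        result = result[spec]
    return result


def pull_records(js: Dict[str, Any], spec: Union[List[str], str]) -> list:
    """Like pull_field, but the value must be a list (or null): it holds the records."""
    result = pull_field(js, spec)
    if not isinstance(result, list):
        if result is None:  # assumption: treat a missing/null value as "no records"
            result = []
        else:
            raise TypeError(
                f"{js} has non list value {result} for path {spec}. "
                "Must be list or null."
            )
    return result


row = {"id": 99, "data": [{"one": 1, "two": 2}]}
meta_value = pull_field(row, "id")     # a scalar is acceptable for meta
records = pull_records(row, ["data"])  # record_path must lead to a list
```

With this layout the list requirement lives only in the record helper, so a scalar such as 99 under a meta key passes through unchanged and no boolean flag is needed.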

charlesdong1991 · 2020-02-11T19:59:37Z

@WillAyd @jreback I slightly adjusted the change made in #30145 and think this is a cleaner way to fix this bug. Please take a look; any feedback is much appreciated!

charlesdong1991 · 2020-03-04T21:45:12Z

many thanks for your quick response @WillAyd

I will leave this PR as it is then. Feel free to take another look and let me know if there is anything else you would like improved. I will rebase once the fix PR is merged.

charlesdong1991 · 2020-03-06T18:47:07Z

any further feedback? @WillAyd

I think it would be nice to include this in 1.0.2, as it seems to impact many users based on the responses under the issue.

WillAyd · 2020-03-10T15:41:16Z

pandas/io/json/_normalize.py

-        result = js  # type: ignore
+    def _pull_field(js: Dict[str, Scalar], spec: Union[List, str]) -> Scalar:
+        """Internal function to pull field"""
+        result = js


Why did this need to change?

emm, this is because I moved the type: ignore down to result = result[field]. Without it, that assignment raises a type-annotation error complaining about an incompatible assignment. After moving it there, the ignore on this line no longer seems needed.

@WillAyd any thoughts on this?

TomAugspurger

LGTM, other than a question about the release note.

charlesdong1991 · 2020-03-10T20:51:28Z

@TomAugspurger thanks for your quick feedback. However, maybe something went wrong with GitHub: your question about the release note is not showing. Could you please post it again? Thanks!

TomAugspurger · 2020-03-10T20:56:42Z

Strange. Is this fixing a regression? If so, the note should be towards the top of 1.0.2.rst with the other regression fixes.

charlesdong1991 · 2020-03-10T21:00:45Z

thanks! @TomAugspurger
this is not a regression issue fix IMHO, and maybe it is better to keep it in I/O section?

TomAugspurger · 2020-03-10T21:09:48Z

Yep, that sounds good.


jreback

lgtm

WillAyd · 2020-03-11T00:12:08Z

pandas/io/json/_normalize.py

        result = js  # type: ignore
        if isinstance(spec, list):
            for field in spec:
-                result = result[field]
+                result = result[field]  # type: ignore


Can you advise specifically what the error is? I realize we want to get this in for 1.0.2 so not going to block, but I still think this code is suspect (not from your change per se - just a historical artifact) so I'd hate to suppress a warning about another bug that this fix could be introducing

Took a look at this locally; I think if you revert some of the other changes here you won't need the ignore. Suggested separately

yeah, the error is:

error: Incompatible types in assignment (expression has type "Union[str, int, float, bool, Any, Any, Any, Any]", variable has type "Dict[str, Union[str, int, float, bool, Any, Any, Any, Any]]")

As said, this comes from result = result[field]: result is annotated as a Dict, while the value looked up from it is not, so the types conflict. Using Any instead of Scalar could fix the issue.
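
A small sketch of the annotation issue described above, assuming a standalone helper named `pull_field` (a hypothetical name for illustration). It shows the `Any`-based variant; the comments describe why the narrower annotation trips a type checker.

```python
from typing import Any, Dict, List, Union


def pull_field(js: Dict[str, Any], spec: Union[List[str], str]) -> Any:
    # If js were annotated as Dict[str, Scalar], a checker such as mypy would
    # infer `result` to be a dict and reject `result = result[field]`, because
    # the looked-up value has the scalar type rather than the dict type.
    # Annotating the dict values (and `result`) as Any avoids that conflict
    # without needing a `# type: ignore` comment.
    result: Any = js
    if isinstance(spec, list):
        for field in spec:
            result = result[field]
    else:
        result = result[spec]
    return result


nested = {"a": {"b": 1}}
value = pull_field(nested, ["a", "b"])
print(value)
```

The runtime behaviour is the same under either annotation; only the static check differs.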

WillAyd · 2020-03-11T03:02:43Z

pandas/io/json/_normalize.py

        result = js  # type: ignore
        if isinstance(spec, list):
            for field in spec:
-                result = result[field]
+                result = result[field]  # type: ignore


Took a look at this locally; I think if you revert some of the other changes here you won't need the ignore. Suggested separately

WillAyd · 2020-03-11T03:03:21Z

pandas/io/json/_normalize.py

@@ -226,14 +227,26 @@ def _json_normalize(
    Returns normalized data with columns prefixed with the given string.
    """

-    def _pull_field(js: Dict[str, Any], spec: Union[List, str]) -> Iterable:
+    def _pull_field(js: Dict[str, Scalar], spec: Union[List, str]) -> Scalar:


Suggested change

-    def _pull_field(js: Dict[str, Scalar], spec: Union[List, str]) -> Scalar:
+    def _pull_field(js: Dict[str, Any], spec: Union[List, str]) -> Union[Scalar, Iterable]:

WillAyd · 2020-03-11T03:03:33Z

pandas/io/json/_normalize.py

        result = js  # type: ignore
        if isinstance(spec, list):
            for field in spec:
-                result = result[field]
+                result = result[field]  # type: ignore


Suggested change

-                result = result[field]  # type: ignore
+                result = result[field]

WillAyd · 2020-03-11T03:03:41Z

pandas/io/json/_normalize.py

        else:
-            result = result[spec]
+            result = result[spec]  # type: ignore


Suggested change

-            result = result[spec]  # type: ignore
+            result = result[spec]

WillAyd · 2020-03-11T03:03:48Z

pandas/io/json/_normalize.py

+            result = result[spec]  # type: ignore
+        return result
+
+    def _pull_records(js: Dict[str, Scalar], spec: Union[List, str]) -> Iterable:


Suggested change

-    def _pull_records(js: Dict[str, Scalar], spec: Union[List, str]) -> Iterable:
+    def _pull_records(js: Dict[str, Any], spec: Union[List, str]) -> Iterable:

charlesdong1991 · 2020-03-11T08:13:49Z

Sorry about the mess I made of the type annotations and function naming @WillAyd

I have committed a fix following your suggestions. Please let me know your thoughts, and I will try to update the PR as soon as possible to get this done before the release.

TomAugspurger · 2020-03-11T14:23:14Z

CI is passing, so I'm planning to merge this in an hour or so if there aren't any objections. We can fixup annotations later if needed.

WillAyd

lgtm. Appears the annotations were already cleaned up

WillAyd · 2020-03-11T15:02:43Z

Thanks @charlesdong1991 . Great job seeing this through

charlesdong1991 added 9 commits December 3, 2018 17:43

remove \n from docstring

7e461a1

fix conflicts

1314059

Merge remote-tracking branch 'upstream/master'

8bcb313

Merge remote-tracking branch 'upstream/master'

24c3ede

fix issue 17038

dea38f2

revert change

cd9e7ac

revert change

e5e912b

Merge remote-tracking branch 'upstream/master' into fix_issue_31507

2d21d1e

fix uo

fcb4b80

pep8

8ec4450

whatsnew

a33d05c

WillAyd requested changes Jan 31, 2020

charlesdong1991 added 2 commits February 1, 2020 09:11

fix up

6bedc52

fixup

1f0f3bc

jreback requested changes Feb 1, 2020

jreback added Bug IO JSON read_json, to_json, json_normalize labels Feb 1, 2020

charlesdong1991 added 4 commits February 2, 2020 20:22

Merge remote-tracking branch 'upstream/master' into fix_issue_31507

ce81951

move around

5de348c

better python

3c38c48

fix conflict

130d71b

charlesdong1991 requested review from WillAyd and jreback February 11, 2020 19:59

charlesdong1991 added 2 commits February 11, 2020 21:16

fixup

3ef920f

fixup

0b46239

change back to scalar

011dbb0

Merge remote-tracking branch 'upstream/master' into fix_issue_31507

3e74a3a

charlesdong1991 requested a review from WillAyd March 5, 2020 21:27

WillAyd reviewed Mar 10, 2020

This was referenced Mar 10, 2020

json_normalize errors if meta fields are integer #32480

Closed

RLS: 1.0.2 #32415

Closed

charlesdong1991 added 2 commits March 10, 2020 21:33

fixup

9476af7

Merge remote-tracking branch 'upstream/master' into fix_issue_31507

c399983

TomAugspurger approved these changes Mar 10, 2020

add ignore type

6165467

jreback approved these changes Mar 10, 2020

WillAyd requested changes Mar 11, 2020

jreback added this to the 1.0.2 milestone Mar 11, 2020

fix annotation

7a20b8c

WillAyd approved these changes Mar 11, 2020

WillAyd merged commit 983fae6 into pandas-dev:master Mar 11, 2020

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 11, 2020

Backport PR pandas-dev#31524: BUG: non-iterable value in meta raise e…

d5d6430

…rror in json_normalize

meeseeksmachine mentioned this pull request Mar 11, 2020

Backport PR #31524 on branch 1.0.x (BUG: non-iterable value in meta raise error in json_normalize) #32629

Merged

simonjayhawkins pushed a commit that referenced this pull request Mar 11, 2020

Backport PR #31524: BUG: non-iterable value in meta raise error in js…

f770958

…on_normalize (#32629) Co-authored-by: Kaiqi Dong <[email protected]>

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020

BUG: non-iterable value in meta raise error in json_normalize (pandas…

c33ee73

…-dev#31524)
