Skip to content

BUG: closes #44312: fixes unwanted TypeError with nullable nested metadata in json_normalize #44325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

qdbp
Copy link
Contributor

@qdbp qdbp commented Nov 5, 2021

Fixes TypeError crash in json_normalize when using errors="ignore" and extracting nullable nested metadata.

@qdbp qdbp force-pushed the gh-44312-fix-typerror-on-missing-nested-meta-fields branch 2 times, most recently from d25e6ca to ba914c2 Compare November 5, 2021 15:51
@qdbp qdbp changed the title BUG: closes #44312: fixes unwanted TypeError with nullable nested metadata BUG: closes #44312: fixes unwanted TypeError with nullable nested metadata in json_normalize Nov 5, 2021
@qdbp qdbp force-pushed the gh-44312-fix-typerror-on-missing-nested-meta-fields branch 3 times, most recently from 022cbb6 to 5346a3a Compare November 5, 2021 17:42
@@ -389,6 +389,8 @@ def _pull_field(
try:
if isinstance(spec, list):
for field in spec:
if result is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this fixes the issue, the typing suggests result should be dict[str, Any] from js. Is the typing incorrect or the issue more upstream?

Copy link
Contributor Author

@qdbp qdbp Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the Any value type on the js dict: once we're one level down we're no longer guaranteed that result is a Dict. It could be (and in this case is) None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proper json typing requires recursive types, which seem to be, as of very recently, tentatively supported: python/mypy#731 (comment). This is a whole other kettle of fish, though, that will no doubt reveal a plethora of similar soundness issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Could you add an informative message in KeyError and test it if possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mroeschke mroeschke added the IO JSON read_json, to_json, json_normalize label Nov 6, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also pls add a whatsnew note in I/O seciton on 1.4 bug fixes

@qdbp qdbp force-pushed the gh-44312-fix-typerror-on-missing-nested-meta-fields branch from 5346a3a to 04eddf9 Compare November 8, 2021 17:48
@qdbp qdbp requested a review from jreback November 8, 2021 21:00
@qdbp qdbp force-pushed the gh-44312-fix-typerror-on-missing-nested-meta-fields branch from 04eddf9 to 1f1917d Compare November 9, 2021 01:17
@jreback jreback added this to the 1.4 milestone Nov 14, 2021
@jreback jreback merged commit 917d04f into pandas-dev:master Nov 14, 2021
@jreback
Copy link
Contributor

jreback commented Nov 14, 2021

thanks @qdbp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: json_normalize crashes when len(record_path)==1 and non-leaf node of nested meta path is None
3 participants