
Enhanced json normalize #23861


Closed

Conversation

bhavaniravi
Contributor

@bhavaniravi bhavaniravi commented Nov 22, 2018

closes #23843
closes #23861

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

The max_level param defines the level of nesting at which normalizing should stop.
ignore_keys defines the keys that should be left un-normalized.
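To make the intended semantics concrete, here is a standalone sketch that mirrors what the two parameters are meant to do (the `flatten` helper and its exact behavior are illustrative, not the PR's code):

```python
# Illustrative sketch of the PR's intended semantics for max_level and
# ignore_keys; the helper name `flatten` is hypothetical.

def flatten(record, prefix="", sep=".", level=0, max_level=None, ignore_keys=None):
    """Flatten a nested dict, stopping at max_level and skipping ignore_keys."""
    ignore_keys = ignore_keys or []
    out = {}
    for key, value in record.items():
        new_key = f"{prefix}{sep}{key}" if prefix else key
        stop = max_level is not None and level >= max_level
        if isinstance(value, dict) and key not in ignore_keys and not stop:
            out.update(flatten(value, new_key, sep, level + 1,
                               max_level, ignore_keys))
        else:
            out[new_key] = value
    return out

data = {"id": 1, "info": {"governor": "Rick Scott", "geo": {"lat": 27.6}}}
flatten(data)                        # flattens every level: info.geo.lat
flatten(data, max_level=1)           # "info.geo" kept as a dict
flatten(data, ignore_keys=["info"])  # "info" left entirely un-normalized
```

With `max_level=1`, nesting below the first level is preserved as-is; with `ignore_keys=["info"]`, that subtree is never descended into at all.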
@pep8speaks

pep8speaks commented Nov 22, 2018

Hello @bhavaniravi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-30 08:23:34 UTC

@codecov

codecov bot commented Nov 22, 2018

Codecov Report

Merging #23861 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #23861      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52371       +3     
==========================================
  Hits        48164    48164              
- Misses       4204     4207       +3
Flag Coverage Δ
#multiple 90.52% <100%> (ø) ⬆️
#single 40.69% <14.28%> (-0.16%) ⬇️
Impacted Files Coverage Δ
pandas/io/json/normalize.py 97.02% <100%> (+0.09%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.71% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...217d4ae. Read the comment docs.

@gfyoung
Member

gfyoung commented Nov 22, 2018

Good start (left a couple of comments)! Can you also add a mini-section to the whatsnew explaining what these parameters do and how you use them?

@gfyoung gfyoung added Enhancement IO JSON read_json, to_json, json_normalize labels Nov 22, 2018
@@ -41,6 +44,11 @@ def nested_to_record(ds, prefix="", sep=".", level=0):

level: the number of levels in the JSON string, optional, default: 0

max_level: the maximum level of nesting to normalize to, optional, default: None
ignore_keys: specific keys to skip while normalizing, optional, default: None
Member

What is the point of this parameter?

Contributor Author

To prevent specific keys from being normalized.

Member

What is the difference between this and record_path in json_normalize then?

Contributor Author

record_path defines the "path to the data to be normalized", whereas max_level treats the record_path level as 0 and normalizes down to max_level. Any key path listed in ignore_keys is left out.
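For context, the max_level half of this eventually shipped in pandas' `json_normalize` (the ignore_keys half did not). A sketch of the interplay described above, using the released API and assuming a recent pandas:

```python
# record_path selects *where* the records live; max_level caps *how deep*
# each record is flattened from there (the record_path level counts as 0).
import pandas as pd

data = [{"state": "Florida",
         "counties": [{"name": "Dade", "info": {"governor": "Rick Scott"}}]}]

full = pd.json_normalize(data, record_path="counties", meta=["state"])
# columns: name, info.governor, state

capped = pd.json_normalize(data, record_path="counties", meta=["state"],
                           max_level=0)
# columns: name, info, state -- "info" stays a dict, flattening stops at level 0
```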

@jreback
Contributor

jreback commented Dec 23, 2018

closing. if you want to continue, pls ping. needs to merge master and update to comments.

@jreback jreback closed this Dec 23, 2018
@bhavaniravi
Contributor Author

Yes, I want to continue this through to merging. I was waiting for #22706 to get merged.

@WillAyd WillAyd reopened this Dec 30, 2018
@@ -277,6 +277,56 @@ def test_missing_field(self, author_missing_data):
expected = DataFrame(ex_data)
tm.assert_frame_equal(result, expected)

def test_records_path_with_nested_data(self):
Contributor Author

@WillAyd Added a test case combining the existing record_path and meta keys with the newly added max_level and ignore_keys params

@jreback
Contributor

jreback commented Apr 20, 2019

you have included get-pip somehow

@jreback jreback left a comment

haven't really looked at the implementation, but there are some issues to fix first


ignore_keys: list, optional, keys to ignore, default None

.. versionadded:: 0.25.0
Contributor

this is not lined up

@@ -65,10 +75,9 @@ def nested_to_record(ds, prefix="", sep=".", level=0):
if isinstance(ds, dict):
ds = [ds]
singleton = True

ignore_keys = ignore_keys if ignore_keys else []
Contributor

is this getting mutated?
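The `ignore_keys = ignore_keys if ignore_keys else []` guard sidesteps Python's classic mutable-default pitfall; a minimal sketch of that pitfall (function names here are illustrative, not the PR's code):

```python
# A default like `ignore_keys=[]` is evaluated once at definition time and
# shared across every call; a None default plus an in-body guard is not.

def bad(key, ignore_keys=[]):        # one shared list for every call
    ignore_keys.append(key)
    return ignore_keys

def good(key, ignore_keys=None):     # fresh list per call, as in the PR
    ignore_keys = ignore_keys if ignore_keys else []
    ignore_keys.append(key)
    return ignore_keys

bad("a"); bad("b")    # second call returns ['a', 'b'] -- state leaked
good("a"); good("b")  # each call returns a fresh single-item list
```

Note the guard only replaces a missing default; if the function ever appended to a caller-supplied list, that list would still be mutated in place, which is presumably what the reviewer is probing here.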

2 Palm Beach 60000 Rick Scott Florida FL
3 Summit 1234 John Kasich Ohio OH
4 Cuyahoga 1337 John Kasich Ohio OH
name population state shortname info.governor
Contributor

why is there a period here?

Contributor Author

Because info.governor is nested: the child key governor is joined to its parent key info with the separator.
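A minimal sketch of how that dotted column name arises, assuming the default separator `sep="."` (the one-level loop below is illustrative, not the library's recursive code):

```python
# How a nested key becomes a dotted column name under sep=".".
record = {"name": "Palm Beach", "info": {"governor": "Rick Scott"}}
sep = "."
flat = {}
for key, value in record.items():
    if isinstance(value, dict):
        for child, child_value in value.items():
            flat[key + sep + child] = child_value   # "info" + "." + "governor"
    else:
        flat[key] = value
# flat == {'name': 'Palm Beach', 'info.governor': 'Rick Scott'}
```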

@@ -460,3 +508,93 @@ def test_nonetype_multiple_levels(self):
'location.country.state.town.info.y': -33.148521423339844,
'location.country.state.town.info.z': 27.572303771972656}
assert result == expected

def test_with_max_level_none(self):
data = [{
Contributor

need the issue number as a comment

@bhavaniravi
Contributor Author

Not sure why the test case test_missing is failing. @WillAyd, can you help me figure this out? https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=10829

@WillAyd
Member

WillAyd commented Apr 23, 2019 via email

@WillAyd WillAyd left a comment

Sorry didn't see latest updates on the review side. I still really think it would be better if we just went forward with max_level here and did ignore_keys as a follow up - otherwise we've kind of implicitly bundled this together which slows down reviews and makes code more error prone.

Do you have any objection to splitting these up into separate PRs? If not can you focus on max_level in this one and add a whatsnew note?

"""
A simplified json_normalize.

Member

Can you revert the change to this line?

@bhavaniravi
Contributor Author

@WillAyd I understand. Should I close this and create 2 new PRs? I'm just not sure how to proceed from here.

@WillAyd
Member

WillAyd commented May 12, 2019

You can keep this PR and just focus on the max_level piece - can come back to the second enhancement after that in another PR

@bhavaniravi
Contributor Author

@WillAyd sure, let's do that. I have a whole weekend ahead of me. So what's next?

@WillAyd
Member

WillAyd commented May 19, 2019

@bhavaniravi try stripping out anything that isn't relevant for max_level to work

@WillAyd
Member

WillAyd commented Jun 12, 2019

@bhavaniravi any interest in continuing here? I think these changes are great; they just need to be simplified a bit to get through. Let me know if you can address the comments above

@bhavaniravi
Contributor Author

@WillAyd Moved max_level to a different PR. Will open one for ignore_keys after it gets merged.

@jreback jreback left a comment

will have to take a look

@@ -25,9 +25,11 @@ def _convert_to_line_delimits(s):
return convert_json_to_lines(s)


def nested_to_record(ds, prefix="", sep=".", level=0):
def nested_to_record(ds, prefix="", sep=".", level=0,
max_level=None, ignore_keys=None):
Contributor

can u type these parameters
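One way to type the signature above, sketched with stdlib `typing` (these annotations are illustrative, not the signature pandas ultimately merged):

```python
# Possible type annotations for nested_to_record's parameters; the body is
# stubbed out since only the signature is at issue here.
from typing import Any, Dict, List, Optional, Union

def nested_to_record(
    ds: Union[Dict[str, Any], List[Dict[str, Any]]],
    prefix: str = "",
    sep: str = ".",
    level: int = 0,
    max_level: Optional[int] = None,
    ignore_keys: Optional[List[str]] = None,
) -> Union[Dict[str, Any], List[Dict[str, Any]]]:
    ...
```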

@bhavaniravi
Contributor Author

Closing this; we will track it in 2 separate PRs

@bhavaniravi bhavaniravi deleted the enhanced_json_normalize branch July 6, 2019 14:56
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Development

Successfully merging this pull request may close these issues.

Configurable json_normalize with respect to number of levels and Keys to be flattened
6 participants