
Enhanced json normalize #23861


Closed

Conversation

bhavaniravi
Contributor

@bhavaniravi bhavaniravi commented Nov 22, 2018

closes #23843
closes #23861

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

The max_level param defines the level of nesting at which normalizing should stop.
ignore_keys defines the keys that should be left un-normalized.
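To make the intended semantics concrete, here is a standalone sketch that mirrors what the two parameters are meant to do (the `flatten` helper and its exact behavior are illustrative, not the PR's code):

```python
# Illustrative sketch of the PR's intended semantics for max_level and
# ignore_keys; the helper name `flatten` is hypothetical.

def flatten(record, prefix="", sep=".", level=0, max_level=None, ignore_keys=None):
    """Flatten a nested dict, stopping at max_level and skipping ignore_keys."""
    ignore_keys = ignore_keys or []
    out = {}
    for key, value in record.items():
        new_key = f"{prefix}{sep}{key}" if prefix else key
        stop = max_level is not None and level >= max_level
        if isinstance(value, dict) and key not in ignore_keys and not stop:
            out.update(flatten(value, new_key, sep, level + 1,
                               max_level, ignore_keys))
        else:
            out[new_key] = value
    return out

data = {"id": 1, "info": {"governor": "Rick Scott", "geo": {"lat": 27.6}}}
flatten(data)                        # flattens every level: info.geo.lat
flatten(data, max_level=1)           # "info.geo" kept as a dict
flatten(data, ignore_keys=["info"])  # "info" left entirely un-normalized
```

With `max_level=1`, nesting below the first level is preserved as-is; with `ignore_keys=["info"]`, that subtree is never descended into at all.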
@pep8speaks

pep8speaks commented Nov 22, 2018

Hello @bhavaniravi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-04-30 08:23:34 UTC

@codecov

codecov bot commented Nov 22, 2018

Codecov Report

Merging #23861 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master   #23861      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52371       +3     
==========================================
  Hits        48164    48164              
- Misses       4204     4207       +3
Flag Coverage Δ
#multiple 90.52% <100%> (ø) ⬆️
#single 40.69% <14.28%> (-0.16%) ⬇️
Impacted Files Coverage Δ
pandas/io/json/normalize.py 97.02% <100%> (+0.09%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.71% <0%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...217d4ae. Read the comment docs.

@gfyoung
Member

gfyoung commented Nov 22, 2018

Good start (left a couple of comments)! Can you also add a mini-section to the whatsnew explaining what these parameters do and how you use them?

@gfyoung gfyoung added Enhancement IO JSON read_json, to_json, json_normalize labels Nov 22, 2018
@@ -41,6 +44,11 @@ def nested_to_record(ds, prefix="", sep=".", level=0):

level: the number of levels in the JSON string, optional, default: 0

max_level: the maximum level of nesting to normalize to, optional, default: None
ignore_keys: specific keys to skip while normalizing, optional, default: None
Member

What is the point of this parameter?

Contributor Author

To prevent specific keys from being normalized.

Member

What is the difference between this and record_path in json_normalize then?

Contributor Author

record_path defines the "path to the data to be normalized", whereas max_level treats the record_path level as 0 and normalizes down to max_level. Any key path listed in ignore_keys is left out.
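For context, the max_level half of this eventually shipped in pandas' `json_normalize` (the ignore_keys half did not). A sketch of the interplay described above, using the released API and assuming a recent pandas:

```python
# record_path selects *where* the records live; max_level caps *how deep*
# each record is flattened from there (the record_path level counts as 0).
import pandas as pd

data = [{"state": "Florida",
         "counties": [{"name": "Dade", "info": {"governor": "Rick Scott"}}]}]

full = pd.json_normalize(data, record_path="counties", meta=["state"])
# columns: name, info.governor, state

capped = pd.json_normalize(data, record_path="counties", meta=["state"],
                           max_level=0)
# columns: name, info, state -- "info" stays a dict, flattening stops at level 0
```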

@jreback
Contributor

jreback commented Dec 23, 2018

closing. if you want to continue, pls ping. needs to merge master and update to comments.

@jreback jreback closed this Dec 23, 2018
@bhavaniravi
Contributor Author

Yes, I want to continue this through to merging. I was waiting for #22706 to get merged.

@WillAyd WillAyd reopened this Dec 30, 2018
@@ -277,6 +277,56 @@ def test_missing_field(self, author_missing_data):
expected = DataFrame(ex_data)
tm.assert_frame_equal(result, expected)

def test_records_path_with_nested_data(self):
Contributor Author

@WillAyd Added a test case combining the existing record_path and meta keys with the newly added max_level and ignore_keys params

@jreback
Contributor

jreback commented Apr 20, 2019

you have included get-pip somehow

@jreback jreback left a comment

haven't really looked at the implementation, but there are some issues to fix first


ignore_keys: list, optional, keys to ignore, default None

.. versionadded:: 0.25.0
Contributor

this is not lined up

@@ -65,10 +75,9 @@ def nested_to_record(ds, prefix="", sep=".", level=0):
if isinstance(ds, dict):
ds = [ds]
singleton = True

ignore_keys = ignore_keys if ignore_keys else []
Contributor

is this getting mutated?
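The `ignore_keys = ignore_keys if ignore_keys else []` guard sidesteps Python's classic mutable-default pitfall; a minimal sketch of that pitfall (function names here are illustrative, not the PR's code):

```python
# A default like `ignore_keys=[]` is evaluated once at definition time and
# shared across every call; a None default plus an in-body guard is not.

def bad(key, ignore_keys=[]):        # one shared list for every call
    ignore_keys.append(key)
    return ignore_keys

def good(key, ignore_keys=None):     # fresh list per call, as in the PR
    ignore_keys = ignore_keys if ignore_keys else []
    ignore_keys.append(key)
    return ignore_keys

bad("a"); bad("b")    # second call returns ['a', 'b'] -- state leaked
good("a"); good("b")  # each call returns a fresh single-item list
```

Note the guard only replaces a missing default; if the function ever appended to a caller-supplied list, that list would still be mutated in place, which is presumably what the reviewer is probing here.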

2 Palm Beach 60000 Rick Scott Florida FL
3 Summit 1234 John Kasich Ohio OH
4 Cuyahoga 1337 John Kasich Ohio OH
name population state shortname info.governor
Contributor

why is there a period here?

Contributor Author

Because info.governor is nested: the child key governor is joined to its parent key info with the separator.
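A minimal sketch of how that dotted column name arises, assuming the default separator `sep="."` (the one-level loop below is illustrative, not the library's recursive code):

```python
# How a nested key becomes a dotted column name under sep=".".
record = {"name": "Palm Beach", "info": {"governor": "Rick Scott"}}
sep = "."
flat = {}
for key, value in record.items():
    if isinstance(value, dict):
        for child, child_value in value.items():
            flat[key + sep + child] = child_value   # "info" + "." + "governor"
    else:
        flat[key] = value
# flat == {'name': 'Palm Beach', 'info.governor': 'Rick Scott'}
```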

@@ -460,3 +508,93 @@ def test_nonetype_multiple_levels(self):
'location.country.state.town.info.y': -33.148521423339844,
'location.country.state.town.info.z': 27.572303771972656}
assert result == expected

def test_with_max_level_none(self):
data = [{
Contributor

need the issue number as a comment

@bhavaniravi
Contributor Author

Not sure why the test case test_missing is failing. @WillAyd, can you help me figure this out? https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=10829

@WillAyd
Member

WillAyd commented Apr 23, 2019 via email

@WillAyd WillAyd left a comment

Sorry didn't see latest updates on the review side. I still really think it would be better if we just went forward with max_level here and did ignore_keys as a follow up - otherwise we've kind of implicitly bundled this together which slows down reviews and makes code more error prone.

Do you have any objection to splitting these up into separate PRs? If not can you focus on max_level in this one and add a whatsnew note?

"""
A simplified json_normalize.

Member

Can you revert the change to this line?

@bhavaniravi
Contributor Author

@WillAyd I understand. Should I close this and create 2 new PRs? I'm just not sure how to proceed from here.

@WillAyd
Member

WillAyd commented May 12, 2019

You can keep this PR and just focus on the max_level piece - can come back to the second enhancement after that in another PR

@bhavaniravi
Contributor Author

@WillAyd sure, let's do that. I have a whole weekend ahead of me. So what's next?

@WillAyd
Member

WillAyd commented May 19, 2019

@bhavaniravi try stripping out anything that isn't relevant for max_level to work

@WillAyd
Member

WillAyd commented Jun 12, 2019

@bhavaniravi any interest in continuing here? I think these changes are great; they just need to be simplified a bit to get through. Let me know if you can address the comments above

@bhavaniravi
Contributor Author

@WillAyd Moved max_level to a different PR. Will open one for ignore_keys after it gets merged.

@jreback jreback left a comment

will have to take a look

@@ -25,9 +25,11 @@ def _convert_to_line_delimits(s):
return convert_json_to_lines(s)


def nested_to_record(ds, prefix="", sep=".", level=0):
def nested_to_record(ds, prefix="", sep=".", level=0,
max_level=None, ignore_keys=None):
Contributor

can u type these parameters
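One way to type the signature above, sketched with stdlib `typing` (these annotations are illustrative, not the signature pandas ultimately merged):

```python
# Possible type annotations for nested_to_record's parameters; the body is
# stubbed out since only the signature is at issue here.
from typing import Any, Dict, List, Optional, Union

def nested_to_record(
    ds: Union[Dict[str, Any], List[Dict[str, Any]]],
    prefix: str = "",
    sep: str = ".",
    level: int = 0,
    max_level: Optional[int] = None,
    ignore_keys: Optional[List[str]] = None,
) -> Union[Dict[str, Any], List[Dict[str, Any]]]:
    ...
```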

@bhavaniravi
Contributor Author

Closing this; we will track it in 2 separate PRs

@bhavaniravi bhavaniravi deleted the enhanced_json_normalize branch July 6, 2019 14:56
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Development

Successfully merging this pull request may close these issues.

Configurable json_normalize with respect to number of levels and Keys to be flattened
6 participants