-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Enhanced json normalize #23861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhanced json normalize #23861
Conversation
max_level param defines at the level of nesting at which normalizing should stop. ignore_keys defines the keys to ignore without normalizing
Hello @bhavaniravi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-04-30 08:23:34 UTC |
Codecov Report
@@ Coverage Diff @@
## master #23861 +/- ##
==========================================
- Coverage 91.97% 91.96% -0.01%
==========================================
Files 175 175
Lines 52368 52371 +3
==========================================
Hits 48164 48164
- Misses 4204 4207 +3
Continue to review full report at Codecov.
|
Good start (left a couple of comments)! Can you also add a mini-section to the |
pandas/io/json/normalize.py
Outdated
@@ -41,6 +44,11 @@ def nested_to_record(ds, prefix="", sep=".", level=0): | |||
|
|||
level: the number of levels in the jason string, optional, default: 0 | |||
|
|||
max_level: normalize to a maximum level of, optional, default: None | |||
ignore_keys: specific keys to normalize, optional, default: None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of this parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid specific keys from getting normalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between this and record_path
in json_normalize
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
record_path
defines the "path to the data to be normalized". Where as with max_level
it assumes record_path
level as 0 and normalizes it until max_level. The key path in ignore_keys
will be left out
closing. if you want to continue, pls ping. needs to merge master and update to comments. |
…_records_path_bug_fix
Yes I want to continue this to merging. I was waiting for #22706 to get merged. |
@@ -277,6 +277,56 @@ def test_missing_field(self, author_missing_data): | |||
expected = DataFrame(ex_data) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_records_path_with_nested_data(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd Added a test case with existing record_path
and meta
keys with newly added max_level
and ignore_keys
param
you have included get-pip somehow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't really looked at the implementation, but has some issues to fix first
|
||
ignore_keys: list, optional, keys to ignore, default None | ||
|
||
.. versionadded:: 0.25.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not lined up
@@ -65,10 +75,9 @@ def nested_to_record(ds, prefix="", sep=".", level=0): | |||
if isinstance(ds, dict): | |||
ds = [ds] | |||
singleton = True | |||
|
|||
ignore_keys = ignore_keys if ignore_keys else [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this getting mutated?
2 Palm Beach 60000 Rick Scott Florida FL | ||
3 Summit 1234 John Kasich Ohio OH | ||
4 Cuyahoga 1337 John Kasich Ohio OH | ||
name population state shortname info.governor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a period here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because info.governer
is nested.
@@ -460,3 +508,93 @@ def test_nonetype_multiple_levels(self): | |||
'location.country.state.town.info.y': -33.148521423339844, | |||
'location.country.state.town.info.z': 27.572303771972656} | |||
assert result == expected | |||
|
|||
def test_with_max_level_none(self): | |||
data = [{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need the issue number as a comment
…nced_json_normalize
Not sure why this test case |
Try merging master one more time - this issue was just fixed recently.
…Sent from my iPhone
On Apr 23, 2019, at 1:09 AM, Bhavani Ravi ***@***.***> wrote:
Not sure why this test case test_missing is failing. @WillAyd Can you help me figure this out. https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=10829
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
…nced_json_normalize
…nced_json_normalize
…nced_json_normalize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry didn't see latest updates on the review side. I still really think it would be better if we just went forward with max_level
here and did ignore_keys
as a follow up - otherwise we've kind of implicitly bundled this together which slows down reviews and makes code more error prone.
Do you have any objection to splitting these up into separate PRs? If not can you focus on max_level
in this one and add a whatsnew note?
""" | ||
A simplified json_normalize. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert the change to this line?
@WillAyd I understand. Should I close this and create 2 new PRs. I'm just not sure how to proceed from here. |
You can keep this PR and just focus on the max_level piece - can come back to the second enhancement after that in another PR |
@WillAyd sure, let's do that. I have a whole weekend ahead of me. So what's next? |
@bhavaniravi try stripping out anything that isn't relevant for |
@bhavaniravi any interest in continuing here? I think these changes are great just need to simplify a bit to get through. Let me know if you can address comments above |
@WillAyd Moved the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will have to take a look
@@ -25,9 +25,11 @@ def _convert_to_line_delimits(s): | |||
return convert_json_to_lines(s) | |||
|
|||
|
|||
def nested_to_record(ds, prefix="", sep=".", level=0): | |||
def nested_to_record(ds, prefix="", sep=".", level=0, | |||
max_level=None, ignore_keys=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u type these parameters
Closing this we will track it in 2 separate PRs |
closes #23843
closes #23861
git diff upstream/master -u -- "*.py" | flake8 --diff