Skip to content

BUG: fix nested meta path bug (GH 27220) #27667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from

Conversation

another-green
Copy link
Contributor

@another-green another-green commented Jul 31, 2019

This PR attempts to solve the missing value problems when using nested meta with nested record path.

The assumption is that we want to get the value of nested meta field even it is not on the record path. For example, in the example of GH 27220, ['info', 'governor'] and ['counties', 'name'] are not on the same path. The current behavior is to raise key error. I change the behavior to return the value of the key if we can find it in the json.

Let me know how we want to move forward with this. Thanks~

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a bug fix note in io section, 0.25.1 is good

@jreback jreback added Bug IO JSON read_json, to_json, json_normalize labels Aug 1, 2019
@simonjayhawkins
Copy link
Member

@yanglinlee since we are dropping Python 3.5 very shortly, it may be best to skip the test on Python 3.5 by adding @pytest.mark.skipif(not PY36, reason="...")

@another-green
Copy link
Contributor Author

Do I need to make more changes to this PR? Thanks for the reviews!

@TomAugspurger
Copy link
Contributor

Fixed the merge conflict. Ping on green.

@another-green
Copy link
Contributor Author

@TomAugspurger The checks have passed. Now it is green.

@@ -287,6 +287,30 @@ def test_shallow_nested(self):
expected = DataFrame(ex_data, columns=result.columns)
tm.assert_frame_equal(result, expected)

@pytest.mark.skipif(not PY36, reason="drop support for 3.5 soon")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed if you follow comment below

meta=["state", "shortname", ["info", "governor"]],
errors="ignore",
)
ex_data = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a dict can you construct this with literal values? If you use a list of lists can circumvent 3.5 ordering issues with a dict

If you can simplify expectation would help a lot as well so as not to write out the same values 21 times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. It is indeed better. I simply copied other test cases. I will change accordingly.

- Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
- Better error message when a negative header is passed in :func:`pandas.read_csv` (:issue:`27779`)
-
- Fix bug in :meth:`io.json.json_normalize` when nested meta paths with a nested record path. (:issue:`27220`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to 0.25.2 at this point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, I think we're being conservative with 0.25.2 backports. Just 3.8 compat and fixes for regressions in 0.25.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good - @yanglinlee let's go 1.0.0 instead then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. This is not very urgent issue. Thanks for the reviews~

@WillAyd WillAyd added this to the 0.25.2 milestone Aug 23, 2019
@simonjayhawkins simonjayhawkins modified the milestones: 0.25.2, 1.0 Aug 23, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you merge master and update to comments

if level + 1 > len(val):
meta_val = seen_meta[key]
# Extract the value of the key from seen_meta when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a blank line fore these comments (and below and above), basically easier to read if they are paragraph like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Will update this tomorrow.

@jreback jreback removed this from the 1.0 milestone Sep 8, 2019
@WillAyd
Copy link
Member

WillAyd commented Oct 11, 2019

@yanglinlee looks like a merge conflict and CI failure - mind taking a look?

@jbrockmendel
Copy link
Member

@yanglinlee i think the CI failure for the docbuild is unrelated. can you re-push and we'll see if its fixed

@jreback
Copy link
Contributor

jreback commented Dec 2, 2019

@yanglinlee we would take a patch like this if you want to update. closing as stale for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

json_normalize does not handle nested meta paths when also using a nested record_path
6 participants