Skip to content

ENH: GH14883: json_normalize now takes a user-specified separator #14950

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

ENH: GH14883: json_normalize now takes a user-specified separator #14950

wants to merge 3 commits into from

Conversation

jowens
Copy link

@jowens jowens commented Dec 22, 2016

However, this doesn't work, even after making the fixes suggested in #14891. I thought replacing '.' in meta_keys = ['.'.join(val) for val in meta] with sep would do the trick. It doesn't, so I'm a little puzzled. Happy to take a suggestion.

@jreback
Copy link
Contributor

jreback commented Dec 22, 2016

you don't need to open/close new PR's. simply push new commits to this one.

@jreback jreback added Enhancement IO JSON read_json, to_json, json_normalize labels Dec 22, 2016
@@ -109,6 +109,7 @@ Other enhancements
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)

- ``.select_dtypes()`` now allows the string 'datetimetz' to generically select datetimes with tz (:issue:`14910`)
+- ``pandas.io.json.json_normalize()`` gained ``sep`` option that accepts ``str``, default is ".", which is backward compatible. (:issue:`14883`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a sep option

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to say accepts 'str', explain the purpose instead.

errors='raise'):

errors='raise',
sep='.'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can put this before errors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Comment on the previous pull request indicated that I should put this last in order to not break any current code that uses positional arguments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but I don't think its a big deal and a bit cleaner this way

@@ -63,6 +63,21 @@ def test_simple_normalize(self):

tm.assert_frame_equal(result, expected)

def test_simple_normalize_with_default_separator(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number as a comment

you can make this all 1 test (with 3 sub-tests).

@jowens
Copy link
Author

jowens commented Dec 22, 2016

@jreback Thank you for all the kind comments! Um, can you take a minute or three to look at the actual code in json_normalize and clue me into where I'm going wrong? ('Cause it doesn't work.)

@jreback
Copy link
Contributor

jreback commented Dec 22, 2016

@jowens no idea, you'll have to step thru. my guess is that meta_keys is used for looking things up AND naming things.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2017

can you update

@jowens
Copy link
Author

jowens commented Jan 22, 2017

I have no update, sorry to say. Someone who understands this code better is gonna be necessary to make this fix, I'm afraid.

@jreback
Copy link
Contributor

jreback commented Jan 22, 2017

@jowens I pushed a commit here. 3 issues.

  1. the nested record needed to also be updated with the sep (was .)
  2. you can create a columns via a dict, but its pretty useless as the ordering is not guaranteed, so you oftentimes mistmatch vs the values, passing a list is idiomatic
  3. since the json ordering is essentially a dict, we are not guaranteed orderings vs our expected (in column space), but we can reindex to compare properly.

I think this might benefit from a deeper nested example as well.

@codecov-io
Copy link

codecov-io commented Jan 22, 2017

Codecov Report

Merging #14950 into master will decrease coverage by 0.02%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #14950      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.03%     
==========================================
  Files         143      143              
  Lines       49420    49422       +2     
==========================================
- Hits        44972    44963       -9     
- Misses       4448     4459      +11
Impacted Files Coverage Δ
pandas/io/json/normalize.py 96.93% <85.71%> (-0.98%) ⬇️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/common.py 90.96% <0%> (-0.34%) ⬇️
pandas/core/frame.py 97.56% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec84ae3...0327dd1. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

@jowens can you update? (for the questions I had above)

@jowens
Copy link
Author

jowens commented Jan 31, 2017

Is the update you want "write a better test for 2 and 3, and add a deeper example"?

@jreback
Copy link
Contributor

jreback commented Jan 31, 2017

yep

@jreback
Copy link
Contributor

jreback commented Feb 16, 2017

can you rebase / add those addtl tests?

@jowens
Copy link
Author

jowens commented Feb 16, 2017

I'd like to, I just haven't had the opportunity to write them yet. Sorry.

@jreback
Copy link
Contributor

jreback commented Feb 16, 2017

@jowens np. just moving PR's along. pls ping when you are ready for review.

@jreback jreback added this to the 0.20.0 milestone Mar 28, 2017
@jreback jreback closed this in 34c6bd0 Mar 28, 2017
mattip pushed a commit to mattip/pandas that referenced this pull request Apr 3, 2017
closes pandas-dev#14883

Author: Jeff Reback <[email protected]>
Author: John Owens <[email protected]>

Closes pandas-dev#14950 from jowens/json_normalize-separator and squashes the following commits:

0327dd1 [Jeff Reback] compare sorted columns
bc5aae8 [Jeff Reback] CLN: fixup json_normalize with sep
8edc40e [John Owens] ENH: json_normalize now takes a user-specified separator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: json_normalize should allow a different separator than .
3 participants