
ENH: json_normalize should allow a different separator than . #14883


Closed
jowens opened this issue Dec 14, 2016 · 5 comments
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Milestone
0.20.0

Comments

@jowens

jowens commented Dec 14, 2016

>>> import pandas
>>> col_in = ['c1', 'c2.x']
>>> df = pandas.DataFrame([['A', 0], ['B', 1]], columns=col_in)
>>> df.c1
0    A
1    B
Name: c1, dtype: object
>>> df.c2.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'c2'

Problem description

The above snippet shows that having . in a column name is not ideal: attribute access such as df.c2.x no longer works. (I'm running into this when using Vega for data visualization, vega/vega-lite#1775.) When json_normalize flattens nested input JSON, it separates the nesting levels with a period (.). I believe this happens on this line:

meta_keys = ['.'.join(val) for val in meta]

I'd like to see an additional argument to json_normalize, separator, with default ., that specifies the character (string) that separates nesting levels. In the line of code above, '.'.join(val) would be replaced by separator.join(val) (if I'm reading what that line does correctly). I could then pass, say, _ to use an underscore instead of a period.
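A minimal pure-Python sketch of the proposal, not pandas' actual implementation: flatten and meta_keys are hypothetical helpers, and each meta path is assumed to already be a list of keys. It shows how swapping the hard-coded '.' for a separator argument changes the generated column names.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], sep: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Hypothetical helper: flatten nested dicts, joining nested keys with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Build the column name from the parent path plus this key.
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


def meta_keys(meta: List[List[str]], sep: str = ".") -> List[str]:
    """Hypothetical helper: the line quoted in the issue, with '.' replaced by `sep`."""
    return [sep.join(val) for val in meta]


record = {"c1": "A", "c2": {"x": 0}}
print(flatten(record))                    # {'c1': 'A', 'c2.x': 0}
print(flatten(record, sep="_"))           # {'c1': 'A', 'c2_x': 0}
print(meta_keys([["c2", "x"]], sep="_"))  # ['c2_x']
```

Keeping "." as the default means existing callers see no change; only callers who pass a separator get different column names.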

n00b at pandas; please correct me if I'm doing anything wrong.

Expected Output

Output of pd.show_versions()

>>> pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.US-ASCII
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 30.3.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@rcarneva
Contributor

The dot notation for accessing columns is just a convenience. You can still get the column normally using df['c2.x'].
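A short sketch of the bracket-indexing workaround, using the same frame as the original report. The rename step is an extra alternative not suggested in the thread; it relies on DataFrame.rename accepting a callable for columns.

```python
import pandas as pd

# The frame from the issue: one column name contains a period.
df = pd.DataFrame([["A", 0], ["B", 1]], columns=["c1", "c2.x"])

# Python parses df.c2.x as (df.c2).x, so attribute access looks for a column
# named "c2" and raises AttributeError. Bracket indexing takes the full label.
values = df["c2.x"].tolist()
print(values)  # [0, 1]

# Alternative: rename the columns so that attribute access works again.
renamed = df.rename(columns=lambda name: name.replace(".", "_"))
print(list(renamed.columns))  # ['c1', 'c2_x']
print(renamed.c2_x.tolist())  # [0, 1]
```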

@jowens
Author

jowens commented Dec 15, 2016

I understand. I'm merely offering the observation that columns with . in their names are perhaps not a perfect fit for everything in pandas, and having the option to use a different separator might also be useful for someone who isn't me.

@rcarneva
Contributor

Sure thing. Just wanted to point that out in case it was keeping you from doing something you needed to do now, since you said that you were new to pandas.

@jowens
Author

jowens commented Dec 15, 2016

Yeah, it's the vega-lite bug I filed (vega/vega-lite#1775) that's my proximate difficulty here.

@jreback jreback added the IO JSON read_json, to_json, json_normalize label Dec 15, 2016
@jreback
Contributor

jreback commented Dec 15, 2016

this would be quite easy to add; PRs welcome.

@jreback jreback added this to the Next Major Release milestone Dec 15, 2016
@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 28, 2017
mattip pushed a commit to mattip/pandas that referenced this issue Apr 3, 2017
closes pandas-dev#14883

Author: Jeff Reback
Author: John Owens

Closes pandas-dev#14950 from jowens/json_normalize-separator and squashes the following commits:

0327dd1 [Jeff Reback] compare sorted columns
bc5aae8 [Jeff Reback] CLN: fixup json_normalize with sep
8edc40e [John Owens] ENH: json_normalize now takes a user-specified separator
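A usage sketch of the resulting option, assuming the keyword is sep (the name used in the commit message) and a pandas version that includes it (0.20.0 or later). The import fallback assumes json_normalize is top-level from pandas 1.0 onward and lives in pandas.io.json in older releases.

```python
try:
    from pandas import json_normalize  # top-level since pandas 1.0
except ImportError:
    from pandas.io.json import json_normalize  # pandas 0.20 to 0.25

# The data from the original report, expressed as nested JSON records.
records = [
    {"c1": "A", "c2": {"x": 0}},
    {"c1": "B", "c2": {"x": 1}},
]

# Default separator: nested keys are joined with "." -> column "c2.x".
df_dot = json_normalize(records)

# Custom separator: nested keys are joined with "_" -> column "c2_x",
# which also works with attribute access.
df_underscore = json_normalize(records, sep="_")

print(sorted(df_dot.columns))         # ['c1', 'c2.x']
print(sorted(df_underscore.columns))  # ['c1', 'c2_x']
print(df_underscore.c2_x.tolist())    # [0, 1]
```

Columns are compared in sorted order because, as the "compare sorted columns" commit suggests, column order is not something to rely on across versions.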
3 participants