ENH: json_normalize should allow a different separator than . #14883

jowens · 2016-12-14T21:24:36Z

>>> import pandas
>>> col_in = ['c1', 'c2.x']
>>> df = pandas.DataFrame([['A', 0], ['B', 1]], columns=col_in)
>>> df.c1
0    A
1    B
Name: c1, dtype: object
>>> df.c2.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'c2'

Problem description

The snippet above shows that having . in a column name is not ideal. (I'm running into this when using Vega for data visualization, vega/vega-lite#1775.) When json_normalize flattens a nested input JSON, it separates the nesting levels with a `.` character. I believe this happens on this line:

pandas/pandas/io/json.py, line 831 in 7d8bc0d:

meta_keys = ['.'.join(val) for val in meta]
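As a rough illustration of what that line computes, assuming `meta` is a list of key paths where each path is a list of strings (the sample paths below are made up), the join turns each path into one flat column name:

```python
# Hypothetical meta paths: each inner list is the chain of keys that
# leads to a value inside the nested JSON.
meta = [["id"], ["user", "name"], ["user", "address", "city"]]

# The quoted line joins each path with '.', producing one column name per path.
meta_keys = [".".join(val) for val in meta]
print(meta_keys)  # ['id', 'user.name', 'user.address.city']
```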

I'd like to see an additional argument to json_normalize, separator, with default `.`, that specifies the character (string) separating nesting levels. In the line of code above, '.'.join(val) would be replaced by separator.join(val) (if I'm reading that line correctly). I could then use, say, `_` to separate levels with an underscore instead of a period.
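A minimal sketch of the proposed behavior, assuming a plain recursive flattener over nested dicts. The `flatten` function and its `separator` parameter are hypothetical names for illustration only, not pandas' actual internals:

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], separator: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts, joining nesting levels with `separator`.

    Hypothetical helper for illustration; not pandas' implementation.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Build the column name for this level, e.g. "c2" + "_" + "x".
        name = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined name as the prefix.
            flat.update(flatten(value, separator, name))
        else:
            flat[name] = value
    return flat


row = {"c1": "A", "c2": {"x": 0}}
print(flatten(row))                 # {'c1': 'A', 'c2.x': 0}
print(flatten(row, separator="_"))  # {'c1': 'A', 'c2_x': 0}
```

Keeping `.` as the default means existing callers would see the same column names as before, so an option like this would be backward compatible.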

n00b at pandas; please correct me if I'm doing anything wrong.

Expected Output

Output of `pd.show_versions()`

>>> pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.US-ASCII
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 30.3.0
Cython: 0.25.2
numpy: 1.11.2
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: 4.5.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None


rcarneva · 2016-12-15T00:46:57Z

The dot notation for accessing columns is just a convenience. You can still get the column normally using df['c2.x'].
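Why `df.c2.x` fails regardless of the library: Python parses it as `(df.c2).x`, so it looks up an attribute named `c2` first. The pure-Python sketch below uses a hypothetical `Frame` class (not pandas) that mimics the attribute-access convenience via `__getattr__`, to show that subscripting with the full name (or `getattr` with the full string) still reaches a dotted column:

```python
class Frame:
    """Hypothetical stand-in for a DataFrame: columns live in a dict."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        # Subscript access receives the full column name as one string.
        return self._columns[name]

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; mimics df.colname.
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name) from None


df = Frame({"c1": ["A", "B"], "c2.x": [0, 1]})
print(df["c2.x"])  # [0, 1]
print(df.c1)       # ['A', 'B']
try:
    df.c2.x  # parsed as (df.c2).x, so Python looks up "c2" first
except AttributeError as exc:
    print("AttributeError:", exc)  # AttributeError: c2
```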

jowens · 2016-12-15T02:49:18Z

I understand. I'm merely observing that columns with . in their names are perhaps not a perfect fit for everything in pandas, and the option to use a different separator might also be useful to someone who isn't me.

rcarneva · 2016-12-15T03:32:27Z

Sure thing. Just wanted to point that out in case it was keeping you from doing something you needed to do now since you said that you were new to pandas.

jowens · 2016-12-15T03:33:47Z

Yeah, it's the vega-lite bug I filed (vega/vega-lite#1775) that's my proximate difficulty here.

jreback · 2016-12-15T23:44:47Z

this would be quite easy to add; PRs welcome.

closes pandas-dev#14883

Author: Jeff Reback <[email protected]>
Author: John Owens <[email protected]>

Closes pandas-dev#14950 from jowens/json_normalize-separator and squashes the following commits:

0327dd1 [Jeff Reback] compare sorted columns
bc5aae8 [Jeff Reback] CLN: fixup json_normalize with sep
8edc40e [John Owens] ENH: json_normalize now takes a user-specified separator
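Per the commit message, the merged change exposes the separator as a `sep` keyword on `json_normalize` (shipped with the 0.20.0 milestone). A short usage sketch, assuming a recent pandas (1.0 or later) where the function is available as `pd.json_normalize`; older releases imported it from `pandas.io.json`. The sample records mirror the columns from the original report:

```python
import pandas as pd

data = [
    {"c1": "A", "c2": {"x": 0}},
    {"c1": "B", "c2": {"x": 1}},
]

# Default separator keeps the dotted names: columns c1 and c2.x
dotted = pd.json_normalize(data)

# sep="_" joins nesting levels with an underscore: columns c1 and c2_x
flat = pd.json_normalize(data, sep="_")

print(sorted(flat.columns))
print(flat.c2_x.tolist())  # attribute access now works
```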

jreback added the IO JSON read_json, to_json, json_normalize label Dec 15, 2016

jreback added Enhancement Difficulty Novice labels Dec 15, 2016

jreback added this to the Next Major Release milestone Dec 15, 2016

This was referenced Dec 16, 2016

Add new optional "separator" argument to json_normalize #14891 (closed)

added 'separator' argument to json_normalize #14949 (closed)

ENH: GH14883: json_normalize now takes a user-specified separator #14950 (closed)

jreback modified the milestones: 0.20.0, Next Major Release Mar 28, 2017

jreback closed this as completed in 34c6bd0 Mar 28, 2017
