Skip to content

Configurable json_normalize with respect to number of levels and Keys to be flattened #23843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bhavaniravi opened this issue Nov 21, 2018 · 13 comments · Fixed by #26876
Closed
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Milestone

Comments

@bhavaniravi
Copy link
Contributor

bhavaniravi commented Nov 21, 2018

Code Sample, a copy-pastable example if possible

data = [{"CreatedBy":{"Name": "User001"}, 
        "Lookup":{"TextField":"Some text", "UserField":{"Id":"ID001", "Name": "Name001"}},
        "Image":{"a":"b"}}]
json_normalize(data).to_json(orient="records")

Current Output

[{"CreatedBy.Name":"User001","Image.a":"b",
"Lookup.TextField":"Some text",
"Lookup.UserField.Id":"ID001","Lookup.UserField.Name":"Name001"}]

Problem description

I want to flatten only specific keys and up to a specific a specific level.
For eg., I want to flatten until level 1 and skip key image.

Expected Output

[{"CreatedBy.Name":"User001",
"Image:{a":"b}",
"Lookup.TextField":"Some text",
"Lookup.UserField":{Id":"ID001",
"Name":"Name001"}]

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 39.1.0
Cython: None
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added IO JSON read_json, to_json, json_normalize Enhancement Usage Question and removed Enhancement Usage Question labels Nov 21, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 21, 2018

I thought for a moment that the meta parameter to json_normalize might help, but that was not to be. Any kind of enhancement would need to be done to json_normalize. It's "too late" once you get to the .to_json-DataFrame stage.

Interesting proposal. My only concern is the extent of use cases for partial flattening.

cc @WillAyd @jreback

@bhavaniravi
Copy link
Contributor Author

I have rewritten the nested_to_records method for my use.

Would love to contribute it back and extend it to json_normalize as well.

My use case is for exporting data for report generation.

@gfyoung
Copy link
Member

gfyoung commented Nov 21, 2018

That's great! Before we proceed, can you run tests on your machine to confirm that things don't break?

@bhavaniravi
Copy link
Contributor Author

Definitely. Will do that.

@WillAyd
Copy link
Member

WillAyd commented Nov 22, 2018

Yea I think it would be possible to add a level parameter here. There's a recursive function nested within which uses that as well, so may be able to wire those all together to control how deep the un-nesting of records goes.

Investigation and PRs certainly welcome

@bhavaniravi
Copy link
Contributor Author

I have a query. If the values of the keys in record path are nested dictionaries itself, is it supposed . to get flattened? I am going through the code and got stuck here

@bhavaniravi
Copy link
Contributor Author

bhavaniravi commented Nov 23, 2018

@WillAyd I am a bit confused with the behavior ofjson_normalize method with records_path. While just passing a list of records flattens the dictionary, it doesn't flattens the same on sending records_path param. Is it the excepted behavior?

data = [{'CreatedBy': {'Name': 'User001'},
             'Lookup': [{'TextField': 'Some text',
                                'UserField': {'Id': 'ID001', 'Name': 'Name001'}},
                              {'TextField': 'Some text',
                               'UserField': {'Id': 'ID001', 'Name': 'Name001'}}
                        ],
            'Image': {'a': 'b'}
        }]

json_normalize(data, record_path=["Lookup"])

Current Output

   TextField                           UserField
0  Some text  {'Id': 'ID001', 'Name': 'Name001'}
1  Some text  {'Id': 'ID001', 'Name': 'Name001'}

Expected Output

   TextField              UserField.Id       Name
0  Some text               ID001             Name001
1  Some text               ID001             Name001

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2018

@bhavaniravi that seems like a bug to me. I'd suggest opening that as a separate issue and working that as a pre-cursor to this. Would serve the purpose of not only fixing the bug but helping grow familiarity with the various parameters so we can see how this should fit into the context of them

@WillAyd
Copy link
Member

WillAyd commented Nov 24, 2018

Actually there's already an issue open about this #22706

@bhavaniravi
Copy link
Contributor Author

@WillAyd I think someone already gave a PR for the same. Should I be working on it too?

@WillAyd
Copy link
Member

WillAyd commented Nov 24, 2018

I just pinged the PR author on that one to see if they can make some updates. Let's give it a few days to see if they are interested in doing that first

@jreback
Copy link
Contributor

jreback commented Jun 28, 2019

@bhavaniravi your pr will close this the max_levels, but I don't think keys; pls open a new issue if that is needed / makes sense.

@jreback jreback modified the milestones: 0.25.0, Contributions Welcome Jun 28, 2019
@bhavaniravi
Copy link
Contributor Author

okay, will create another one.

@TomAugspurger TomAugspurger modified the milestones: Contributions Welcome, 0.25.0 Jul 3, 2019
weiji14 added a commit to weiji14/deepbedmap that referenced this issue Jul 19, 2019
Manually updating various PyData/data science dependencies in Pipfile, references to release notes and diff commits as follows:

- [dask](https://github.com/dask/dask) from 1.2.2 to 2.1.0
  - [Release notes](https://docs.dask.org/en/latest/changelog.html)
  - [Commits](dask/dask@v1.2.2...v2.1.0)
- [geopandas](https://github.com/geopandas/geopandas) from 0.5.0 to 0.5.1:
  - [Release notes](https://github.com/geopandas/geopandas/releases/tag/v0.5.1)
  - [Commits](geopandas/geopandas@v0.5.0...v0.5.1)
- [matplotlib](https://github.com/matplotlib/matplotlib) from 3.1.0 to 3.1.1:
  - [Release notes](https://github.com/matplotlib/matplotlib/releases/tag/v3.1.1)
  - [Commits](matplotlib/matplotlib@v3.1.0...v3.1.1)
- [numpy](https://github.com/numpy/numpy) from 1.16.4 to 1.17.0rc2:
  - [Release notes](https://github.com/numpy/numpy/releases/tag/v1.17.0rc2)
  - [Commits](numpy/numpy@v1.16.4...v1.17.0rc2)
- [pandas](https://github.com/pandas-dev/pandas) from 0.24.2 to 0.25.0:
  - [Release notes](https://github.com/pandas-dev/pandas/releases/tag/v0.25.0)
  - [Changelog](https://github.com/pandas-dev/pandas/blob/master/RELEASE.md)
  - [Commits](pandas-dev/pandas@v0.24.2...v0.25.0)
- [rasterio](https://github.com/mapbox/rasterio) from 1.0.23 to 1.0.24.
  - [Release notes](https://github.com/mapbox/rasterio/releases/tag/1.0.24)
  - [Changelog](https://github.com/mapbox/rasterio/blob/master/CHANGES.txt)
  - [Commits](rasterio/rasterio@1.0.23...1.0.24)
- [tqdm](https://github.com/tqdm/tqdm) from 4.32.1 to 4.32.2:
  - [Release notes](https://github.com/tqdm/tqdm/releases/tag/v4.32.2)
  - [Commits](tqdm/tqdm@v4.32.1...v4.32.2)

Quickfix in data_prep.ascii_to_xyz with pandas 0.25.0 introducing new json_normalize behaviour (see https://pandas.pydata.org/pandas-docs/version/0.25/whatsnew/v0.25.0.html#json-normalize-with-max-level-param-support, pandas-dev/pandas#23843). Caught unit test failing when trying to do the math Z=ELEVATION-BOTTOM because pandas.Series parsed the dictionary's key (Z) directly into the name as converters.Z, wow just wow...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO JSON read_json, to_json, json_normalize
Projects
None yet
5 participants