json_normalize has inconsistent behaviors while flattening nested array elements #21537
Comments
@gfyoung I'm interested in contributing here. It seems like it's related to this TODO in json_normalize: pandas/pandas/io/json/normalize.py Lines 204 to 206 in edb71fd
Generally, I tend to have issues with lists of records within a key, which I believe is related to this behavior. For example:
Problem:
Expected:
This issue seems most closely aligned with what I'm seeing, but I can open another one. I'd also be a new contributor, so I'm not sure whether I'm throwing myself in at the deep end here. Any thoughts or suggestions?
I think #23861 already solves this; the PR just needs to be updated.
Interesting, it does seem to solve @vuminhle's issue, where the value is a list. However, if the value is a dictionary, this part: pandas/pandas/io/json/normalize.py Lines 287 to 290 in fca2a27
seems to be the reason it returns just the keys. Iterating over a dictionary yields its keys, so the output contains only the keys rather than the dictionary items themselves. Any thoughts on this? Should I add this as a comment to the PR?
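To illustrate the point about dictionary iteration, here is a minimal sketch in plain Python. The helper names pull_records_naive and pull_records_wrapped are hypothetical and are not taken from normalize.py; the second one only shows one possible way to treat a lone dict as a single record, not necessarily what the PR does.

```python
def pull_records_naive(value):
    """Collect records by iterating over the value directly."""
    # Iterating over a dict yields its keys, not its items.
    return [record for record in value]


def pull_records_wrapped(value):
    """Hypothetical variant: wrap a lone dict so it counts as one record."""
    if isinstance(value, dict):
        return [value]
    return list(value)


# Made-up sample values, for illustration only.
as_list = [{"id": 1}, {"id": 2}]
as_dict = {"id": 1, "name": "a"}

print(pull_records_naive(as_list))    # the two records, unchanged
print(pull_records_naive(as_dict))    # only the key names: ['id', 'name']
print(pull_records_wrapped(as_dict))  # one record: [{'id': 1, 'name': 'a'}]
```

With a list the naive version works as intended, which matches the observation that the list case is already handled; only the dict case collapses to key names.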
@bpben: Feel free to comment on the PR. However, it seems likely that the PR will go stale, so at some point (probably within a week), feel free to take the changes from that PR and create your own.
This looks fixed on master. Could use a test to close:
Code Sample, a copy-pastable example if possible
Problem description
The above code produces:
It looks like json_normalize recursively flattens only the top-level array (in the first call). In the second call, it flattens only to the first level. I think the second call should behave the same way as the first and produce the same output.
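As a rough sketch of the consistent behavior being asked for, the following plain-Python code flattens nested dicts the same way whether the records sit at the top level or under a key. It is not pandas' implementation: flatten, flatten_records, the record_path argument as used here, and the sample data are all assumptions made for illustration, since the issue's own code sample is not reproduced above.

```python
def flatten(record, sep=".", prefix=""):
    """Recursively flatten nested dicts into a single-level dict."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into nested dicts, extending the column name.
            flat.update(flatten(value, sep=sep, prefix=name))
        else:
            # Non-dict values (including lists) are kept as-is.
            flat[name] = value
    return flat


def flatten_records(data, record_path=None, sep="."):
    """Flatten top-level records, or the records stored under record_path."""
    if record_path is not None:
        data = data[record_path]
    if isinstance(data, dict):
        data = [data]
    return [flatten(record, sep=sep) for record in data]


# Hypothetical data: the same records, once at the top level and once under a key.
top_level = [{"a": {"b": {"c": 1}}, "d": 2}]
nested = {"items": top_level}

print(flatten_records(top_level))
print(flatten_records(nested, record_path="items"))
```

Both calls go through the same recursive flatten step, so the nested dicts are expanded to the same depth in either case.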
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.1
pytest: 3.6.1
pip: 10.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.14.2
scipy: None
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None