BUG: unexpected behavior of json_normalize meta arg #34465
Comments
Definitely looks buggy. The code for json_normalize is here, if it is something you are interested in debugging: pandas/pandas/io/json/_normalize.py Line 112 in 1cad9e5
@WillAyd I had an opportunity to look at the json_normalize code. It seems like it currently only looks for metadata within the record path. Since both To showcase this behavior, I'll show what happens if I change the input data to have the key other_time:
import pandas as pd

test_input = [{
"injection": {
"time": "injection time"
},
"results": [
{
"other_time": "result time",
"peaks": [{
"name": "peak name A",
"area": 1
}, {
"name": "peak name B",
"area": 2
}]
},
]
}]
df = pd.json_normalize(test_input, record_path=['results', "peaks"], meta=[["injection", "time"]])
print(df)
Output:
The logic to implement the behavior I expected is probably pretty tricky, not a simple fix. For now, it might be enough to document this behavior and raise an error in the edge case I found (i.e. make sure that we don't accidentally grab the wrong metadata because a key is shared between different branches of the JSON).
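The edge-case check suggested in this comment could look roughly like the sketch below. It is plain Python and not part of pandas: `find_shadowed_meta` is a hypothetical helper, and the rule it applies (flag a nested meta path whose final key also appears in an object visited along the record path) is only a heuristic based on the behavior described in this thread.

```python
from typing import Any, Dict, List, Sequence


def find_shadowed_meta(
    record: Dict[str, Any],
    record_path: Sequence[str],
    meta: Sequence[Sequence[str]],
) -> List[List[str]]:
    """Return nested meta paths whose last key also occurs along record_path.

    Hypothetical helper: a heuristic guard, not pandas' own logic.
    """
    # Collect the intermediate objects visited while descending record_path
    # (everything between the root record and the leaf records).
    current: List[Any] = [record]
    visited: List[Any] = []
    for key in record_path[:-1]:
        next_level: List[Any] = []
        for obj in current:
            if not isinstance(obj, dict):
                continue
            child = obj.get(key, [])
            next_level.extend(child if isinstance(child, list) else [child])
        visited.extend(next_level)
        current = next_level

    shadowed = []
    for path in meta:
        last_key = path[-1]
        if len(path) > 1 and any(
            isinstance(obj, dict) and last_key in obj for obj in visited
        ):
            shadowed.append(list(path))
    return shadowed


# Assumed input: "results" entries carry a "time" key that clashes with "injection.time".
clashing = {
    "injection": {"time": "injection time"},
    "results": [{"time": "result time", "peaks": [{"name": "peak name A", "area": 1}]}],
}
# Same data with the key renamed, as in the comment above.
renamed = {
    "injection": {"time": "injection time"},
    "results": [{"other_time": "result time", "peaks": [{"name": "peak name A", "area": 1}]}],
}

clash_report = find_shadowed_meta(clashing, ["results", "peaks"], [["injection", "time"]])
clean_report = find_shadowed_meta(renamed, ["results", "peaks"], [["injection", "time"]])
print(clash_report, clean_report)
```

A caller could raise a ValueError whenever the returned list is non-empty, which would surface the ambiguity instead of silently picking a value.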
+1, faced this unexpected behavior today.
Encountered this today. Interesting to see that this still exists in version 1.4.1. By the way, this stems from the same problem as #40514 (one might call them duplicates): that
@tsteffek this is an open issue, so not yet fixed. pandas is a community project. Anyone is welcome to submit a patch or contribute in other ways such as further investigation or suggesting fixes as you have here. Thanks.
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Problem description
I expected the string "injection time" to populate the $.injection.time column. Instead, the contents of $.results.time are populating this column. If I specify meta=["injection"], then {"time": "injection time"} correctly populates the column.
Expected Output
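The expected behavior described above can be sketched in plain Python, without pandas, by always resolving each meta path from the top-level record. The helper names (`pull_path`, `iter_records`, `normalize_with_root_meta`) are hypothetical, and the input assumes that each entry under results has its own time key, as the problem description implies; the original code sample is not reproduced here.

```python
from typing import Any, Dict, Iterator, List, Sequence


def pull_path(obj: Any, path: Sequence[str]) -> Any:
    """Follow a sequence of keys down through nested dicts."""
    for key in path:
        obj = obj[key]
    return obj


def iter_records(obj: Any, record_path: Sequence[str]) -> Iterator[Dict[str, Any]]:
    """Yield each leaf record reached by following record_path through nested lists."""
    if not record_path:
        yield obj
        return
    children = obj[record_path[0]]
    if isinstance(children, dict):
        children = [children]
    for child in children:
        yield from iter_records(child, record_path[1:])


def normalize_with_root_meta(
    data: Sequence[Dict[str, Any]],
    record_path: Sequence[str],
    meta: Sequence[Sequence[str]],
) -> List[Dict[str, Any]]:
    """Flatten records, resolving every meta path from the top-level record."""
    rows = []
    for top in data:
        # Meta values are looked up once per top-level record, from its root.
        meta_values = {".".join(path): pull_path(top, path) for path in meta}
        for rec in iter_records(top, record_path):
            rows.append({**rec, **meta_values})
    return rows


# Assumed input: "time" exists both under "injection" and inside each "results" entry.
test_input = [{
    "injection": {"time": "injection time"},
    "results": [{
        "time": "result time",
        "peaks": [
            {"name": "peak name A", "area": 1},
            {"name": "peak name B", "area": 2},
        ],
    }],
}]

rows = normalize_with_root_meta(
    test_input, record_path=["results", "peaks"], meta=[["injection", "time"]]
)
for row in rows:
    print(row)
```

Each row here carries "injection time" in its injection.time field, regardless of the time key inside results. If a DataFrame is needed, the list of dicts can be passed to pd.DataFrame.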
Output of pd.show_versions()
/usr/local/lib/python3.6/dist-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")
/usr/local/lib/python3.6/dist-packages/pandas_datareader/compat/__init__.py:7: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.
from pandas.util.testing import assert_frame_equal
INSTALLED VERSIONS
commit : None
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.104+
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.4
numpy : 1.18.4
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 46.4.0
Cython : 0.29.18
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: 0.8.1
bs4 : 4.6.3
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.2.6
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pytest : 3.6.4
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
xlsxwriter : None
numba : 0.48.0