
BUG: json_normalize fails with empty arrays/lists #47182


Closed
2 of 3 tasks
mateuszboryn opened this issue May 31, 2022 · 2 comments
Labels: Bug; IO JSON (read_json, to_json, json_normalize); Nested Data (data where the values are collections: lists, sets, dicts, objects, etc.)

Comments

@mateuszboryn

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# This works fine:
data = [ {
        "id": 1, "name": "Cole Volk", "fitness": [{"height": 130, "weight": 60}],
        "someList": []
    },
    {
        "id": 2, "name": "Faye Raker", "fitness": [{"height": 130, "weight": 60}],
        "someList": [1]
    },]

pd.json_normalize(data, "fitness", ["id", "name", "someList"])

# This doesn't work
data = [ {
        "id": 1, "name": "Cole Volk", "fitness": [{"height": 130, "weight": 60}],
        "someList": []
    },
    {
        "id": 2, "name": "Faye Raker", "fitness": [{"height": 130, "weight": 60}],
        "someList": []
    },]

pd.json_normalize(data, "fitness", ["id", "name", "someList"])

# Fails with: ValueError: operands could not be broadcast together with shape (0,) (2,)

Issue Description

When a meta field holds a list that is empty in every record, json_normalize fails with "ValueError: operands could not be broadcast together with shape".

It may be related to the behavior of np.repeat(), but I am reporting the issue to pandas because it looks like an overlooked corner case. Compare:

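import numpy as np

# Ragged input: NumPy builds a 1-D object array holding three lists, so repeat() works.
v = [[], [], [234]]
lengths = [1, 1, 3]
np.array(v, dtype=object).repeat(lengths)

and

# Every list is empty: NumPy builds a 2-D array of shape (3, 0), so repeat() raises the ValueError.
v = [[], [], []]
lengths = [1, 1, 3]
np.array(v, dtype=object).repeat(lengths)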
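
The difference between the two snippets comes down to how NumPy infers the array shape from nested lists. As a minimal sketch of one way to sidestep that inference (not necessarily the fix adopted in pandas), the values can be placed one by one into a pre-allocated 1-D object array before repeating. The helper name `to_object_1d` is hypothetical and used only for illustration.

```python
import numpy as np


def to_object_1d(values):
    """Store each item of `values` in one slot of a 1-D object array.

    Hypothetical helper: filling slot by slot keeps NumPy from treating
    the inner lists as an extra array dimension.
    """
    out = np.empty(len(values), dtype=object)
    for i, item in enumerate(values):
        out[i] = item
    return out


v = [[], [], []]
lengths = [1, 1, 3]

arr = to_object_1d(v)           # shape (3,), each element is an empty list
repeated = arr.repeat(lengths)  # five elements, no broadcasting error
```

With a 1-D array the number of elements matches the number of repeat counts, which is what repeat() needs.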

Expected Behavior

json_normalize should return the correct DataFrame, with the empty lists preserved in the someList column, instead of raising.
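
On a pandas version that still raises this error, one possible way to get the expected frame is to leave the list-valued field out of meta and attach it afterwards. This is only a sketch of a workaround, and it assumes that "id" is unique per record; the name `some_list_by_id` is made up for this example.

```python
import pandas as pd

data = [
    {
        "id": 1, "name": "Cole Volk", "fitness": [{"height": 130, "weight": 60}],
        "someList": [],
    },
    {
        "id": 2, "name": "Faye Raker", "fitness": [{"height": 130, "weight": 60}],
        "someList": [],
    },
]

# Normalize without the problematic list-valued meta field.
flat = pd.json_normalize(data, record_path="fitness", meta=["id", "name"])

# Look up each record's list by its id (assumed unique) and attach it as a column.
some_list_by_id = {record["id"]: record["someList"] for record in data}
flat["someList"] = flat["id"].map(some_list_by_id)
```

The lookup goes through "id" rather than row position, so it still lines up when a record contributes more than one row from "fitness".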

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.10.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8

@mateuszboryn added the "Bug" and "Needs Triage" labels on May 31, 2022
@simonjayhawkins
Member

Thanks @mateuszboryn for the report.

This is the same underlying issue as #37782.

I've posted a possible fix there (#37782 (comment)) that also fixes this issue.

PRs welcome.

@simonjayhawkins added the "IO JSON" label and removed the "Needs Triage" label on Jun 3, 2022
@simonjayhawkins added this to the Contributions Welcome milestone on Jun 3, 2022
@simonjayhawkins added the "Nested Data" label on Jun 3, 2022
@mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022
@topper-123
Contributor

This works as expected now.

>>> data = [ {
...        "id": 1, "name": "Cole Volk", "fitness": [{"height": 130, "weight": 60}],
...        "someList": []
...    },
...    {
...        "id": 2, "name": "Faye Raker", "fitness": [{"height": 130, "weight": 60}],
...        "someList": []
...    },]
>>> pd.json_normalize(data, record_path="fitness", meta=["id", "name", "someList"])
   height  weight id        name someList
0     130      60  1   Cole Volk       []
1     130      60  2  Faye Raker       []

Closing as fixed.
