BUG: Attributes skipped when serialising plain Python objects to JSON (#42768) #42931

Merged: 7 commits merged on Aug 11, 2021

@joelgibson (Contributor) commented Aug 8, 2021

This patch closes #42768 and fixes a previously-reported issue #33043.

When ujson falls back to encoding an arbitrary Python object, it iterates over the object's non-callable attributes whose names do not start with an underscore, using them as JSON keys. While iterating over the dir(obj) list, an index was incremented twice, causing every second attribute to be skipped.
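
The following is a simplified pure-Python model of the loop described above, not the C code in ujson. It assumes the extra increment happens right after an attribute is accepted; the function names are hypothetical.

```python
def public_attrs_buggy(obj):
    """Hypothetical model of the old loop: skips the name after each accepted one."""
    names = dir(obj)  # sorted list of attribute names
    out = {}
    i = 0
    while i < len(names):
        name = names[i]
        i += 1  # advance past the current name
        if name.startswith("_"):
            continue
        value = getattr(obj, name)
        if callable(value):
            continue
        out[name] = value
        i += 1  # BUG: a second increment silently skips the next name
    return out


def public_attrs_fixed(obj):
    """Same filter, but every name in dir(obj) is visited exactly once."""
    out = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue
        value = getattr(obj, name)
        if callable(value):
            continue
        out[name] = value
    return out


class A:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


print(public_attrs_buggy(A(1, 2, 3)))  # {'a': 1, 'c': 3}
print(public_attrs_fixed(A(1, 2, 3)))  # {'a': 1, 'b': 2, 'c': 3}
```

With three public attributes the model drops `b`, the "every second attribute" pattern described in this PR.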

I've fixed this increment, and moved some of the declarations inside the function so that control flow is more clear. I've also added a regression test for this behaviour.

This patch is also related to (xref) #41174, but I'm not sure that turning complex numbers into {"real": ..., "imag": ...} JSON objects was intended in the first place, so I haven't added a test for this.
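
To see why a complex number ends up as an object with `real` and `imag` keys, one can apply the same filter (public, non-callable attributes) to a complex value directly; `conjugate` is dropped because it is a method. This sketch only reproduces the filter in plain Python and does not call pandas.

```python
z = 1 + 2j

# Keep attributes that do not start with "_" and are not callable,
# mirroring the fallback rule described in this PR.
public = {
    name: getattr(z, name)
    for name in dir(z)
    if not name.startswith("_") and not callable(getattr(z, name))
}
print(public)  # {'imag': 2.0, 'real': 1.0}
```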

@jreback (Contributor) left a comment

Does this have any user-visible effects in to_json itself?

@jreback added the IO JSON label (read_json, to_json, json_normalize) Aug 8, 2021

@jreback (Contributor) commented Aug 8, 2021

can you replicate the test in the OP

@joelgibson (Contributor, Author) commented Aug 8, 2021

@jreback The test more or less duplicates the OP, just using regular classes rather than dataclasses. However, the expected output in the OP included attributes beginning with underscores, whereas I have left these out of the resulting JSON, because that is what the existing (and, I assume, intended) behaviour did.

It does have user-visible effects, for example when instances of such classes are stored as elements of a Series or DataFrame. The program:

import pandas as pd

class A:
    def __init__(self, a, b):
        self.a = a
        self.b = b

series = pd.Series([A(a=1, b=2)])
print(series.to_json())

used to return {"0":{"a":1}}, and now returns {"0":{"a":1,"b":2}}.
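
A regression check along the lines of the example above might look like the following. It is a sketch, not the test added in this PR; the class and test names are hypothetical, and it assumes a pandas version that includes this fix (1.4 or later). The underscore-prefixed attribute is expected to be omitted, as described earlier in this thread.

```python
import json

import pandas as pd


class A:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
        self._hidden = "not serialised"  # underscore-prefixed: expected to be omitted


def test_to_json_keeps_every_public_attribute():
    series = pd.Series([A(a=1, b=2, c=3)])
    # Compare parsed JSON so the check does not depend on key order.
    assert json.loads(series.to_json()) == {"0": {"a": 1, "b": 2, "c": 3}}


test_to_json_keeps_every_public_attribute()
```

Using three public attributes makes the old "every second attribute" skip visible, since `b` would be missing from the output.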

@joelgibson requested a review from @jreback on August 8, 2021 07:13

@mzeitlin11 (Member) left a comment

Thanks for the PR @joelgibson! While I agree that your changes make the code more readable, would it be possible to leave those for a follow-up PR? That way the changes that actually affect the code's logic are clearer.

@jreback (Contributor) left a comment

some test comments, ping on green.

@jreback added this to the 1.4 milestone Aug 8, 2021
@joelgibson (Contributor, Author) commented

Thanks @mzeitlin11. I would usually only change the part of the function that affects the logic, but the logic of that function was so unclear that it seemed to warrant a cleanup anyway. For example, on line 928 it looks like something interesting is being assigned, but actually NULL is being assigned. (Incidentally, how should I go about putting in the cleanup PR - just title it with "CLN" and reference this discussion?)

@jreback ping

@mzeitlin11 (Member) left a comment

LGTM, thanks @joelgibson! A follow-up with your original cleanups would be great!

(Incidentally, how should I go about putting in the cleanup PR - just title it with "CLN" and reference this discussion?)

Yep, sounds good; nothing special needs to be done.

joelgibson added a commit to joelgibson/pandas that referenced this pull request Aug 10, 2021
@jreback merged commit 88a43d8 into pandas-dev:master Aug 11, 2021
@jreback (Contributor) commented Aug 11, 2021

thanks @joelgibson

Labels: IO JSON (read_json, to_json, json_normalize)
Linked issue: BUG: to_json() swallows attributes of dataclasses
Participants: 3