to_json() not including all attributes #33043

Closed
jpsolanet opened this issue Mar 26, 2020 · 4 comments
Labels
IO JSON read_json, to_json, json_normalize Needs Info Clarification about behavior needed to assess issue
Comments

@jpsolanet

jpsolanet commented Mar 26, 2020

edit: simplified example

Code Sample

import pandas as pd


class MyClass:
    def __init__(self, value):
        self.value = value
        self.one = value + 1
        self.two = value + 2
        self.three = value + 3

    def __repr__(self):
        return str(self.value)


# Sample
x = MyClass(1)
x.__dict__
# Output
# {'value': 1, 'one': 2, 'two': 3, 'three': 4}

# Sample frame: one object column holding three MyClass instances
values = [MyClass(value=v) for v in range(3)]
df = pd.DataFrame(values, columns=['amount'])

df.to_json()

Problem description

I have a custom class I'm using in dataframes, a simplified version of which is shown above.

The output of to_json() does not include the full list of attributes, which I'd expected it to. When I add more attributes to the class, some of them are included, but not all.

Current output

{"amount": {
    "0": {
        "one": 1,
        "two": 2
    },
    "1": {
        "one": 2,
        "two": 3
    },
    "2": {
        "one": 3,
        "two": 4
    }
}}

Expected Output

{"amount":{
    "0": {
        "value": 0,
        "one": 1,
        "two": 2,
        "three": 3
    },
    "1": {
        "value": 1,
        "one": 2,
        "two": 3,
        "three": 4
    },
    "2": {
        "value": 2,
        "one": 3,
        "two": 4,
        "three": 5
    }
}}

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.3.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.10.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.11
tables : 3.6.1
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None
numba : None

@TomAugspurger
Contributor

I'm surprised to see this works at all. I'm not sure what's going on, but I suspect this is not well tested / supported.

You can provide a default_handler, though:

In [19]: df.to_json(default_handler=lambda x: x.__dict__)
Out[19]: '{"amount":{"0":{"value":0,"one":1,"two":2,"three":3},"1":{"value":1,"one":2,"two":3,"three":4},"2":{"value":2,"one":3,"two":4,"three":5}}}'

You're welcome to dig further into what's going on.
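
For anyone landing here later, the same workaround can be written with a named handler instead of a lambda. This is only a sketch: `as_plain_dict` is a made-up helper name, and the frame is built from a dict of lists rather than the constructor call in the original report. It relies on `to_json(default_handler=...)` handing unsupported objects to the callable, as in the session above.

```python
import json

import pandas as pd


class MyClass:
    def __init__(self, value):
        self.value = value
        self.one = value + 1
        self.two = value + 2
        self.three = value + 3

    def __repr__(self):
        return str(self.value)


def as_plain_dict(obj):
    # Hypothetical helper: copy the instance attributes into a plain dict
    # so the serializer only has to deal with built-in types.
    return dict(vars(obj))


df = pd.DataFrame({"amount": [MyClass(v) for v in range(3)]})

# default_handler is called for objects the serializer cannot handle itself.
payload = json.loads(df.to_json(default_handler=as_plain_dict))
print(payload["amount"]["0"])
```

Parsing the result back with `json.loads` is just a convenient way to inspect it; the string returned by `to_json` is what you would normally write out.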

@TomAugspurger TomAugspurger added IO JSON read_json, to_json, json_normalize Needs Info Clarification about behavior needed to assess issue labels Mar 26, 2020
@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Mar 26, 2020
@jpsolanet
Author

Thank you for this; the default_handler workaround will work for me for now.

Will take a dive into what's happening, and try to update docs and/or tests if nothing else.

@WillAyd
Member

WillAyd commented Mar 27, 2020

Pandas doesn’t follow the conventions of the stdlib strictly as we use a custom serializer forked from ujson. If you wanted to define a class method for serialization, it should be toDict:

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#fallback-behavior
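
A rough sketch of what the toDict route could look like, assuming the fallback behavior described in the linked docs (a `toDict` method on the object is called and its return value is serialized). The class mirrors the one in the report; which keys `toDict` returns is an illustrative choice, not something pandas prescribes.

```python
import json

import pandas as pd


class MyClass:
    def __init__(self, value):
        self.value = value
        self.one = value + 1
        self.two = value + 2
        self.three = value + 3

    def __repr__(self):
        return str(self.value)

    def toDict(self):
        # Name matters: the pandas serializer looks for "toDict" specifically.
        # Returning an explicit dict keeps control over what gets exported.
        return {
            "value": self.value,
            "one": self.one,
            "two": self.two,
            "three": self.three,
        }


df = pd.DataFrame({"amount": [MyClass(v) for v in range(3)]})

# No default_handler needed; toDict is picked up by the fallback logic.
payload = json.loads(df.to_json())
print(payload["amount"]["1"])
```

Compared with default_handler, this keeps the serialization rule on the class itself, so every to_json() call picks it up without extra arguments.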

@jpsolanet
Author

I'll go the class method route, thank you Will & Tom.
