
BUG: Type mismatch in read_json #35464


Open
rohan-gt opened this issue Jul 29, 2020 · 2 comments
Labels
Bug, IO JSON, Missing-data

Comments

@rohan-gt

Tested on Pandas v1.1.0

I'm trying to import a JSON file containing the following:

test.json

{"file_path": null}

If I run the following code,

pd.read_json("test.json", typ='series').to_dict()

I get this output:

{'file_path': NaT}

Why is null getting converted to NaT instead of None?
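For comparison, here is a minimal sketch using only the standard library `json` module, which maps JSON null to Python None without any dtype inference. It assumes the same one-key document as test.json, inlined as a string so the snippet is self-contained. It is a possible workaround when a plain dict is all that is needed, not a fix for read_json.

```python
import json

# Same content as test.json, inlined so the snippet is self-contained.
raw = '{"file_path": null}'

# json.loads maps JSON null to Python None; no dtype inference is involved.
record = json.loads(raw)
print(record)  # {'file_path': None}
```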

@rohan-gt added the Bug and Needs Triage labels Jul 29, 2020
@simonjayhawkins
Member

Thanks @rohan-gt for the report. #28501 may be related.

A reproducible code sample:

>>> pd.__version__
'0.25.3'
>>>
>>> pd.read_json('{"file_path": null}', typ="series").to_dict()
{'file_path': NaT}
>>>

Further investigation and PRs are welcome.

@simonjayhawkins added the IO JSON and Missing-data labels and removed the Needs Triage label Jul 30, 2020
@simonjayhawkins added this to the Contributions Welcome milestone Jul 30, 2020
@danieldjewell

I was just running into a semi-related issue... I played around a bit. Weird results:

>>> pd.__version__

'1.1.0'

>>> test_json1='{"file_path": null}'                                                                 
>>> pd.read_json(test_json1, typ='series').to_dict()

{'file_path': NaT}

>>> test_json2='{"file_path": null, "file_size": 0}'                                 
>>> pd.read_json(test_json2, typ='series').to_dict()

{'file_path': nan, 'file_size': 0.0}           
          
>>> test_json3='{"file_path": null, "file_size": null}'                                                                                                    
>>> pd.read_json(test_json3, typ='series').to_dict()

{'file_path': NaT, 'file_size': NaT}         
                                                        
>>> test_json4='{"file_path": null, "file_size": ""}'                                                
>>> pd.read_json(test_json4, typ='series').to_dict()

{'file_path': NaT, 'file_size': NaT}       
                                                          
>>> test_json5='{"file_path": null, "file_size": "test"}'                                            
>>> pd.read_json(test_json5, typ='series').to_dict()

{'file_path': None, 'file_size': 'test'}

>>> test_json6='{"file_path": null, "file_size": 0, "file_meta": "hello"}'
>>> pd.read_json(test_json6, typ='series').to_dict()

{'file_path': None, 'file_size': 0, 'file_meta': 'hello'}

#-------------------------

# test_json1='{"file_path": null}'
>>> pd.read_json(test_json1, typ='series')

file_path   NaT
dtype: datetime64[ns]

# test_json2='{"file_path": null, "file_size": 0}'
>>> pd.read_json(test_json2, typ='series')

file_path    NaN
file_size    0.0
dtype: float64

# test_json4='{"file_path": null, "file_size": ""}'
>>> pd.read_json(test_json4, typ='series')

file_path   NaT
file_size   NaT
dtype: datetime64[ns]

# test_json6='{"file_path": null, "file_size": 0, "file_meta": "hello"}'
>>> pd.read_json(test_json6, typ='series')

file_path     None
file_size        0
file_meta    hello
dtype: object

So it looks like there are 3 different scenarios:

  1. If the input JSON contains only nulls (or nulls and empty strings), the output is NaT and the dtype is datetime64[ns]
  2. If the input contains a numeric value and no strings, the null is converted to NaN with a dtype of float64
  3. If the input contains any non-empty string, the null is converted to None with a dtype of object
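The three scenarios can be written down as a small executable rule set. The sketch below is only a model of the outputs reported in this thread, not pandas' actual inference logic. `predicted_null_marker` is a hypothetical helper name, and it only covers flat JSON objects whose values are nulls, numbers, or strings.

```python
import json


def predicted_null_marker(raw: str) -> str:
    """Hypothetical helper: predict how null is rendered by
    read_json(..., typ='series'), per the scenarios observed in this thread."""
    values = json.loads(raw).values()
    # In the observed outputs, nulls and empty strings carry no type information.
    informative = [v for v in values if v is not None and v != ""]
    if any(isinstance(v, str) for v in informative):
        return "None"  # scenario 3: dtype object
    if any(isinstance(v, (int, float)) for v in informative):
        return "NaN"  # scenario 2: dtype float64
    return "NaT"  # scenario 1: dtype datetime64[ns]


# The six inputs tried in this thread, with the null marker each one produced.
cases = {
    '{"file_path": null}': "NaT",
    '{"file_path": null, "file_size": 0}': "NaN",
    '{"file_path": null, "file_size": null}': "NaT",
    '{"file_path": null, "file_size": ""}': "NaT",
    '{"file_path": null, "file_size": "test"}': "None",
    '{"file_path": null, "file_size": 0, "file_meta": "hello"}': "None",
}

for raw, expected in cases.items():
    assert predicted_null_marker(raw) == expected
```

The model is handy mainly as a checklist: if a pandas version produces a different marker for one of these inputs, the inference behaviour has changed.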


4 participants