BUG: Pandas 1.0.x read_json unable to convert lists with date values #33787

Closed
reneeburton opened this issue Apr 25, 2020 · 2 comments · Fixed by #42894
Labels: good first issue; IO JSON (read_json, to_json, json_normalize); Needs Tests (unit test(s) needed to prevent regressions)

reneeburton commented Apr 25, 2020

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

# Works in 0.25.x and 1.0.x: the list column holds no date-like values.
json_line = pd.DataFrame(
    [([1, 2], "hector")], columns=['accounts', 'name']
).to_json(lines=True, orient='records')
data = pd.read_json(json_line)

# Works in 0.25.x and 1.0.x with convert_dates=False (non-default behavior).
json_line = pd.DataFrame(
    [([1, 2], ['2020-03-05', '2020-04-08T09:58:49+00:00'], "hector")],
    columns=['accounts', 'date', 'name'],
).to_json(lines=True, orient='records')
data = pd.read_json(json_line, convert_dates=False)

# Works in 0.25.x but raises in 1.0.x with the default convert_dates=True
# (error: list is unhashable).
json_line = pd.DataFrame(
    [([1, 2], ['2020-03-05', '2020-04-08T09:58:49+00:00'], "hector")],
    columns=['accounts', 'date', 'name'],
).to_json(lines=True, orient='records')
data = pd.read_json(json_line)

Problem description

In pandas 0.25.x and earlier, pd.read_json(path, lines=True) could read newline-delimited JSON files whose records contain lists of any type. In pandas 1.0.x, the same data raises an unhashable-type error. The error appears to come from lists that contain date-like values, which read_json tries to convert because convert_dates=True is the default.

Pandas 0.25.x reads the lists of dates without converting their items; they are kept as strings.

Expected Output

A pandas DataFrame containing the lists, read without error.

df.to_json()

returns

'{"accounts":{"0":[1,2]},"event_time":{"0":["2020-04-08T09:50:49+00:00","2020-04-08T09:58:49+00:00"]},"name":{"0":"hector"}}'

Output of 1.0.3

Using pd.read_json(json_line, lines=True), the error varies slightly depending on context:

  • TypeError: unhashable type: 'list'
  • TypeError: <class 'list'> is not convertible to datetime (in the small example above)

Using pd.read_json(json_line, lines=True, convert_dates=False) returns the expected output, consistent with pandas 0.25.x.
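The workaround described above can be sketched as follows. This is a minimal illustration rather than the reporter's actual data: the record is hand-written to match the shape of the code sample, and it is wrapped in io.StringIO (an assumption made here so the snippet does not depend on read_json accepting a literal JSON string, which newer pandas releases discourage).

```python
import io

import pandas as pd

# One newline-delimited JSON record whose "date" field is a list of date strings.
json_line = (
    '{"accounts":[1,2],'
    '"date":["2020-03-05","2020-04-08T09:58:49+00:00"],'
    '"name":"hector"}\n'
)

# convert_dates=False skips the default date conversion that the report
# identifies as the step failing on list values.
data = pd.read_json(io.StringIO(json_line), lines=True, convert_dates=False)

print(data.shape)           # one row, three columns
print(data.loc[0, "date"])  # the list of date strings, left unconverted
```

With lines=True each line is treated as one record, so the list values stay inside a single cell instead of being spread across rows.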

Output of 0.25.3

'{"accounts":{"0":[1,2]},"event_time":{"0":["2020-04-08T09:50:49+00:00","2020-04-08T09:58:49+00:00"]},"name":{"0":"hector"}}'

reneeburton added the Bug and Needs Triage labels Apr 25, 2020
jbrockmendel added the IO JSON label and removed the Needs Triage label Jun 6, 2020

mroeschke (Member) commented

This looks fixed on master. Could use a test

In [6]: json_line = pd.DataFrame(
   ...:     [([1, 2], ['2020-03-05', '2020-04-08T09:58:49+00:00'], "hector")],
   ...:     columns=['accounts', 'date', 'name'],
   ...: ).to_json(lines=True, orient='records')
   ...: json_line
   ...: data = pd.read_json(json_line)

In [7]: data
Out[7]:
   accounts                       date    name
0         1                 2020-03-05  hector
1         2  2020-04-08T09:58:49+00:00  hector
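
A regression test along the lines requested here might look like the sketch below. It is not the test merged in #42894; the function name is hypothetical, and the record is hand-written to match the issue's code sample. It passes lines=True, as in the original report, because without it a single record is read column-wise and the lists are spread across rows, which is why Out[7] has two rows. The input is wrapped in io.StringIO, an assumption made so the sketch does not rely on read_json accepting a literal JSON string.

```python
import io

import pandas as pd


def test_read_json_lines_keeps_list_of_dates():
    # Hypothetical test name; the record mirrors the failing case in the report.
    record = (
        '{"accounts":[1,2],'
        '"date":["2020-03-05","2020-04-08T09:58:49+00:00"],'
        '"name":"hector"}\n'
    )

    # The default convert_dates=True is the path reported to raise in 1.0.x.
    result = pd.read_json(io.StringIO(record), lines=True)

    # The list values should survive as single cells, with dates kept as strings.
    assert result.shape == (1, 3)
    assert result.loc[0, "accounts"] == [1, 2]
    assert result.loc[0, "date"] == ["2020-03-05", "2020-04-08T09:58:49+00:00"]
    assert result.loc[0, "name"] == "hector"
```

Asserting on the cell contents rather than only on the absence of an exception also pins down the 0.25.x behavior of leaving the date strings unconverted.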

mroeschke added the good first issue and Needs Tests labels and removed the Bug label Jul 31, 2021

horaceklai (Contributor) commented

take
