Skip to content

read_json tz handling when it's just a list #6864

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hayd opened this issue Apr 10, 2014 · 6 comments
Closed

read_json tz handling when it's just a list #6864

hayd opened this issue Apr 10, 2014 · 6 comments
Labels
IO JSON read_json, to_json, json_normalize Timezones Timezone data dtype

Comments

@hayd
Copy link
Contributor

hayd commented Apr 10, 2014

http://stackoverflow.com/questions/22995828/json-extension-file-timestamp-pandas-python

Perhaps two parts here, first date is not converted unless col is passed in this example. Also tz is lost (I thought it should be 0 somewhere e.g. utcoffset).

js = '["2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00", "2012-03-01T00:06:52+00:00", "2012-03-01T00:11:23+00:00", "2012-03-01T00:12:47+00:00", "2012-03-01T00:12:54+00:00", "2012-03-01T00:16:14+00:00", "2012-03-01T00:17:31+00:00", "2012-03-01T00:21:23+00:00", "2012-03-01T00:21:26+00:00", "2012-03-01T00:22:25+00:00", "2012-03-01T00:28:24+00:00", "2012-03-01T00:31:21+00:00", "2012-03-01T00:32:20+00:00", "2012-03-01T00:33:32+00:00", "2012-03-01T00:35:21+00:00", "2012-03-01T00:38:14+00:00", "2012-03-01T00:39:24+00:00", "2012-03-01T00:43:12+00:00", "2012-03-01T00:46:13+00:00", "2012-03-01T00:46:31+00:00", "2012-03-01T00:48:03+00:00", "2012-03-01T00:49:34+00:00", "2012-03-01T00:49:54+00:00", "2012-03-01T00:55:19+00:00", "2012-03-01T00:56:27+00:00", "2012-03-01T00:56:32+00:00"]'

In [44]: pd.read_json(js).head(2)  # strings !
Out[44]:
                           0
0  2012-03-01T00:05:55+00:00
1  2012-03-01T00:06:23+00:00

In [45]: pd.read_json(js, convert_dates=[0]).head(2)
Out[45]:
                    0
0 2012-03-01 00:05:55
1 2012-03-01 00:06:23

In [46]: pd.read_json(js, convert_dates=[0]).iloc[0, 0]
Out[46]: Timestamp('2012-03-01 00:05:55', tz=None)
@jreback
Copy link
Contributor

jreback commented Apr 10, 2014

I dont' think json currently handles tz at all (normally json is UTC, but in theory could be anything).

@jreback
Copy link
Contributor

jreback commented Apr 10, 2014

cc @Komnomnomnom

@Komnomnomnom
Copy link
Contributor

So at a glance it looks like JSON uses pd.to_datetime, does the timezone issue lie with that?

In [1]: import pandas as pd

In [2]: pd.version.version
Out[2]: '0.13.1-630-gea883e0'

In [3]: pd.Timestamp("2012-03-01T00:05:55+00:00")
Out[3]: Timestamp('2012-03-01 00:05:55+0000', tz='UTC')

In [4]: pd.to_datetime("2012-03-01T00:05:55+00:00")
Out[4]: Timestamp('2012-03-01 00:05:55')

In [5]: pd.DatetimeIndex(["2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00", "2012-03-01T00:06:52+00:00", "2012-03-01T00:11:23+00:00", "2012-03-01T00:12:47+00:00", "2012-03-01T00:12:54+00:00", "2012-03-01T00:16:14+00:00", "2012-03-01T00:17:31+00:00", "2012-03-01T00:21:23+00:00", "2012-03-01T00:21:26+00:00", "2012-03-01T00:22:25+00:00", "2012-03-01T00:28:24+00:00", "2012-03-01T00:31:21+00:00", "2012-03-01T00:32:20+00:00", "2012-03-01T00:33:32+00:00", "2012-03-01T00:35:21+00:00", "2012-03-01T00:38:14+00:00", "2012-03-01T00:39:24+00:00", "2012-03-01T00:43:12+00:00", "2012-03-01T00:46:13+00:00", "2012-03-01T00:46:31+00:00", "2012-03-01T00:48:03+00:00", "2012-03-01T00:49:34+00:00", "2012-03-01T00:49:54+00:00", "2012-03-01T00:55:19+00:00", "2012-03-01T00:56:27+00:00", "2012-03-01T00:56:32+00:00"])
Out[5]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-01 00:05:55+00:00, ..., 2012-03-01 00:56:32+00:00]
Length: 27, Freq: None, Timezone: UTC

In [6]: pd.to_datetime(["2012-03-01T00:05:55+00:00", "2012-03-01T00:06:23+00:00", "2012-03-01T00:06:52+00:00", "2012-03-01T00:11:23+00:00", "2012-03-01T00:12:47+00:00", "2012-03-01T00:12:54+00:00", "2012-03-01T00:16:14+00:00", "2012-03-01T00:17:31+00:00", "2012-03-01T00:21:23+00:00", "2012-03-01T00:21:26+00:00", "2012-03-01T00:22:25+00:00", "2012-03-01T00:28:24+00:00", "2012-03-01T00:31:21+00:00", "2012-03-01T00:32:20+00:00", "2012-03-01T00:33:32+00:00", "2012-03-01T00:35:21+00:00", "2012-03-01T00:38:14+00:00", "2012-03-01T00:39:24+00:00", "2012-03-01T00:43:12+00:00", "2012-03-01T00:46:13+00:00", "2012-03-01T00:46:31+00:00", "2012-03-01T00:48:03+00:00", "2012-03-01T00:49:34+00:00", "2012-03-01T00:49:54+00:00", "2012-03-01T00:55:19+00:00", "2012-03-01T00:56:27+00:00", "2012-03-01T00:56:32+00:00"])
Out[6]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-01 00:05:55, ..., 2012-03-01 00:56:32]
Length: 27, Freq: None, Timezone: None

@Komnomnomnom
Copy link
Contributor

Regarding the date detection and conversion this works if targeting a series

In [60]: json.read_json('["2012-03-01T00:05:55+00:00"]', typ='series')
Out[60]: 
0   2012-03-01 00:05:55
dtype: datetime64[ns]

however DataFrame only converts "date-like" columns which it deems to be

                    col.endswith('_at') or
                    col.endswith('_time') or
                    col.lower() == 'modified' or
                    col.lower() == 'date' or
                    col.lower() == 'datetime'):     

Personally I'm inclined to close as not a bug, as it works as intended (although I didn't author this code). The fix here I think is to target a Series, but I'm happy to update the docs to reflect what I've said above.

@hayd
Copy link
Contributor Author

hayd commented Apr 11, 2014

Ah, so one part of this is a to_datetime bug:

In [62]: pd.Timestamp("2012-03-01T00:05:55+00:00")
Out[62]: Timestamp('2012-03-01 00:05:55+0000', tz='UTC')

In [63]: pd.to_datetime(["2012-03-01T00:05:55+00:00"])[0]
Out[63]: Timestamp('2012-03-01 00:05:55', tz=None)

This seems to be discussed in #6415.

I remember this _at and _time stuff, I think it works pretty well most of the time, and you can specify the columns explicitly (i.e. convert_dates=[0]), so agree the JSON part is not at fault! :)

@jreback
Copy link
Contributor

jreback commented Apr 11, 2014

right...let's close this in favor of #6415 (which is an open issue)

@jreback jreback closed this as completed Apr 11, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Timezones Timezone data dtype
Projects
None yet
Development

No branches or pull requests

3 participants