Creating Series from DatetimeIndex w/ tz loses tz info #6032
I am not sure what is correct here; using
So I think this is correct. @cancan101, does that make sense? |
I think @cancan101 means that the timezone is lost in the value in the Series, not in the index. So:
And I think this is by design (or by 'limitation of numpy'). When the datetime values are in a column or series, and not in the index, they are just
And numpy has no real timezone support. You can see that the value is correct (I mean, the offset of the timezone is applied), but numpy prints it in local time. |
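A minimal sketch of the point about numpy, assuming a recent pandas and using US/Eastern as an arbitrary example zone (neither comes from the thread): the tz-aware index exposes plain datetime64 values with the offset already applied, and the zone itself is not part of the numpy array.

```python
import numpy as np
import pandas as pd

# Illustrative tz-aware index; the zone and dates are arbitrary choices.
idx = pd.date_range("2014-01-01", periods=3, freq="D", tz="US/Eastern")

# The underlying numpy array is timezone-naive datetime64, holding UTC instants.
raw = idx.values

print(idx.tz)     # the zone lives on the pandas object
print(raw.dtype)  # a plain datetime64 dtype, no zone attached
print(raw[0])     # the first value as numpy renders it
```

How numpy prints the value depends on the numpy version, so the printed form is left unstated here.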
The problem actually seems to be much simpler (and maybe easier to solve). I would think that both of these statements should return the same thing but they do not:
|
@jorisvandenbossche well it could be preserved, but then it would automatically be of |
ok...so its the |
@cancan101 Ah, so you were talking about the index. Misunderstood |
Well not totally. Maybe it would also be good to have an option to pass to |
Something like:
|
But it should be?
EDIT: no, you are right. This seems documented behaviour. So |
Not really sure about that. IMO, either way there seems to be some consistency issues given that |
should both index & values be the same? (for a DatetimeIndex) |
The issue for me wasn't actually an explicit call to the Series constructor. Rather what I did was something like this:
in which case the timezone info gets silently stripped from the new column I have added to the DataFrame. |
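The snippet from this comment is not preserved here. A hypothetical reconstruction of the pattern (the column names `x` and `when` and the example index are assumptions) makes it easy to check what a given pandas version does. The thread reports the tz being stripped; recent releases are expected to keep a tz-aware dtype instead, so it is worth printing the dtype rather than assuming either.

```python
import pandas as pd

# Hypothetical data: a tz-aware index assigned as a new DataFrame column.
idx = pd.date_range("2014-01-01", periods=3, freq="D", tz="US/Eastern")
df = pd.DataFrame({"x": [1, 2, 3]})
df["when"] = idx

print(df["when"].dtype)    # shows whether the column kept the timezone
print(df["when"].iloc[0])  # first value as stored in the column
```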
So this is probably a bug then: a DatetimeIndex that has a timezone is getting coerced to datetime64[ns], but should instead remain as object
|
PR is #6398
|
I personally don't see this as a bug, more an oddity which you should be warned about, so I don't like the change. I would expect that my DatetimeIndex is converted to a datetime64 column, as are arrays with datetime.datetime objects, and would be surprised if this is not the case (but maybe I am just brainwashed by the way it is now...). |
Actually, arrays with datetime.datetime objects that have a time zone are also converted to object ... |
It looks like the underlying numpy array should support timezones: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html |
@cancan101 numpy's tz support is not used by pandas at all; it is buggy in numpy < 1.8 |
I think the 'ideal' solution in the long term would be that all datetime-like columns/index (with and without timezones) would be converted to datetime64 (when timezone support in numpy has improved), but the question is what to do now:
|
@jorisvandenbossche to your last point (this was already the case)
|
@jorisvandenbossche I don't think this is really a problem, more an inconsistency: assigning a list of datetimes with tz's worked, but assigning a DatetimeIndex always forced conversion. Now they both do the same thing. I think if a user has a tz, they should keep it (I agree a warning might be in order, though). Thoughts on that? |
I am not sure there are a lot of tests for this kind of thing; there was very little breakage when I made this change. It seems reasonable (as I think it was slightly inconsistent before). The fundamental issue is that ATM a Series/column of a Frame must be object dtype if it has a time zone. In theory it could be or maybe push the timezones down to numpy (if >= 1.8) |
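To illustrate the object-dtype route mentioned here, a small sketch (the index and variable names are made up for the example): when the values are held as individual Timestamp objects in an object-dtype Series, each element carries its own timezone.

```python
import pandas as pd

# Illustrative tz-aware index; the zone and dates are arbitrary choices.
idx = pd.date_range("2014-01-01", periods=3, freq="D", tz="US/Eastern")

# Force object dtype so each element stays a tz-aware Timestamp.
obj = pd.Series(list(idx), dtype=object)

print(obj.dtype)        # object
print(obj.iloc[0].tz)   # the timezone is kept on each element
```

The trade-off is that object dtype stores Python objects one by one, so vectorized datetime operations are not available in the same way as on a datetime64 column.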
all of the issues here have been fixed in pandas long ago (esp the dst transition issue, well most of those issues) http://numpy-discussion.10968.n7.nabble.com/timezones-and-datetime64-td33407.html |
At some point the tz can also be stored in numpy once users are on new |
@jreback Indeed, I agree this is consistent with datetime.datetime objects. Still, one thing: with your PR merged, you get this (with the examples from above):
Strange? You don't get an array but an Index object. When directly using an array of Timestamps with timezone, it works:
|
hmm....that is wrong |
@jorisvandenbossche ok...fixed in #6419, thanks |
@jreback super! |
@jorisvandenbossche fixed |
The original DatetimeIndex has the tz:
But when converted to a Series, the tz info is lost:
If this is by design, then a warning would be nice.
This issue came up when I tried to add a list of timezone aware dates to a DataFrame.
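The report's own code is missing from this copy. A rough way to check the reported behaviour on a given installation, with all names below chosen for illustration only, is to compare the timezone on the index with the one (if any) on the resulting Series dtype.

```python
import pandas as pd

# Illustrative tz-aware index; not the data from the original report.
idx = pd.date_range("2014-01-01", periods=3, freq="D", tz="US/Eastern")
s = pd.Series(idx)

index_tz = idx.tz
# A plain numpy dtype has no `tz` attribute, so fall back to None.
series_tz = getattr(s.dtype, "tz", None)

print("index tz: ", index_tz)
print("series tz:", series_tz)
```

If `series_tz` comes back as `None`, the values were converted to a timezone-naive dtype, which is the situation described in this report.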