Skip to content

BUG: Constructing Timestamp from local time + offset + time zone yields wrong offset #7833

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ischwabacher opened this issue Jul 24, 2014 · 8 comments · Fixed by #7907
Closed
Labels
Bug Timezones Timezone data dtype
Milestone

Comments

@ischwabacher
Copy link
Contributor

xref #7825

In [1]: import pandas as pd

In [2]: pd.Timestamp('2013-11-01 00:00:00-0500', tz='America/Chicago')
Out[2]: Timestamp('2013-10-31 23:09:00-0551', tz='America/Chicago')

This bug is inherited from datetime, which is using local mean time (the first entry for America/Chicago in tzinfo).

@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

this is actually a bit tricky, because you are passing an already localized tz, then what is the tz='....' doing? a conversion? (has to be, right?) or is this reallying localizing the UTC of the original value?

@jreback jreback added this to the 0.15.0 milestone Jul 24, 2014
@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

cc @rockg
cc @sinhrks
cc @adamgreenhall

@ischwabacher
Copy link
Contributor Author

But -0500 isn't a time zone-- it's just an offset. It tells you how to convert this time to UTC, but not this time plus an hour. How do we want to handle Timestamp('2013-11-3 01:30:00', tz='America/Chicago')? That time is ambiguous, but if we add -0500 or -0600, it's not. We also can't rely on time zone abbreviations-- Timestamp('2013-11-3 01:30:00 CDT') could be in America/Havana, and even Timestamp('2013-11-3 01:30:00 CDT-0500') could be in one of the Indiana time zones. To make matters worse, not all local time discontinuities come from DST.

I really think the only unambiguous way to construct an aware datetime from local time is if both the time zone and the offset are also present. Offset lets you convert to UTC, where you look up the correct offset for the time zone at that UTC time and compare it to the offset again. If they match, great. If they don't match, you can either correct the offset or give a NonExistentTime error, and which of these you do should probably be determined by a strict= kwarg. Maybe you accept only the correct offset, or Z to construct from UTC + time zone.

EDIT: Another benefit of fixing this:

In [1]: from pandas import Timestamp

In [2]: Timestamp('2013-11-3', tz='America/Chicago')
Out[2]: Timestamp('2013-11-03 00:00:00-0500', tz='America/Chicago')

In [3]: eval(repr(_))
Out[3]: Timestamp('2013-11-02 23:09:00-0551', tz='America/Chicago')

@jreback jreback added the Bug label Jul 24, 2014
@sinhrks
Copy link
Member

sinhrks commented Jul 24, 2014

It looks to be caused by tslib._localize_tso (in Timestamp.__new__) ignores current tz. May better to replace _localize_tso to tz_convert? And I think -05:00 is timezone with fixed offset, by definition.

result = pd.Timestamp('2013-11-01 00:00:00-05:00')
result, result.tz
# Timestamp('2013-11-01 00:00:00-0500'), tzoffset(None, -18000)

result.tz_convert('America/Chicago')
# 2013-11-01 00:00:00-05:00
result.tz_convert('America/Chicago')

@ischwabacher ischwabacher changed the title BUG: Constructing Timestamp from local time + offset + time zone yields wrong offset BUG: Constructing Timestamp from local time + offset + time zone yields wrong offset Jul 24, 2014
@jreback
Copy link
Contributor

jreback commented Jul 24, 2014

@sinhrks yep

yes I think if the current tz is not None then tz_convert it. But I might only do it if its a fixed offset (or a UTC). Not sure that you want implicit conversion just by passing a tz, so maybe raise ValueError if that is the case (and obviously its innocuous if you pass in the SAME tz as inferred from the stamp).

@ischwabacher
Copy link
Contributor Author

It might be best if the conversion only happens when ts_input is a string object, isinstance(ts_input.tzinfo, (dateutil.tz.tzoffset, dateutil.tz.tzutc)), and tz is not None. This test is awkward, but it lets us make sure we're only doing as much in the constructor as we need to.

Then there's this problem:

In [20]: dateutil.parser.parse('2013-11-3 00:00:00 CDT').tzinfo
Out[20]: tzlocal()   # local TZ is America/Chicago

In [21]: dateutil.parser.parse('2013-11-3 00:00:00 EDT').tzinfo
# None

So I'm not sure we even can do the right thing in all cases without messing with dateutil.parser.

Also, tz_convert is a a Timestamp method. What are the consequences of calling methods on an object while it is still in __new__? Are there any pitfalls we should be watching out for?

EDIT:

And I think -05:00 is timezone with fixed offset, by definition.

I agree that this is how it's defined, but I think it's better to think about what information we need about a time in order to do what the user wants. Ultimately, I think the user (and here I'm generalizing from a sample size of 'me') wants to be able to relate times in a particular place. Fixed offset time zones are a tool for doing this (and I use them because I collect data using time-zone-naive devices for which the offset is only guaranteed to be correct at the start of the collection window), but in general I suspect that most of the time that a user gets a Timestamp with a fixed offset, the user is either going to 1) convert it to UTC, 2) convert it to the real time zone it was supposed to be in in the first place, or 3) write buggy DST-handling code. Not every user can be saved from case 3), but I think we should make case 2) as attractive as possible.

@jreback
Copy link
Contributor

jreback commented Jul 25, 2014

@ischwabacher, @sinhrks is talking about a much lower-level object, called a TSObject.

dateutil is not used here, its parsed by the internal c-routines (as this is a standard format)

@ischwabacher
Copy link
Contributor Author

Timestamp.__new__ calls dateutil.parser.parse (for which parse_date is an alias) on its first argument if it's a string. So in the string input case, the first argument to the low-level convert_to_tsobject call is a datetime.datetime with a dateutil.tz time zone. It will never be a string object.

EDIT: But I did fail to notice that there's a tz_convert that takes an ndarray[int64_t] as its first argument. Is that what you were referring to?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants