Skip to content

BUG: Timestamp(Timestamp(Ambiguous time)) modifies .value with dateutil tz #24329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mroeschke opened this issue Dec 18, 2018 · 5 comments · Fixed by #30995
Closed

BUG: Timestamp(Timestamp(Ambiguous time)) modifies .value with dateutil tz #24329

mroeschke opened this issue Dec 18, 2018 · 5 comments · Fixed by #30995
Labels
Bug Datetime Datetime data dtype Timezones Timezone data dtype
Milestone

Comments

@mroeschke
Copy link
Member

Pretty obscure bug, but this seems fishy:

In [7]: pd.__version__
Out[7]: '0.24.0.dev0+1300.ge0a68076a.dirty'

# Ambiguous time
In [8]: t = pd.Timestamp(1382835600000000000, tz='dateutil/Europe/London')

# Repr is consistent
In [11]: t
Out[11]: Timestamp('2013-10-27 01:00:00+0100', tz='dateutil//usr/share/zoneinfo/Europe/London')

In [12]: pd.Timestamp(t)
Out[12]: Timestamp('2013-10-27 01:00:00+0100', tz='dateutil//usr/share/zoneinfo/Europe/London')

# .value changes
In [13]: t.value
Out[13]: 1382835600000000000

In [14]: pd.Timestamp(t).value
Out[14]: 1382832000000000000

pytz timezones behave consistently though

In [15]: t = pd.Timestamp(1382835600000000000, tz='Europe/London')

In [16]: t
Out[16]: Timestamp('2013-10-27 01:00:00+0000', tz='Europe/London')

In [17]: pd.Timestamp(t)
Out[17]: Timestamp('2013-10-27 01:00:00+0000', tz='Europe/London')

In [18]: t.value
Out[18]: 1382835600000000000

In [19]: pd.Timestamp(t).value
Out[19]: 1382835600000000000

The fact that the repr between dateutil timezones and pytz timezones don't match can be possible be seen in a change in dateutil somewhere around 2.6? But the main issue that is .value changes.

if LooseVersion(dateutil.__version__) < LooseVersion('2.6.0'):
# see GH#14621
assert times[-1] == Timestamp('2013-10-27 01:00:00+0000',
tz=tz, freq="H")
elif LooseVersion(dateutil.__version__) > LooseVersion('2.6.0'):
# fixed ambiguous behavior
assert times[-1] == Timestamp('2013-10-27 01:00:00+0100',
tz=tz, freq="H")

@mroeschke mroeschke added Bug Datetime Datetime data dtype Timezones Timezone data dtype labels Dec 18, 2018
@mroeschke
Copy link
Member Author

Similar behavior occurs with Nonexistent times that occur right at the cusp of the transition point:

In [11]: pd.__version__
Out[11]: '0.25.0.dev0+818.g06c2a7163.dirty'

In [12]: pd.Timestamp(1552211999999999999, tz='UTC').tz_convert('dateutil/US/Pacific')
Out[12]: Timestamp('2019-03-10 01:59:59.999999999-0700', tz='dateutil//usr/share/zoneinfo/US/Pacific')

In [13]: pd.Timestamp(pd.Timestamp(1552211999999999999, tz='UTC').tz_convert('dateutil/US/Pacific'))
Out[13]: Timestamp('2019-03-10 01:59:59.999999999-0800', tz='dateutil//usr/share/zoneinfo/US/Pacific')

@mroeschke
Copy link
Member Author

My hypothesis is that there's a lower level assumption in npy_datetimestruct_to_datetime that is causing this to fail. convert_datetime_to_tsobject applies an 8 hour shift from local to UTC (where npy_datetimestruct_to_datetime is called) first then a 7 hour shift from UTC to local.

@AlexKirko
Copy link
Member

@mroeschke Thanks for linking this. The problem is that if TimeStamp.value doesn't change (for example if we introduce a shortcut that just returns the object when TimeStamp constructor is called on TimeStamp), this breaks some functionality (date_range, to be specific).
I'll play with the code a bit and see if I can fix the behavior.

@AlexKirko
Copy link
Member

AlexKirko commented Jan 9, 2020

Some notes. When we make a Timestamp out of an int, ts.value is always set to that value, because convert_to_tsobject simply scales the value according to the unit and never shifts it. When we later call the Timestamp constructor, it casts the Timestamp to datetime and then updates ts.value based on the results of that cast. So, after the first call, ts.value and object can have different DST settings, and the difference is resolved with a repeated cast.
At first glance, it appears that Timestamp.tz_localize assumes non-DST time when localizing, but when we call the constructor on a Timestamp object, the implementation assumes the ambiguous time as DST and changes Timestamp.value. This seems to happen because of the cast to datetime, later operations and cast back to Timestamp.
I will research to confirm.

@AlexKirko
Copy link
Member

AlexKirko commented Jan 13, 2020

More notes. When we call a Timestamp constructor on an int with a dateutil timezone, it calls convert_to_ts_object in conversion, which calls localize_tso, which calls get_dst_info in timezones, which calls get_utc_trans_times_from_dateutil_tz, which accesses zip(tz._trans_list, tz._trans_idx) to determine DST shifts.
So, when we call the constructor, it assumes DST=True.
When we call it the second time, we go to convert_datetime_to_tsobject, which calls pydatetime_to_dt64 in np_datetime.pyx, which calls PyDateTime_GET_YEAR to get all the date attributes, and then calls npy_datetimestruct_to_datetime, which does not care about the timezone anymore (it assumes DST=False). This means that when we get the epoch time this way, we ignore whether the time is DST or not. Then we try to fix the result with get_utcoffset:

        offset = get_utcoffset(obj.tzinfo, ts)
        obj.value -= int(offset.total_seconds() * 1e9)

This shifts time an hour back, because get_utcoffset is DST-aware.
I will attempt to fix this by taking into account the DST offset in convert_datetime_to_tsobject.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants