Skip to content

BUG: Timestamp.replace chaining not compat with datetime.replace #15683 #16110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Highlights include:
- Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
- Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`
- Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)
- Fixed an issue with ``Timestamp.replace(tzinfo)`` around DST changes (:issue:`15683`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can move this down to the Bug Fixes section (the Conversion subsection).


.. warning::

Expand Down
18 changes: 9 additions & 9 deletions pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -685,14 +685,16 @@ class Timestamp(_Timestamp):
cdef:
pandas_datetimestruct dts
int64_t value
object _tzinfo, result, k, v
object _tzinfo, result, k, v, ts_input
_TSObject ts

# set to naive if needed
_tzinfo = self.tzinfo
value = self.value
if _tzinfo is not None:
value = tz_convert_single(value, 'UTC', _tzinfo)
value_tz = tz_convert_single(value, _tzinfo, 'UTC')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make any sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the definition of tz_convert_single to cpdef int64_t and here make value_tz and offset int64_t definitions

offset = value - value_tz
value += offset

# setup components
pandas_datetime_to_datetimestruct(value, PANDAS_FR_ns, &dts)
Expand Down Expand Up @@ -726,16 +728,14 @@ class Timestamp(_Timestamp):
_tzinfo = tzinfo

# reconstruct & check bounds
value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &dts)
ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly are you doing here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you changed the tzinfo from old_tz to new_tz in the old implementation, the code would first do a conversion "from UTC to old_tz" and later "from new_tz to UTC" in order to calculate a correct offset for the value.

However, this was the reason for the problems I discussed in the issue (e.g., off-by-1h-offsets were calculated and invalid errors "date does not exist in timezone" were raised when converting "from new_tz to UTC"). These issues only occured around DST changes when the new offset resulted in a DST switch, so they were not covered before.

What I now did was simply constructing a new Timestamp instance and let the Pandas internals do the correct calculations for value. Unfortunately, the low-level (C-)stuff is not very well documented so my solution for creating a correct result Timestamp might not be the most optimal one.

However, correct code is better than optimal code, so I hope this PR is good enough for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls run the asv suite and see if this regresses anything. perf is a big deal and this goes thru a few more conversions; this is why I am harping on the impl.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a fast low-level function that takes a year, month, day, hour, minute, second, nano and a tzinfo and calculates value so that I can avoid creating a datetime object? Something like convert_to_tsobject() but with a different signature? Or can I call this function in a different way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can trace what these are doing and just replicate it. there is some overhead to creating it in this way.

dts.sec, dts.us, tzinfo=_tzinfo)
ts = convert_to_tsobject(ts_input, _tzinfo, None, 0, 0)
value = ts.value + (dts.ps // 1000)
if value != NPY_NAT:
_check_dts_bounds(&dts)

# set tz if needed
if _tzinfo is not None:
value = tz_convert_single(value, _tzinfo, 'UTC')

result = create_timestamp_from_ts(value, dts, _tzinfo, self.freq)
return result
return create_timestamp_from_ts(value, dts, _tzinfo, self.freq)

def isoformat(self, sep='T'):
base = super(_Timestamp, self).isoformat(sep=sep)
Expand Down
19 changes: 19 additions & 0 deletions pandas/tests/tseries/test_timezones.py
Original file line number Diff line number Diff line change
Expand Up @@ -1280,6 +1280,25 @@ def test_ambiguous_compat(self):
self.assertEqual(result_pytz.to_pydatetime().tzname(),
result_dateutil.to_pydatetime().tzname())

def test_tzreplace_issue_15683(self):
"""Regression test for issue 15683."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we just leave a comment on the first line of the function like # GH 15683

dt = datetime(2016, 3, 27, 1)
tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this need testing with multiple tz's that hit DST


result_dt = dt.replace(tzinfo=tzinfo)
result_pd = Timestamp(dt).replace(tzinfo=tzinfo)

self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've recently switched to pytest for our test-runner, so you can write these as

assert result_dt.timesamp() == result_pd.timestamp()
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice. :)

self.assertEqual(result_dt, result_pd)
self.assertEqual(result_dt, result_pd.to_pydatetime())

result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)

self.assertEqual(result_dt.timestamp(), result_pd.timestamp())
self.assertEqual(result_dt, result_pd)
self.assertEqual(result_dt, result_pd.to_pydatetime())

def test_index_equals_with_tz(self):
left = date_range('1/1/2011', periods=100, freq='H', tz='utc')
right = date_range('1/1/2011', periods=100, freq='H', tz='US/Eastern')
Expand Down