Skip to content

BUG: Timestamp.replace chaining not compat with datetime.replace #15683 #16110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -339,6 +339,7 @@ Conversion
- Fixed the return type of ``IntervalIndex.is_non_overlapping_monotonic`` to be a Python ``bool`` for consistency with similar attributes/methods. Previously returned a ``numpy.bool_``. (:issue:`17237`)
- Bug in ``IntervalIndex.is_non_overlapping_monotonic`` when intervals are closed on both sides and overlap at a point (:issue:`16560`)
- Bug in :func:`Series.fillna` returns frame when ``inplace=True`` and ``value`` is dict (:issue:`16156`)
- Bug in ``Timestamp.replace`` when replacing ``tzinfo`` around DST changes (:issue:`15683`)

Indexing
^^^^^^^^
Expand Down
22 changes: 11 additions & 11 deletions pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -703,14 +703,16 @@ class Timestamp(_Timestamp):

cdef:
pandas_datetimestruct dts
int64_t value
object _tzinfo, result, k, v
int64_t value, value_tz, offset
object _tzinfo, result, k, v, ts_input

# set to naive if needed
_tzinfo = self.tzinfo
value = self.value
if _tzinfo is not None:
value = tz_convert_single(value, 'UTC', _tzinfo)
value_tz = tz_convert_single(value, _tzinfo, 'UTC')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make any sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the definition of tz_convert_single to cpdef int64_t and here make value_tz and offset int64_t definitions

offset = value - value_tz
value += offset

# setup components
pandas_datetime_to_datetimestruct(value, PANDAS_FR_ns, &dts)
Expand Down Expand Up @@ -744,16 +746,14 @@ class Timestamp(_Timestamp):
_tzinfo = tzinfo

# reconstruct & check bounds
value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &dts)
ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly are you doing here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you changed the tzinfo from old_tz to new_tz in the old implementation, the code would first do a conversion "from UTC to old_tz" and later "from new_tz to UTC" in order to calculate a correct offset for the value.

However, this was the reason for the problems I discussed in the issue (e.g., off-by-1h-offsets were calculated and invalid errors "date does not exist in timezone" were raised when converting "from new_tz to UTC"). These issues only occured around DST changes when the new offset resulted in a DST switch, so they were not covered before.

What I now did was simply constructing a new Timestamp instance and let the Pandas internals do the correct calculations for value. Unfortunately, the low-level (C-)stuff is not very well documented so my solution for creating a correct result Timestamp might not be the most optimal one.

However, correct code is better than optimal code, so I hope this PR is good enough for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls run the asv suite and see if this regresses anything. perf is a big deal and this goes thru a few more conversions; this is why I am harping on the impl.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a fast low-level function that takes a year, month, day, hour, minute, second, nano and a tzinfo and calculates value so that I can avoid creating a datetime object? Something like convert_to_tsobject() but with a different signature? Or can I call this function in a different way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can trace what these are doing and just replicate it. there is some overhead to creating it in this way.

dts.sec, dts.us, tzinfo=_tzinfo)
ts = convert_to_tsobject(ts_input, _tzinfo, None, 0, 0)
value = ts.value + (dts.ps // 1000)
if value != NPY_NAT:
_check_dts_bounds(&dts)

# set tz if needed
if _tzinfo is not None:
value = tz_convert_single(value, _tzinfo, 'UTC')

result = create_timestamp_from_ts(value, dts, _tzinfo, self.freq)
return result
return create_timestamp_from_ts(value, dts, _tzinfo, self.freq)

def isoformat(self, sep='T'):
base = super(_Timestamp, self).isoformat(sep=sep)
Expand Down Expand Up @@ -4211,7 +4211,7 @@ def tz_convert(ndarray[int64_t] vals, object tz1, object tz2):
return result


def tz_convert_single(int64_t val, object tz1, object tz2):
cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2):
"""
Convert the val (in i8) from timezone1 to timezone2

Expand Down
21 changes: 21 additions & 0 deletions pandas/tests/tseries/test_timezones.py
Original file line number Diff line number Diff line change
Expand Up @@ -1269,6 +1269,27 @@ def test_ambiguous_compat(self):
assert (result_pytz.to_pydatetime().tzname() ==
result_dateutil.to_pydatetime().tzname())

def test_replace_tzinfo(self):
# GH 15683
dt = datetime(2016, 3, 27, 1)
tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo

result_dt = dt.replace(tzinfo=tzinfo)
result_pd = Timestamp(dt).replace(tzinfo=tzinfo)

if hasattr(result_dt, 'timestamp'): # New method in Py 3.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this? 'New method in Py 3.3'?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datetime.datetime.timestamp() method is new in 3.3.
I developed the Patch on 3.6 and then the GitHub CI failed because of this method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and why are you doing this? what does it do

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just doing various tests to make sure, pandas timestamps behave like datetimes, which they did not always do before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and what exactly does this do? pls show an example of calling timestamp on a datetime.datetime.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just checking if both, pandas and datetime dates, are equal in direct comparision and if converted to timestamps. The later check only works on Python 3, so I put a check around it. I could have writte a "timestamp" function for Python 2, but that would have been just too much untrivial work for very little benefit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question would be if that timestamp check is needed?
If the timestamp values (which gives the POSIX time) differ, wouldn't the actual datetime.datetime objects also not evaluate to not equal?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the issue, I had some cases where the datetimes differed but the timestamps were equal. Since the timestamps were always equal even if the datetimes differed, this check might not be necessary but I wanted to do the extra check just to be sure.

The issue that this patch closes was not detected by the existing (very extensive) test suite, so I thought more checks won't hurt. :)

assert result_dt.timestamp() == result_pd.timestamp()
assert result_dt == result_pd
assert result_dt == result_pd.to_pydatetime()

result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None)
result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None)

if hasattr(result_dt, 'timestamp'): # New method in Py 3.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the correct 3rd comparison is Timestamp(result_dt) == Timestamp(result_pd), in fact to make this more clear I would do:

for left, right in [(Timestamp, Timestamp), (lambda x: x, lambda x: x), (lambda x: x, lamba x: x.to_pydatetime())]:
     assert left(result_dt) == right(result_pd)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho, this is a lot harder to read and understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, you need to make one of the changes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the loop is in this case harder to read

BTW, @jreback Timestamp(dt) and dt.timestamp() is not equivalent. dt.timestamp() is more like Timestamp.value (although I am not sure how dt.timestamp() handles timezones)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use dt.timestamp() anywhere, so a test of this is irrelevant.

Timestamp(result_dt) == Timestamp(result_pd) is the actual test that needs to pass.

assert result_dt.timestamp() == result_pd.timestamp()
assert result_dt == result_pd
assert result_dt == result_pd.to_pydatetime()

def test_index_equals_with_tz(self):
left = date_range('1/1/2011', periods=100, freq='H', tz='utc')
right = date_range('1/1/2011', periods=100, freq='H', tz='US/Eastern')
Expand Down