-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Timestamp.replace chaining not compat with datetime.replace #15683 #16110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
fcb17b0
b7c2e86
5e9bfd2
93829c5
038c2d3
5b912c2
ee30db8
dd0e33f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -703,14 +703,16 @@ class Timestamp(_Timestamp): | |
|
||
cdef: | ||
pandas_datetimestruct dts | ||
int64_t value | ||
object _tzinfo, result, k, v | ||
int64_t value, value_tz, offset | ||
object _tzinfo, result, k, v, ts_input | ||
|
||
# set to naive if needed | ||
_tzinfo = self.tzinfo | ||
value = self.value | ||
if _tzinfo is not None: | ||
value = tz_convert_single(value, 'UTC', _tzinfo) | ||
value_tz = tz_convert_single(value, _tzinfo, 'UTC') | ||
offset = value - value_tz | ||
value += offset | ||
|
||
# setup components | ||
pandas_datetime_to_datetimestruct(value, PANDAS_FR_ns, &dts) | ||
|
@@ -744,16 +746,14 @@ class Timestamp(_Timestamp): | |
_tzinfo = tzinfo | ||
|
||
# reconstruct & check bounds | ||
value = pandas_datetimestruct_to_datetime(PANDAS_FR_ns, &dts) | ||
ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what exactly are you doing here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When you changed the tzinfo from However, this was the reason for the problems I discussed in the issue (e.g., off-by-1h-offsets were calculated and invalid errors "date does not exist in timezone" were raised when converting "from new_tz to UTC"). These issues only occured around DST changes when the new offset resulted in a DST switch, so they were not covered before. What I now did was simply constructing a new Timestamp instance and let the Pandas internals do the correct calculations for value. Unfortunately, the low-level (C-)stuff is not very well documented so my solution for creating a correct result Timestamp might not be the most optimal one. However, correct code is better than optimal code, so I hope this PR is good enough for now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pls run the asv suite and see if this regresses anything. perf is a big deal and this goes thru a few more conversions; this is why I am harping on the impl. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a fast low-level function that takes a year, month, day, hour, minute, second, nano and a tzinfo and calculates value so that I can avoid creating a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can trace what these are doing and just replicate it. there is some overhead to creating it in this way. |
||
dts.sec, dts.us, tzinfo=_tzinfo) | ||
ts = convert_to_tsobject(ts_input, _tzinfo, None, 0, 0) | ||
value = ts.value + (dts.ps // 1000) | ||
if value != NPY_NAT: | ||
_check_dts_bounds(&dts) | ||
|
||
# set tz if needed | ||
if _tzinfo is not None: | ||
value = tz_convert_single(value, _tzinfo, 'UTC') | ||
|
||
result = create_timestamp_from_ts(value, dts, _tzinfo, self.freq) | ||
return result | ||
return create_timestamp_from_ts(value, dts, _tzinfo, self.freq) | ||
|
||
def isoformat(self, sep='T'): | ||
base = super(_Timestamp, self).isoformat(sep=sep) | ||
|
@@ -4211,7 +4211,7 @@ def tz_convert(ndarray[int64_t] vals, object tz1, object tz2): | |
return result | ||
|
||
|
||
def tz_convert_single(int64_t val, object tz1, object tz2): | ||
cpdef int64_t tz_convert_single(int64_t val, object tz1, object tz2): | ||
""" | ||
Convert the val (in i8) from timezone1 to timezone2 | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1269,6 +1269,27 @@ def test_ambiguous_compat(self): | |
assert (result_pytz.to_pydatetime().tzname() == | ||
result_dateutil.to_pydatetime().tzname()) | ||
|
||
def test_replace_tzinfo(self): | ||
# GH 15683 | ||
dt = datetime(2016, 3, 27, 1) | ||
tzinfo = pytz.timezone('CET').localize(dt, is_dst=False).tzinfo | ||
|
||
result_dt = dt.replace(tzinfo=tzinfo) | ||
result_pd = Timestamp(dt).replace(tzinfo=tzinfo) | ||
|
||
if hasattr(result_dt, 'timestamp'): # New method in Py 3.3 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is this? 'New method in Py 3.3'? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and why are you doing this? what does it do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am just doing various tests to make sure, pandas timestamps behave like datetimes, which they did not always do before. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and what exactly does this do? pls show an example of calling timestamp on a datetime.datetime. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am just checking if both, pandas and datetime dates, are equal in direct comparision and if converted to timestamps. The later check only works on Python 3, so I put a check around it. I could have writte a "timestamp" function for Python 2, but that would have been just too much untrivial work for very little benefit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My question would be if that timestamp check is needed?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the issue, I had some cases where the datetimes differed but the timestamps were equal. Since the timestamps were always equal even if the datetimes differed, this check might not be necessary but I wanted to do the extra check just to be sure. The issue that this patch closes was not detected by the existing (very extensive) test suite, so I thought more checks won't hurt. :) |
||
assert result_dt.timestamp() == result_pd.timestamp() | ||
assert result_dt == result_pd | ||
assert result_dt == result_pd.to_pydatetime() | ||
|
||
result_dt = dt.replace(tzinfo=tzinfo).replace(tzinfo=None) | ||
result_pd = Timestamp(dt).replace(tzinfo=tzinfo).replace(tzinfo=None) | ||
|
||
if hasattr(result_dt, 'timestamp'): # New method in Py 3.3 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the correct 3rd comparison is
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Imho, this is a lot harder to read and understand. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well, you need to make one of the changes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree the loop is in this case harder to read BTW, @jreback There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we don't use
|
||
assert result_dt.timestamp() == result_pd.timestamp() | ||
assert result_dt == result_pd | ||
assert result_dt == result_pd.to_pydatetime() | ||
|
||
def test_index_equals_with_tz(self): | ||
left = date_range('1/1/2011', periods=100, freq='H', tz='utc') | ||
right = date_range('1/1/2011', periods=100, freq='H', tz='US/Eastern') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't make any sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the definition of
tz_convert_single
tocpdef int64_t
and here makevalue_tz
andoffset
int64_t
definitions