-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/WIP: segfault manifesting with dateutil=2.6 w.r.t. replace when timezones are present #14631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm not sure exactly how or if you have implemented ambiguous time support in The plus side is that as long as all arithmetic is done in a fixed-offset zone like UTC, I will see if I can find where you are implementing dateutil support, but you may be faster at that side of it. |
Oh, actually, I didn't realize you were pinning dateutil to 2.5.3 for some reason. I don't think that's a good idea. It's almost certainly better to just fix |
@pganssle we r pinning various versions of dateutil for testing |
I see, I didn't realize that. Looking into this code a bit, I'm not really surprised that it's fragile with respect to dateutil changes. Given the lack of previous ambiguous time support it's understandable, but there's a lot of hacks in here related to I think I can make things much easier by introducing two new features in the next release - an official API for retrieving transitions from |
@pganssle Can you move that last comment to a new issue to discuss this? As I don't think it is related to the actual code changes in this PR? (for reviewing this PR) |
i changed the pin on the osx build arbitrarily (could have been 3.4) |
@pganssle certainly welcome improved apis! we can easily just use them if available (based on version or duck typing) @pganssle is there is a list of releases and dates when they were released? csn u show what changed for ambiguous time support in 2.5.3 -> 2.6 |
@jorisvandenbossche I can move it, yes, I'll make something a bit more formal later. I was just commenting on this because the use of undocumented features is likely a good place to look for possible changes. I think in this case, though, the issue is that PEP-495 actually changes the default fallback for unaware datetimes: See: from dateutil import tz
from datetime import datetime, timedelta
LON = tz.gettz('Europe/London')
dt1 = datetime(2013, 10, 26, 23, tzinfo=LON)
dt2 = datetime(2013, 10, 27, 1, tzinfo=LON)
dt3 = tz.enfold(dt2, fold=1)
print(dt1) # 2013-10-26 23:00:00+01:00
print(dt2) # 2013-10-27 01:00:00+01:00
print(dt3) # 2013-10-27 01:00:00+00:00 I was expecting that no one was relying on this behavior because there was originally no actual way to construct If you have the datetime as UTC already, it's very easy, just use Evidently, I forgot to add these functions to |
2.6.0
So now I am all confused, I think that 2013/11/03 02:02:59.... is an invalid time, so I think that [44] and [12] are right. I think that we are testing the wrong behavior here. any thoughts welcome (the 2.5.3 and 2.6.0) are in different ipython sessions, so output numbers might be confusing. |
[44] doesn't seem right to me as the time switched over so it should be -0800 rather than -0700. And also [41] has the right offset but it should be 01:02. Similar with [12] that should go to 02:02 since it has already switched back as it is -0800 like [9] does. Also [10]/[11] above is wrong, that should be -0700 since you are specifying that it is DST. Also, 2013/11/03 02:02:59 is not an invalid time since it is outside the DST change window which is from [1:00, 2:00). |
[44] should look like this
|
@rockg @jreback My understanding is that >>> from dateutil import tz
>>> from datetime import datetime, timedelta
>>> PT = tz.gettz('US/Pacific')
>>> dt = datetime(2013, 11, 2, 12, tzinfo=PT)
>>> dt + timedelta(days=1)
datetime.datetime(2013, 11, 3, 12, 0, tzinfo=tzfile('/usr/share/zoneinfo/US/Pacific')) If you think of If you want to override the default python semantics and have >>> (dt.astimezone(tz.tzutc()) + timedelta(days=1)).astimezone(PT)
datetime.datetime(2013, 11, 3, 11, 0, tzinfo=tzfile('/usr/share/zoneinfo/US/Pacific'))
>>> dt2 = datetime(2013, 11, 3, 0, 30, tzinfo=PT)
>>> str((dt2.astimezone(tz.tzutc()) + timedelta(hours=1)).astimezone(PT))
'2013-11-03 01:30:00-07:00'
>>> str((dt2.astimezone(tz.tzutc()) + timedelta(hours=2)).astimezone(PT))
'2013-11-03 01:30:00-08:00' I don't really see why |
@pganssle pandas is just trying NOT to change behavior. if we want something to change that's another issue. this PR is just trying to reconcile the differences in dateutil 2.5.3 and 2.6 and to have it work on either version correctly. |
@rockg 10/11 is from existing master and 2.5.3 so I think that we are then testing with an incorrect expectation? |
@jreback If you want the old default behavior from dateutil < 2.6.0, then you should do this: from dateutil import tz
datetime_ambiguous = getattr(tz, 'datetime_ambiguous', lambda dt, tz=None: return False)
enfold = getattr(tz, 'enfold', lambda x, fold=1: return x)
def add_timedelta(dt, td):
dt_out = dt + td
if datetime_ambiguous(dt_out):
return enfold(dt_out, fold=1)
else:
return dt_out This does not handle As for [10]/[11], I'm not sure what your implementation is, but if you're using |
Yes, the test is wrong. This is how it looks with pytz and is correct. I don't believe we are calling any specific dateutil methods but merely getting transition times, etc. from the timezone definition.
|
@rockg I could be wrong, but you must be using dateutil methods somewhere, otherwise the behavior wouldn't have changed between versions 2.5.3 and 2.6.0, right? Is it possible that you use the transitions taken from the This is fixed in |
so .replace is called in the repr this may be affecting things |
Looks like the underlying representation is correct, but when you try and use >>> dateutil.__version__
'2.5.3'
>>> t_std = pd.Timestamp('2013-11-03 01:59:59.9999').tz_localize('dateutil/US/Pacific',ambiguous=0)
>>> t_std
Timestamp('2013-11-03 01:59:59.999900-0800', tz='dateutil//usr/share/zoneinfo/US/Pacific')
>>> t_dst = pd.Timestamp('2013-11-03 01:59:59.9999').tz_localize('dateutil/US/Pacific', ambiguous=1)
>>> t_dst
Timestamp('2013-11-03 01:59:59.999900-0800', tz='dateutil//usr/share/zoneinfo/US/Pacific')
>>> t_std.astimezone('UTC')
Timestamp('2013-11-03 09:59:59.999900+0000', tz='UTC')
>>> t_dst.astimezone('UTC')
Timestamp('2013-11-03 08:59:59.999900+0000', tz='UTC')
>>> t_std.dst()
datetime.timedelta(0)
>>> t_dst.dst()
datetime.timedelta(0) You may be able to jump through hoops to fix this, but I don't think it would be out of bounds to just say it's an upstream bug in dateutil and recommend upgrading to the latest version. |
1dc5574
to
970e3f0
Compare
Ok I have a consistent test for the difference now, working for 2.5.3 with this PR. (these compare pytz and dateutil) I think there exists a bug in [5]/[6](the UTC times are same, just the ambiguous tz looks wrong for dateutil). ok. commit is cee7edc
So on 2.6 looks like the above case is fixed, but the non-ambiguous case is now broken. @pganssle is this on your side? (it could also be on the localization code on pandas side)
|
@jreback Yes, it seems there is a bug there, probably in the If you want to force it to work on 2.6.0, you'll have to determine whether it belongs before or after the fold and use |
d7b7586
to
ed6a7e0
Compare
ok this is updated skipping appropriate sections when 2.6.0 is there (so when you release next version, we will retest). @jorisvandenbossche because of the segfault, what do you think about 0.19.2 bugfix release? |
""" validate integers """ | ||
if not isinstance(v, int): | ||
raise ValueError("value must be an integer, received {v} for {k}".format(v=type(v), k=k)) | ||
return int(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be being silly here, but do you still need the int(v)
after you've done the isinstance
check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i think cython needs to force these to the correct int32_t (though will try to take it out)
value = tz_convert_single(value, tzinfo, 'UTC') | ||
|
||
result = create_timestamp_from_ts(value, dts, tzinfo, self.freq) | ||
#print("replace: {} -> {} [{}]".format(_get_zone(self.tzinfo), _get_zone(result.tzinfo), kwds)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be cleaned up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep - turns out repr depends on replace ; so printing is tricky actually
Current coverage is 85.28% (diff: 100%)
|
Error on appveyor that still seems related:
|
97e9c68
to
50f5601
Compare
…ones are present closes pandas-dev#14621
going to merge this tomorrow unless further comments the segfaults on tests are a problem with other prs |
…ones are present closes pandas-dev#14621 Author: Jeff Reback <[email protected]> Closes pandas-dev#14631 from jreback/replace and squashes the following commits: 3f95042 [Jeff Reback] BUG: segfault manifesting with dateutil=2.6 w.r.t. replace when timezones are present (cherry picked from commit f8bd08e)
closes #14621
This is very close, but a stubborn fall transition issue. any ideas? (note that this does not solve #7825 but should be a simple change after I think)
cc @ischwabacher
cc @rockg
cc @sinhrks
cc @adamgreenhall
cc @pganssle