-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DateOffset weekday around DST produces unexpected results. fixes #5175 #5292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
the failed build is python 2.6; click on the travis link, then the failed one, scroll down and you can see the test failures. timedelta doesn't have the total_seconds attribute in 2.6, so you can prob just change the test to compute it e.g. instead of t.total_seconds() use t.hour_3600+t.minute_60+t.seconds....which will work under all python versions |
@adamgreenhall can you rebase how's this coming? |
@adamgreenhall can you rebase this? |
@adamgreenhall can you rebase? |
@jreback - sorry this took so long to get around to - will try to get this done this week. Current status: rebased but many of the offsets tests are failing. Some of the tests unrelated to DST are failing because of a years kwd. Any ideas? Example:
|
ok - have my tests on DST passing - just the year offsets are failing now. |
@@ -105,12 +105,23 @@ def __add__(date): | |||
""" | |||
_cacheable = False | |||
_normalize_cache = True | |||
_kwds_use_relativedelta = ( | |||
'months', 'weeks', 'days', | |||
'month', 'week', 'day', 'weekday', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to add year/years here I think (they are keywords for relativedelta not timedelta)
but then I am confused why you are doing this at all? is relativedelta better at DST calcs? the problem is it is MUCH slower
@jreback - tests are passing - all looks good. |
can you check performance on this?
also need a release note in Bug Fix section |
and get_utc_offset_hours(t) == -7) | ||
|
||
|
||
class TestDST(tm.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I think you need to put ALL of these (TestDST) tests into the TestBase
object so it will run them for each frequency (you may want to exclude certain of the estoeric ones and/or ones which it does affect); alternatively you can make TestDST
inherit from object
and then just include that class in the offsets that matter.
The reason I think you need to do this is you are adding logic to apply
, but a lot of the offsets override apply
and might not call your code. (so its possible we need a common method that ALL offsets call or something).
Will work on the tests tomorrow. Looks like a few of the performance metrics are taking longer. I imagine that
|
yeah relativedelta is slow (that's why its not used unless you readlly need it). I would refactor this to make the setup method callable, and then only call it when you really need to. |
Is it relativedelta that is slow? that is what is being used on master for all computations. Or is it just the additional timezone conversion? |
iirc relativedelta is only used for month/year offsets not day or less |
It was actually the timezone conversion that was taking a long time. I refactored a bit and now have things running faster (I think). But one of the custom date offset tests is now failing:
Any ideas? I'm not sure exactly what is going on in this test. Also, the Run 1 (>1.2x):
Run 2 (>1.2x):
|
rebase on master, that test is fixed |
thanks - rebased and all tests passing. |
@@ -105,25 +105,49 @@ def __add__(date): | |||
""" | |||
_cacheable = False | |||
_normalize_cache = True | |||
_kwds_use_relativedelta = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that you would use relativedelta for everything (except nanoseconds)? is that right? how does this solve the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
relativedelta
is used for all singluar offsets and plural offsets greater than daily frequency. With this change these calculations are done in UTC.
Also with this change, timedelta
is now used for all plural offsets (hours, minutes, seconds) under daily frequency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so why are minutes,hours,seconds,microseconds
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mintue, hour, second, ... -- singular not plural. Because you can do:
In [7]: pd.Timestamp("2013-11-02 00:45") + pd.DateOffset(minute=30)
Out[7]: Timestamp('2013-11-02 00:30:00', tz=None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why wouldn't you do ALL calculations then using relativedelta? why is timedelta not good?
shouldn't you ALWAYS do them in UTC (e.g. convert to/from utc if their is a timezone, then back)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you just tranlsate the plurals then, isn't it simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plurals and singular are not the same -- plurals behave how you would expect, but singulars are used for fixing the time part:
In [2]: t = pd.Timestamp('2014-06-10 8am')
In [3]: t + pd.DateOffset(hour=9)
Out[3]: Timestamp('2014-06-10 09:00:00', tz=None)
In [4]: t + pd.DateOffset(hours=9)
Out[4]: Timestamp('2014-06-10 17:00:00', tz=None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a bug to me, eg. t + pd.offsets.Hour(9)
== t + pd.DateOffset(hours=9) == t + timedelta(hours=9)
(and the first is prob how most users do this). Are their tests for the former anywhere? can you point them out.
Is this documented anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find any tests or documentation for the singular form anywhere in pandas. However, I think this behavior is inherited from dateutil.relativedelta
, which mentions it in passing in the documentation:
year, month, day, hour, minute, second, microsecond
Absolute information.
years, months, weeks, days, hours, minutes, seconds, microseconds
Relative information, may be negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u put that link on the linked issue #7418
so again, why are you not using hours/minutes/seconds etc using |
you need to move the tests as I indicated above, so that all offsets are tested. The way you have it only a very limited number are. |
hours/minutes/seconds/... now use
I'll fix those tests. |
return (o.days * 24 * 3600 + o.seconds) / 3600.0 | ||
|
||
|
||
class TestDST(tm.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback - want to have a look at these test cases? Went with a limited set of all of the possible DateOffset
s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a sub-class of object
, then use it to inherit in offsets that should test this
e.g.
class TestBusinessDay(TestDST, Base)
@adamgreenhall I know we kept pushing this....sorry about that... can you rebase and can revist (I think let's just put it in regardless of #7418) |
|
||
if tzinfo is not None and self._use_relativedelta: | ||
# bring tz back from UTC calculation | ||
other = tzinfo.localize(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamgreenhall can you rebase? |
@jreback - rebased. |
thanks. can you add a release note in v0.15.0.txt. this is a bug fix? and pls squash |
|
||
if tzinfo is not None and self._use_relativedelta: | ||
# bring tz back from UTC calculation | ||
other = tzinfo.localize(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this isn't required as apply_wraps does. Otherwise, use tslib._localize_pydatetime to support dateutil, and better to have a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you do need the tslib._localize_pydatetime
. Tested in test_offsets.test_fallback_singular
, which fails without it.
fixed some py2.6 and py3 compatibility issues, rerunning tests. |
offset = DateOffset(**{offset_name: offset_n}) | ||
t = tstart + offset | ||
if expected_utc_offset is not None: | ||
assert(get_utc_offset_hours(t) == expected_utc_offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use any bare asserts, instead use self.assertTrue
(or False
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
2b36e9d
to
2a3bdce
Compare
|
||
if len(kwds_no_nanos) > 0: | ||
if any(k in self._kwds_use_relativedelta for k in kwds_no_nanos): | ||
self._use_relativedelta = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using this variable, why don't you make this a cached_readonly property that tests isinstance(self._offset,relativedelta)
, much more intuitive I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or you can use that kw, but return it with the offset. What you are doing here is really odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how the cached_readonly property would work - that's not a pattern I'm familiar with (is there an example handy?). I'll return the return _use_relativedelta
.
2a3bdce
to
2a42334
Compare
any updates on this? need anything from me? |
@adamgreenhall merged! this has been around a while! thanks! |
😸 |
closes #5175
Wound up working on this in preparation for the DST change next week. Here's a first pass at a solution and a test.