-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix localize_pydatetime using meta datetimes as Timestamp (#25734) #25746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix localize_pydatetime using meta datetimes as Timestamp (#25734) #25746
Conversation
Codecov Report
@@ Coverage Diff @@
## master #25746 +/- ##
==========================================
- Coverage 91.25% 91.25% -0.01%
==========================================
Files 172 172
Lines 52977 52977
==========================================
- Hits 48343 48342 -1
- Misses 4634 4635 +1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25746 +/- ##
=======================================
Coverage 91.25% 91.25%
=======================================
Files 172 172
Lines 52977 52977
=======================================
Hits 48343 48343
Misses 4634 4634
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the fix
This looks to be a regression in 0.24 (from this PR #21691) and eligible for backporting. |
Will need to
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls show if this causes any perf issues
return isinstance(obj, datetime) | ||
|
||
class FakeDatetime(MetaDatetime("NewBase", (datetime,), {})): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A metaclass is needed in this test case to be able to override __instancecheck__
|
||
with monkeypatch.context() as m: | ||
# monkeypatch datetime everywhere | ||
for mod_name, module in list(sys.modules.items()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least some level of monkeypatching is necessary for the original issue to show up.
However, this entire for
block could be replaced with m.setattr("pandas.tseries.offsets.datetime", FakeDatetime)
, but I wanted to future proof this test better and replicate what freezegun
and other libraries that might monkeypatch datetime
would do.
Hello @ArtificialQualia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-03-18 02:37:58 UTC |
I ran the perf testing and did have some significant deviations with many of them in places I wouldn't expect to see any differences. I'll run them again tonight and see if the deviations are consistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final comment clarification note from me otherwise LGTM.
Okay to backport to the 0.24 branch @jreback? This was a regression in 0.24.
pass | ||
|
||
with monkeypatch.context() as m: | ||
# monkeypatch datetime everywhere |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand this comment to describe what freezegun
(and other libraries) are doing?
no on the backport and this need perf checking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not clear what problem you are trying to solve here. can you elaborate.
@@ -136,7 +136,7 @@ Categorical | |||
Datetimelike | |||
^^^^^^^^^^^^ | |||
|
|||
- | |||
- Bug with :class:`Timestamp` and :class:`CustomBusinessDay` arithmetic throwing an exception with datetime subclasses (:issue:`25734`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't throw an exception, rather we raise an exception. Can you even be more explicit here, what does this have to do with CBD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to scope this description to more what was in the original issue, but really it is due to freezegun
monkeypatching datetime
with a subclass of datetime
. A few of the offsets.py
classes turn things into datetime
in their apply
functions. That's why this affects CustomBusinessDay
and not, for instance, BusinessDay
. After a brief look, a few other classes might be affected by this bug, like Easter
and FY5253
.
However, in writing a custom datetime
class for testing, another bug was discovered in convert_datetime_to_tsobject
which prevents any subclassed datetime
from being converted into a Timestamp
. freezegun
was not affected by this by chance because it happened to implement the nanosecond
property in their custom datetime
class, but that fix has been included in this PR as mentioned in another comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so say this affects subclasses of datetime. note btw, we don't provide any compat with this, but as usual will take patches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that this affects datetime subclasses is already in the release notes, I'm not sure what you would like changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:class:
CustomBusinessDay
arithmetic is misleading
Yah this is a common issue with Because this is replacing a C function call with a python attribute lookup, I'd expect the performance to take a hit. The question is whether this is called enough times for that perf hit to be non-trivial. Also, there are a handful of places where we use |
%timeit Results For the performance tests that over both runs had a factor > 1.1 (did not test the factor < 0.9) here are the %timeit results:
Master:
Master:
Master:
Master:
Master:
Out of these, the only 'significant' result was |
For the %timeit checks, I was suggesting that you run just the affected code, e.g |
I'm happy to do that, but should that be done in a separate issue/PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so rather than using PyDateTime_CheckExact, if we switch this to PyDateTime_Check it will likely work, though IIRC we did this as a perf issue? @jbrockmendel do you remember
@@ -136,7 +136,7 @@ Categorical | |||
Datetimelike | |||
^^^^^^^^^^^^ | |||
|
|||
- | |||
- Bug with :class:`Timestamp` and :class:`CustomBusinessDay` arithmetic throwing an exception with datetime subclasses (:issue:`25734`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so say this affects subclasses of datetime. note btw, we don't provide any compat with this, but as usual will take patches.
I thought that might work, but since we are using |
@ArtificialQualia's comment about An ideal solution would be to implement a
|
@@ -136,7 +136,7 @@ Categorical | |||
Datetimelike | |||
^^^^^^^^^^^^ | |||
|
|||
- | |||
- Bug with :class:`Timestamp` and :class:`CustomBusinessDay` arithmetic throwing an exception with datetime subclasses (:issue:`25734`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:class:
CustomBusinessDay
arithmetic is misleading
obj.value += ts.nanosecond | ||
obj.dts.ps = ts.nanosecond * 1000 | ||
except AttributeError: | ||
# probably a subclass of datetime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this probably?
pinged the cython mailing list about fast isinstance checks: https://groups.google.com/forum/#!topic/cython-users/AQSGs-Sx9Pc It sounds like |
After doing some profiling, it looks like the try/except AttributeError check is way slower than basically every other alternative: Within cython, had a function run each of the following 10k times. Then ran that function in
So the best "do it right" ( There are a couple of extant places where we use this check. I'm going to make a PR to get rid of those. @ArtificialQualia are you OK with putting this on hold until a performant solution is in place? |
@jbrockmendel I tried to replicate your results based on your testing criteria, and got some similar results with some exceptions. For instance:
Either way, it seems to me as well the answer is to use I'm happy to put this on hold until your other PR is merged. |
When I tried this, I found the signature of the
My first attempt at this failed and I don't know when I'll be able to circle back to it. Better not to wait on me. |
My signature was I found a very large difference depending on how I'm happy to make the PR for the larger scope of changes, but if you were thinking of having someone else do it that is fine with me as well. |
#25853 implements this |
git diff upstream/master -u -- "*.py" | flake8 --diff
Proposed fix for #25734
The issue was due to localize_pydatetime assuming anything that was
not PyDateTime_CheckExact
must be a pandasTimestamp
. However, in the rare case adatetime
was created with a custom metaclass, thatdatetime
would failPyDateTime_CheckExact
, but obviously was not aTimestamp
.Therefore, I made the
not PyDateTime_CheckExact
check more explicitly look for aTimestamp
. As theTimestamp
class could not be imported for aisinstance
check due to a circular import, I took the route that many of the otheris_*
style checks took of adding a_typ
property toTimestamp
and inspecting that to determine identity.