-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: to_datetime handle non-object cases outside cython #50263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just got some comments to check my understanding
mult, _ = precision_from_unit(unit) | ||
|
||
iresult = arg.astype("i8") | ||
mask = np.isnan(arg) | (arg == iNaT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we confident that np.isnan(arg)
addresses the issue for which values != values
needed to be added? if so, just for my understanding, why could it not have been used in tslib.pyx
? I'm guessing it was for the performance penalty of calling a Python function within Cython?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we confident that np.isnan(arg) addresses the issue for which values != values needed to be added?
im pretty confident, yes
why could it not have been used in tslib.pyx? I'm guessing it was for the performance penalty of calling a Python function within Cython?
it should work just fine in cython too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be
mask = np.isnan(arg) | (arg == iNaT) | |
mask = np.isnan(arg) | (iresult == iNaT) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only test which seems to require that is pandas/tests/tools/test_to_datetime.py::TestToDatetimeUnit::test_to_datetime_unit_fractional_seconds
, and it doesn't seem to make a difference there so maybe it's OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldnt make a difference
mult, _ = precision_from_unit(unit) | ||
|
||
iresult = arg.astype("i8") | ||
mask = np.isnan(arg) | (arg == iNaT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be
mask = np.isnan(arg) | (arg == iNaT) | |
mask = np.isnan(arg) | (iresult == iNaT) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me
Just left a comment about comparing arg
(as opposed to iresult
) to iNaT
, although it doesn't seem to make a difference to the 1 test for which that comparison is needed
Thanks @jbrockmendel |
@jbrockmendel this looks like a likely culprit for https://asv-runner.github.io/asv-collection/pandas/#inference.ToDatetimeFromIntsFloats.time_sec_uint64 The regression occurred between d3e5e05 and 5f3c29e, so not totally sure that this is the problem, but the asv hits the code paths introduced here |
More or less all the time is spend in astype_overflowsafe |
Yah I noticed that too. Reverting wouldn't be the worst thing, thought might be worth checking that the old version didn't allow silent overflows |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.