Skip to content

REF: to_datetime handle non-object cases outside cython #50263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 17, 2022

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@MarcoGorelli MarcoGorelli self-requested a review December 15, 2022 17:13
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just got some comments to check my understanding

mult, _ = precision_from_unit(unit)

iresult = arg.astype("i8")
mask = np.isnan(arg) | (arg == iNaT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we confident that np.isnan(arg) addresses the issue for which values != values needed to be added? if so, just for my understanding, why could it not have been used in tslib.pyx? I'm guessing it was for the performance penalty of calling a Python function within Cython?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we confident that np.isnan(arg) addresses the issue for which values != values needed to be added?

im pretty confident, yes

why could it not have been used in tslib.pyx? I'm guessing it was for the performance penalty of calling a Python function within Cython?

it should work just fine in cython too.

Copy link
Member

@MarcoGorelli MarcoGorelli Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be

Suggested change
mask = np.isnan(arg) | (arg == iNaT)
mask = np.isnan(arg) | (iresult == iNaT)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only test which seems to require that is pandas/tests/tools/test_to_datetime.py::TestToDatetimeUnit::test_to_datetime_unit_fractional_seconds, and it doesn't seem to make a difference there so maybe it's OK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt make a difference

mult, _ = precision_from_unit(unit)

iresult = arg.astype("i8")
mask = np.isnan(arg) | (arg == iNaT)
Copy link
Member

@MarcoGorelli MarcoGorelli Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be

Suggested change
mask = np.isnan(arg) | (arg == iNaT)
mask = np.isnan(arg) | (iresult == iNaT)

?

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me

Just left a comment about comparing arg (as opposed to iresult) to iNaT, although it doesn't seem to make a difference to the 1 test for which that comparison is needed

@mroeschke mroeschke added the Refactor Internal refactoring of code label Dec 17, 2022
@mroeschke mroeschke added this to the 2.0 milestone Dec 17, 2022
@mroeschke mroeschke added the Datetime Datetime data dtype label Dec 17, 2022
@mroeschke mroeschke merged commit 2f54a47 into pandas-dev:main Dec 17, 2022
@mroeschke
Copy link
Member

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the ref-to_datetime branch December 17, 2022 20:50
phofl pushed a commit to phofl/pandas that referenced this pull request Dec 17, 2022
@phofl
Copy link
Member

phofl commented Mar 4, 2023

@jbrockmendel this looks like a likely culprit for https://asv-runner.github.io/asv-collection/pandas/#inference.ToDatetimeFromIntsFloats.time_sec_uint64

The regression occurred between d3e5e05 and 5f3c29e, so not totally sure that this is the problem, but the asv hits the code paths introduced here

@phofl
Copy link
Member

phofl commented Mar 4, 2023

More or less all the time is spend in astype_overflowsafe

@jbrockmendel
Copy link
Member Author

Yah I noticed that too. Reverting wouldn't be the worst thing, thought might be worth checking that the old version didn't allow silent overflows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants