API: to_datetime with integers/floats and format desired behavior? #55663

Open
jbrockmendel opened this issue Oct 24, 2023 · 2 comments
Labels
API Design Constructors Series/DataFrame/Index/pd.array Constructors Datetime Datetime data dtype

Comments

@jbrockmendel
Member

jbrockmendel commented Oct 24, 2023

>>> pd.to_datetime([20231024])
DatetimeIndex(['1970-01-01 00:00:00.020231024'], dtype='datetime64[ns]', freq=None)

>>> pd.to_datetime([20231024], format="%Y%m%d")
DatetimeIndex(['2023-10-24'], dtype='datetime64[ns]', freq=None)

I'm trying to iron out differences between several datetime-parsing paths. array_to_datetime and array_strptime (which to_datetime goes through when a format is specified) behave differently for ints/floats. array_strptime casts (non-NaN) ints/floats to str and then sends those through the string parsing, so the int 20231024 gets treated like the string "20231024", as in the second example above. array_to_datetime instead treats the int as an offset from the epoch (nanoseconds by default), as in the first example.
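The difference between the two paths can be restated as a pair of equivalences. This is a minimal sketch based only on the outputs shown above; it assumes a pandas version that still behaves as reported in this issue, and the variable names are illustrative.

```python
import pandas as pd

values = [20231024]

# Without a format, the integer is read as an offset from the Unix epoch.
# Assumption: the default unit is nanoseconds, as the first output above shows.
no_format = pd.to_datetime(values)
explicit_ns = pd.to_datetime(values, unit="ns")

# With a format, the integer ends up parsed like the equivalent string.
with_format = pd.to_datetime(values, format="%Y%m%d")
from_string = pd.to_datetime(["20231024"], format="%Y%m%d")

print(no_format[0], explicit_ns[0])
print(with_format[0], from_string[0])
```

So the same integer yields two unrelated timestamps depending only on whether format is passed.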

We have 77 tests that hit this path in array_strptime: 2 in test_sql, 25 in test_stata (though #55642 will get rid of those), and the rest in test_to_datetime.

(Note also that array_to_datetime_with_unit has a now-deprecated behavior casting strings to floats!)

Some options

  1. Change nothing, the status quo is fine
  2. Tell users to explicitly cast their ints/floats to strings if that's what they want
  3. Move away from allowing floats/ints in either array_to_datetime or array_strptime; all-numeric cases get their own path (DatetimeIndex does this with a check for infer_dtype(data) == "integer"), and users are pushed to do something explicit with mixed-type cases before getting to to_datetime
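For reference, option 2 on the user side might look like the sketch below. It is not a proposed pandas change, just what "explicitly cast" could mean in practice; the Series name `raw` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical input: dates stored as YYYYMMDD integers.
raw = pd.Series([20231024, 20231025])

# Intent "these are digits of a date": cast to str, then parse with a format.
as_dates = pd.to_datetime(raw.astype(str), format="%Y%m%d")

# Intent "these are epoch offsets": say so with an explicit unit.
as_epoch = pd.to_datetime(raw, unit="ns")

print(as_dates.tolist())
print(as_epoch.tolist())
```

One caveat for floats: str(20231024.0) is "20231024.0", which would not match "%Y%m%d", so float data would likely need converting to an integer dtype before the string cast.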
@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 24, 2023
@mroeschke
Member

2 sounds good to me, but for an input like

pd.to_datetime([20231024, "20231024"], format="%Y%m%d")

The int argument would be parsed as epoch + int<unit> in the future?

@jbrockmendel
Member Author

The int argument would be parsed as epoch + int in the future?

Yes, though ATM I think passing unit causes us to go through a third path that IIRC doesn't support the format keyword
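For the mixed-type input above, "doing something explicit" before calling to_datetime could be sketched as follows. The helper `stringify_ints` is hypothetical and not part of pandas; it only shows one way a caller could remove the ambiguity up front.

```python
import numbers

import pandas as pd


def stringify_ints(values):
    """Hypothetical helper: render integer entries as text, pass the rest through."""
    out = []
    for v in values:
        # bool is an Integral subclass in Python; leave it alone rather than guess.
        if isinstance(v, numbers.Integral) and not isinstance(v, bool):
            out.append(str(v))
        else:
            out.append(v)
    return out


mixed = [20231024, "20231024"]
parsed = pd.to_datetime(stringify_ints(mixed), format="%Y%m%d")
print(parsed)
```

With every element a string, the result no longer depends on how to_datetime treats integers when a format is given.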

@jbrockmendel jbrockmendel added Datetime Datetime data dtype Constructors Series/DataFrame/Index/pd.array Constructors API Design and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 28, 2023