ENH? to_datetime: ability to infer unit for numeric values #17860
Comments
@cailiang9 Can you explain a bit more what this issue is about?
pd.to_datetime(infer=True) does not work smartly; e.g. it cannot automatically infer the unit, so we have to try unit='s' for 1442315569.315 or unit='ms' for 1442315569315.
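For concreteness, this is the behavior being described, using only the standard pd.to_datetime(..., unit=...) API (values taken from the comment above; sub-millisecond rounding may vary slightly with pandas version):

```python
import pandas as pd

# The same instant expressed at two resolutions; the caller has to know the unit.
pd.to_datetime(1442315569.315, unit="s")   # Timestamp('2015-09-15 11:12:49.315000')
pd.to_datetime(1442315569315, unit="ms")   # Timestamp('2015-09-15 11:12:49.315000')

# With the wrong unit the call can still "succeed", just with a wildly wrong date:
pd.to_datetime(1442315569, unit="ms")      # Timestamp('1970-01-17 16:38:35.569000')
```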
I suppose you refer to the infer_datetime_format keyword?
And it deliberately does not handle inferring the unit for numeric values.
The above results are all valid.
You can in general only do this without failure when the input overflows, so you can start with a lower precision unit (s) and proceed to a higher one (ns); we do something like this in read_json. But as a general rule I would be -1 on this.
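A minimal sketch of the escalation idea described above, assuming a hypothetical helper (to_datetime_escalate is not a real pandas API, and read_json's actual logic lives in pandas internals): try the coarser unit first and fall back to a finer one only when conversion overflows the datetime64[ns] bounds.

```python
import pandas as pd
from pandas.errors import OutOfBoundsDatetime

def to_datetime_escalate(values, units=("s", "ms", "us", "ns")):
    """Hypothetical helper: try coarse units first, escalate on overflow."""
    last_err = None
    for unit in units:
        try:
            return pd.to_datetime(values, unit=unit)
        except (OutOfBoundsDatetime, OverflowError, ValueError) as err:
            last_err = err
    raise last_err

# A seconds-resolution value converts on the first attempt; a milliseconds
# value overflows at unit='s' and is only accepted at unit='ms'.
to_datetime_escalate([1442315569])     # DatetimeIndex(['2015-09-15 11:12:49'], ...)
to_datetime_escalate([1442315569315])  # DatetimeIndex(['2015-09-15 11:12:49.315000'], ...)
```

The catch, and presumably part of the -1, is that the guess is only forced when a coarser unit overflows: a small nanosecond value (say 1442315569 intended as ns) fits comfortably as seconds and would be silently misinterpreted.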
@jorisvandenbossche The infer_datetime_format keyword is just an aside. Alternatively, we could print out the selected unit along with a warning telling the user to specify the unit explicitly in the call.
I know this is an old ticket, but since it's still open, I would like to add something here. I recently ran into a problem where a dataframe contains a column of epoch timestamp strings. Most of them are 13-digit integers (1536507914000, unit in ms), but there are a few cases of 16-digit integers (1536507914123000, unit in us). In other words, no single unit I can pass explicitly is correct for both groups of values.
The data I'm working with has many millions of records, so it's hard to spot a few hundred records with 16-digit epoch timestamps, and I only found the bug by accident today. So I agree with @cailiang9 that auto-inferring the unit of timestamps would be very helpful, especially for epoch timestamps. @jorisvandenbossche Could we add some sort of length check when parsing epoch timestamps, so that the unit can be inferred (or at least validated) from the number of digits?
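To make the failure mode concrete (the two values come from the comment above; whether the mismatch raises or coerces depends on the errors= setting and the pandas version):

```python
import pandas as pd

mixed = [1536507914000, 1536507914123000]   # 13 digits (ms) and 16 digits (us)

# Each value is fine when parsed with its own unit:
pd.to_datetime(mixed[0], unit="ms")   # Timestamp('2018-09-09 15:45:14')
pd.to_datetime(mixed[1], unit="us")   # Timestamp('2018-09-09 15:45:14.123000')

# Parsing the whole column as ms pushes the 16-digit rows far outside the
# datetime64[ns] range, so they either raise or are coerced to NaT:
pd.to_datetime(mixed, unit="ms", errors="coerce")
# DatetimeIndex(['2018-09-09 15:45:14', 'NaT'], dtype='datetime64[ns]', freq=None)
```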
xref #15836 looks very similar.
This was discussed in yesterday's dev call; there was consensus that we don't want to be guessing here. Closing.
I implemented a simple function for reference:
Expected Output
infer_as_datetime(dtarr)
DatetimeIndex(['1970-01-01', '1971-01-01', '1971-01-02'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['1971-12-30 12:34:01.314000', '2015-01-30 22:00:00'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2015-09-15 11:12:49.315000'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2015-09-15 11:12:49'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2015-09-15 11:12:49'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2015-09-15 11:12:49'], dtype='datetime64[ns]', freq=None)
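The function body itself is not preserved above, so purely as an illustrative sketch (not the commenter's actual code): one way to realize the idea is to guess the unit from the magnitude of the values, with the cut-offs below being my own assumptions.

```python
import numpy as np
import pandas as pd

def infer_unit(values):
    """Guess an epoch unit from the largest absolute value.
    Illustrative only: the thresholds are assumptions, and the guess is
    inherently ambiguous (a small nanosecond value looks like seconds)."""
    m = np.max(np.abs(np.asarray(values, dtype="float64")))
    if m < 1e11:      # seconds up to roughly year 5000
        return "s"
    elif m < 1e14:
        return "ms"
    elif m < 1e17:
        return "us"
    return "ns"

def infer_as_datetime(values):
    return pd.to_datetime(values, unit=infer_unit(values))

infer_as_datetime([1442315569.315])  # DatetimeIndex(['2015-09-15 11:12:49.315000'], ...)
infer_as_datetime([1442315569315])   # DatetimeIndex(['2015-09-15 11:12:49.315000'], ...)
```

This kind of heuristic is exactly the guessing the dev call above decided against making a default, since it can silently misread values near the unit boundaries.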