ENH: Support for non-nanosecond resolution in read_csv and to_datetime #54048

Closed
2 of 3 tasks
aulemahal opened this issue Jul 7, 2023 · 4 comments · Fixed by #55901
Labels
Dtype Conversions (unexpected or buggy dtype conversions), Enhancement, IO CSV (read_csv, to_csv), Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)

Comments

aulemahal commented Jul 7, 2023

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish I could use the newly added support for datetime64[ms] directly in pandas when opening a file.

Feature Description

import pandas as pd
import io

data = """\
index,date
A,"0004-04-04T12:30"
B,"2004-04-04T12:30"
C,"3004-04-04T12:30"
D,""
"""

df = pd.read_csv(io.StringIO(data), parse_dates=['date'])
df

which would return:

  index                date
0     A    4-04-04 12:30:00
1     B 2004-04-04 12:30:00
2     C 3004-04-04 12:30:00
3     D                 NaT

Or alternatively, it could also be more explicit:

df = pd.read_csv(io.StringIO(data))
df['date'] = pd.to_datetime(df['date'], reso='ms')

Alternative Solutions

Currently, the best way I have found to read in a CSV that has entries outside the 1677-2262 range (the bounds of datetime64[ns]) is:

df = pd.read_csv(io.StringIO(data))
df['date'] = df['date'].fillna("").to_numpy().astype('datetime64[ms]')

Thus, I am letting numpy do the actual date parsing. fillna is needed because read_csv translates the empty cell in data to np.nan, which numpy can't cast to datetime64. (I expected a NaT, but I guess that's another issue anyway.)

This solution requires that the dates be in ISO8601 format, which is much stricter than to_datetime.
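
The workaround above can be wrapped in a small helper. This is only a minimal sketch, assuming pandas 2.0 or later (so that a datetime64[ms] array keeps its resolution when assigned to a column) and ISO8601 input. The name parse_iso_dates and its unit parameter are hypothetical and not part of pandas.

```python
import io

import numpy as np
import pandas as pd


def parse_iso_dates(column: pd.Series, unit: str = "ms") -> pd.Series:
    """Parse ISO8601 strings with NumPy at the given resolution.

    Hypothetical helper, not a pandas function. Missing values become NaT.
    """
    # read_csv turns empty cells into NaN, which NumPy cannot cast to
    # datetime64; an empty string is parsed as NaT instead.
    filled = column.fillna("").to_numpy(dtype=object)
    values = filled.astype(f"datetime64[{unit}]")
    return pd.Series(values, index=column.index, name=column.name)


data = """\
index,date
A,"0004-04-04T12:30"
B,"2004-04-04T12:30"
C,"3004-04-04T12:30"
D,""
"""

df = pd.read_csv(io.StringIO(data))
df["date"] = parse_iso_dates(df["date"])

# Nanosecond bounds that rows A and C fall outside of.
print(pd.Timestamp.min, pd.Timestamp.max)
print(df["date"].dtype)
print(df)
```

Passing unit="s" instead would give second resolution. The ISO8601 restriction still applies, because NumPy is doing the parsing here rather than to_datetime.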

Additional Context

See my S/O question for more alternative solutions: https://stackoverflow.com/questions/76608166/how-do-i-parse-a-list-of-datetimes-with-a-s-resolution-in-pandas-2
Thanks to ignoring-gravity for the answer, which I reused here.

@aulemahal added the Enhancement and Needs Triage labels on Jul 7, 2023
@lithomas1
Member

This seems reasonable to me.
cc @MarcoGorelli.

@lithomas1 added the Dtype Conversions, IO CSV, and Non-Nano labels and removed the Needs Triage label on Jul 8, 2023
@MarcoGorelli
Member

yup, totally. I think @jbrockmendel had something in progress for to_datetime. It just involves taking care of some subtleties, and it's probably not something that can happen without a deprecation going in at the same time (so, unlikely to take effect before 3.0)

@jbrockmendel
Member

I think the place to introduce unit inference is in array_to_datetime. I have a branch that started on that, but it has gone on the backburner for a while.

@erikalfthan

Same is true for pd.read_sql(parse_dates=...). The workaround with to_numpy().astype() works though.

Should I open a separate bug or would it be enough to add tags to this?
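
For reference, the same workaround applied to read_sql might look like the sketch below. It assumes pandas 2.0 or later and dates stored as ISO8601 text. The in-memory SQLite table, its name (events), and its columns are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite table standing in for a real database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("A", "0004-04-04T12:30"),
        ("B", "2004-04-04T12:30"),
        ("C", "3004-04-04T12:30"),
        ("D", None),
    ],
)

# Read the date column as plain strings: parse_dates is deliberately not used.
events = pd.read_sql("SELECT name, date FROM events", conn)
conn.close()

# NULL values are filled with "" so that NumPy parses them as NaT.
raw = events["date"].fillna("").to_numpy(dtype=object)
events["date"] = raw.astype("datetime64[ms]")
print(events.dtypes)
```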


5 participants