PERF: Fix performance regression in read_csv when converting datetimes #52057

Merged
merged 2 commits into from
Mar 29, 2023
14 changes: 11 additions & 3 deletions pandas/io/parsers/base_parser.py
@@ -60,7 +60,10 @@
 )
 from pandas.core.dtypes.missing import isna

-from pandas import StringDtype
+from pandas import (
+    DatetimeIndex,
+    StringDtype,
+)
 from pandas.core import algorithms
 from pandas.core.arrays import (
     ArrowExtensionArray,
@@ -1116,14 +1119,19 @@ def converter(*date_cols, col: Hashable):
                 date_format.get(col) if isinstance(date_format, dict) else date_format
             )

-            return tools.to_datetime(
+            result = tools.to_datetime(
                 ensure_object(strs),
                 format=date_fmt,
                 utc=False,
                 dayfirst=dayfirst,
                 errors="ignore",
                 cache=cache_dates,
-            )._values
+            )
+            if isinstance(result, DatetimeIndex):
+                arr = result.to_numpy()

Member

do we know it is timezone-naive here?

Member Author

no tz-aware.

I guess you are referring to result._values._ndarray? I tried this first, but it breaks tests.

Member

If it's tz-aware, then to_numpy() should convert to object, which I haven't checked but assume we don't want here.

Member Author

Yeah it does, but I am not too concerned by this since this is the same behavior as before. This keeps performance at least stable.

Member Author

I have to clarify my initial response: the result can be either tz-aware or tz-naive, depending on the input.
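
To illustrate the tz-aware versus tz-naive point discussed in this thread, here is a minimal sketch. It assumes a pandas 2.x install; the variable names and sample timestamps are illustrative and do not come from the PR.

```python
import pandas as pd

# Illustrative inputs (not from the PR): one tz-naive and one tz-aware index.
naive = pd.DatetimeIndex(["2023-03-29 12:00:00"])
aware = pd.DatetimeIndex(["2023-03-29 12:00:00"], tz="UTC")

# With the default dtype, a tz-naive index converts to a datetime64 array,
# while a tz-aware index converts to an object array of Timestamp objects.
naive_arr = naive.to_numpy()
aware_arr = aware.to_numpy()

print(naive_arr.dtype.kind)   # "M" means a NumPy datetime64 dtype
print(aware_arr.dtype)        # object
print(type(aware_arr[0]))     # pandas Timestamp carrying the timezone
```

The object-dtype path is the slower one, which matches the comment above that the tz-aware behavior is unchanged by this patch rather than improved.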

Member Author

@jbrockmendel ok with merging? We should get this into 2.0

Member

OK with me

Member Author

thx

+                arr.flags.writeable = True
+                return arr
+            return result._values
         else:
             try:
                 result = tools.to_datetime(
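
The pattern added in this hunk can be tried in isolation. The sketch below is only an approximation of the patched branch: the helper name `as_writeable_values` is hypothetical, the sample dates are made up, and `errors="ignore"` is left out because it is not needed to show the idea.

```python
import numpy as np
import pandas as pd


def as_writeable_values(result):
    """Hypothetical helper mirroring the branch added in the diff above."""
    if isinstance(result, pd.DatetimeIndex):
        # Convert to a NumPy array and make sure callers may write to it.
        arr = result.to_numpy()
        arr.flags.writeable = True
        return arr
    # Anything else falls back to the underlying values, as before the patch.
    return result._values


# Illustrative input: an object array of date strings, similar in spirit
# to what the parser passes to to_datetime.
strs = np.array(["2023-03-29", "2023-03-30"], dtype=object)
parsed = pd.to_datetime(strs)

values = as_writeable_values(parsed)
print(type(values).__name__, values.dtype.kind, values.flags.writeable)

# A non-datetime result takes the fallback path.
fallback = as_writeable_values(pd.Index(["a", "b"], dtype=object))
print(type(fallback).__name__, fallback.dtype)
```

For tz-naive input the datetime branch returns a plain datetime64 `ndarray` instead of the extension array that `._values` would give.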