
[WIP] perf improvements for strftime #46116

Closed. Wants to merge 68 commits.
Changes from 66 commits
Commits
1062e50
Added the default format "%Y-%m-%d %H:%M:%S" as basic format (same as…
Feb 22, 2022
8d07798
Created two utils `convert_dtformat` and `get_datetime_fmt_dct` so as…
Feb 24, 2022
4b2380e
Fixed two pep8
Feb 24, 2022
6a3708e
Added new `strftime` module in tslibs, containing `convert_dtformat` …
Feb 28, 2022
93f0dbe
Modified `DatetimeLikeArrayMixin._format_native_types` so that it acc…
Feb 28, 2022
5b9dd1b
Added a `fast_strftime` argument to all formatters: `Datetime64Format…
Feb 28, 2022
8328feb
Added a test in `TestDatetimeFastFormatter` so that it tests performa…
Feb 28, 2022
9bba33b
Fixed issue in tests happening when format=None
Mar 1, 2022
fe8aa3b
isort
Mar 1, 2022
a3cadec
Fixed `BusinessHour._repr_attrs` so that it is faster too.
Mar 1, 2022
90c836a
Added TODO for possible improvement in timedeltas
Mar 1, 2022
d394a1b
Added maintenance comments concerning using Timestamp.str
Mar 1, 2022
460504d
Improved/fix periods handling. New c function `_period_fast_strftime`.
Mar 1, 2022
d700c44
Fixed period string representation %FQ%q
Mar 1, 2022
1a44d02
Relaxed perf test for datetimes
Mar 1, 2022
61ec6d6
Removed 2 useless cdefs and fixed docstring of strftime with missing …
Mar 2, 2022
846a0fd
Fixed issue with BusinessHour formatting
Mar 2, 2022
84f4c7f
Fixed _period_fast_strftime FR_QTR frequency default format, and _per…
Mar 2, 2022
e96fe8d
Fixed the quarter issue introduced in previous commit. Apparently the…
Mar 2, 2022
707a084
Improved `convert_dtformat` so that it safely escapes remaining forma…
Mar 2, 2022
37f15c9
`convert_dtformat` now supports periods formatting and is robust to s…
Mar 2, 2022
e2e2d70
Fixed api test
Mar 2, 2022
087425f
Attempt to fix cython related uninitialized variable "quarter" warning
Mar 2, 2022
98d4cb4
New `Timestamp.fast_strftime`, leveraged by `format_array_from_dateti…
Mar 4, 2022
93c8f93
Fixed last failing test
Mar 4, 2022
7b74bd2
Fixed `test_missing_public_nat_methods`
Mar 4, 2022
c093e86
Blackened
Mar 4, 2022
2685aff
Fixed doctests
Mar 4, 2022
2c23f48
Fixed doctest example
Mar 7, 2022
1e17df6
Now supporting %y (short year), %I (12h clock hour) and %p (AM/PM). N…
Mar 7, 2022
1c06d5e
Flake8
Mar 7, 2022
c582753
Added `fast_strftime` argument to the csv-related tools (`DataFrame.t…
Mar 7, 2022
48c5e93
`SQLiteTable` now converts time objects faster thanks to string forma…
Mar 7, 2022
62add6e
Fixed `_format_datetime64_dateonly` so that it now relies on `Timesta…
Mar 7, 2022
d92a132
Added `AM_LOCAL` and `PM_LOCAL` constants fed at initial import time …
Mar 10, 2022
f797a6f
Accelerated ODS writer when dates need to be written: at least one of…
Mar 10, 2022
76c6a2e
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 10, 2022
3c9620f
Using `bool_t` instead of `bool` in type hints of core/generic.py
Mar 10, 2022
7b37812
isort
Mar 10, 2022
5c1854d
flake8 and black
Mar 10, 2022
17255b9
Docstring fix
Mar 10, 2022
12b9041
Fixed test namespace
Mar 10, 2022
743aad8
`AM_LOCAL` and `PM_LOCAL` are now `str` - this should fix the issue o…
Mar 10, 2022
cef0a69
Fixed test_format and removed the performance test, as it would fail …
Mar 10, 2022
ff50133
flake8 black and isort
Mar 10, 2022
98ae429
Added missing type hint declaration for the `fast_strftime` methods (…
Mar 10, 2022
64fa9de
Now catching the warnings about time zone being dropped in test durin…
Mar 10, 2022
c200ef1
Fixed locale-related issue: AM_LOCAL and PM_LOCAL were created at com…
Mar 10, 2022
cea3e07
Replaced `pytest.warns` with pandas `assert_produces_warning`
Mar 10, 2022
ca3a6cc
Fixed `CSVFormatter` (used in `to_csv`) so that `DatetimeIndex` or `P…
Mar 10, 2022
53cefd0
Fixed locale problem and improved test
Mar 10, 2022
0bdb9b2
isort
Mar 10, 2022
d981542
Reverted the mods as it appears it was not this problem
Mar 10, 2022
37a622c
Fixed failing test on non-en locales
Mar 10, 2022
ec87007
Revert "Reverted the mods as it appears it was not this problem"
Mar 10, 2022
98cbfca
New `LocaleSpecificDtStrings` objects are returned by `convert_strfti…
Mar 11, 2022
c80bbec
Removed useless argument in fast_strftime
Mar 11, 2022
b173dc6
Fixed #46319 by using `PyUnicode_DecodeLocale` to decode the string r…
Mar 11, 2022
b44c6cf
flake8 black and isort
Mar 11, 2022
4956ff1
black
Mar 11, 2022
399878a
Fixed `_format_datetime64_dateonly` used in `get_format_datetime64`. …
Mar 11, 2022
5b6e07e
Removed the multi-locale test as it fails on some ci targets
Mar 11, 2022
743106f
Fixed var init issue
Mar 11, 2022
44724af
Flake8
Mar 12, 2022
718a647
Improved `Datetime64TZFormatter` readability by replacing useless cal…
Mar 12, 2022
e6bb843
Flake8 fix !
Mar 12, 2022
272f0a2
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 13, 2022
12d279c
Merge branch 'main' of https://github.com/pandas-dev/pandas into feat…
Mar 27, 2022
1 change: 1 addition & 0 deletions pandas/_libs/tslib.pyi
@@ -9,6 +9,7 @@ def format_array_from_datetime(
     tz: tzinfo | None = ...,
     format: str | None = ...,
     na_rep: object = ...,
+    fast_strftime: bool = ...,
 ) -> npt.NDArray[np.object_]: ...
 def array_with_unit_to_datetime(
     values: np.ndarray,
62 changes: 60 additions & 2 deletions pandas/_libs/tslib.pyx
@@ -60,6 +60,10 @@ from pandas._libs.tslibs.nattype cimport (
 )
 from pandas._libs.tslibs.timestamps cimport _Timestamp
 
+from pandas._libs.tslibs.strftime import (
+    UnsupportedStrFmtDirective,
+    convert_strftime_format,
+)
 from pandas._libs.tslibs.timestamps import Timestamp
 
 # Note: this is the only non-tslibs intra-pandas dependency here
@@ -101,7 +105,8 @@ def format_array_from_datetime(
     ndarray[int64_t] values,
     tzinfo tz=None,
     str format=None,
-    object na_rep=None
+    object na_rep=None,
+    fast_strftime=True,
 ) -> np.ndarray:
     """
     return a np object array of the string formatted values

Review comment (Member) on `ndarray[int64_t] values,`: "if a tslibs.strftime does get implemented, that might be a good home for format_array_from_datetime"
@@ -114,19 +119,23 @@ def format_array_from_datetime(
         a strftime capable string
     na_rep : optional, default is None
         a nat format
+    fast_strftime : bool, default True
+        If `True` (default) and the format permits it, a faster formatting
+        method will be used. See `convert_strftime_format`.
 
     Returns
     -------
     np.ndarray[object]
     """
     cdef:
-        int64_t val, ns, N = len(values)
+        int64_t val, ns, y, h, N = len(values)
         ndarray[int64_t] consider_values
         bint show_ms = False, show_us = False, show_ns = False
         bint basic_format = False
         ndarray[object] result = np.empty(N, dtype=object)
         object ts, res
         npy_datetimestruct dts
+        object str_format, loc_s
 
     if na_rep is None:
         na_rep = 'NaT'
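The docstring defers to `convert_strftime_format`, whose body is not shown on this page. The idea it names (rewrite a strftime format once into a Python %-style template with named fields, then fill that template per value) can be sketched as below. This is a minimal sketch under assumptions: `to_template`, `_DIRECTIVES` and `UnsupportedDirective` are hypothetical names, the field names are borrowed from the dict keys used later in this diff, the exact templates (e.g. zero-padding of `%Y`) are guesses, and the real function additionally returns locale-specific strings (`loc_s`).

```python
# Hypothetical directive table: keys are strftime directives, values are
# %-style named fields. Field names mirror the dict keys the diff passes to
# `str_format % {...}`; the padding choices are assumptions.
_DIRECTIVES = {
    "%Y": "%(year)04d",
    "%y": "%(shortyear)02d",
    "%m": "%(month)02d",
    "%d": "%(day)02d",
    "%H": "%(hour)02d",
    "%I": "%(hour12)02d",
    "%p": "%(ampm)s",
    "%M": "%(min)02d",
    "%S": "%(sec)02d",
    "%f": "%(us)06d",
}


class UnsupportedDirective(ValueError):
    """Hypothetical stand-in for the PR's UnsupportedStrFmtDirective."""


def to_template(fmt: str) -> str:
    """Translate a strftime format into a %-style template (sketch)."""
    parts = []
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            # Ordinary characters are copied unchanged.
            parts.append(fmt[i])
            i += 1
            continue
        directive = fmt[i:i + 2]
        if directive == "%%":
            # A literal percent must stay escaped for %-formatting.
            parts.append("%%")
        elif directive in _DIRECTIVES:
            parts.append(_DIRECTIVES[directive])
        else:
            # Caller is expected to fall back to a regular strftime.
            raise UnsupportedDirective(f"no fast template for {directive!r}")
        i += 2
    return "".join(parts)


template = to_template("%Y-%m-%d %H:%M")
print(template % {"year": 2022, "month": 3, "day": 7, "hour": 9, "min": 5})
```

The conversion runs once per call, so the per-element cost is a single `%` substitution instead of a `strftime` call; unsupported directives raise, which matches the fallback to standard `strftime` in the branch that follows.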
@@ -146,6 +155,28 @@ def format_array_from_datetime(
                 consider_values //= 1000
                 show_ms = (consider_values % 1000).any()
 
+    elif format == "%Y-%m-%d %H:%M:%S":
+        # Same format as default, but with hardcoded precision (s)
+        basic_format = True
+        show_ns = show_us = show_ms = False
+
+    elif format == "%Y-%m-%d %H:%M:%S.%f":
+        # Same format as default, but with hardcoded precision (us)
+        basic_format = show_us = True
+        show_ns = show_ms = False
+
+    elif fast_strftime:
+        if format is None:
+            # We'll fallback to the Timestamp.str method
+            fast_strftime = False
+        else:
+            try:
+                # Try to get the string formatting template for this format
+                str_format, loc_s = convert_strftime_format(format)
+            except UnsupportedStrFmtDirective:
+                # Unsupported directive: fallback to standard `strftime`
+                fast_strftime = False
+
     for i in range(N):
         val = values[i]
 
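The two new `elif format == ...` branches route explicit formats that match the default layout into the existing `basic_format` path, which is not shown in this diff and composes the string directly from the datetime struct fields rather than calling `strftime`. A minimal sketch of that composition on a naive stdlib `datetime`, assuming second or microsecond precision only; `basic_format` here is a hypothetical standalone helper, not the Cython branch itself:

```python
from datetime import datetime


def basic_format(dt: datetime, show_us: bool = False) -> str:
    """Hypothetical helper: "%Y-%m-%d %H:%M:%S[.%f]" without strftime."""
    res = (
        f"{dt.year}-{dt.month:02d}-{dt.day:02d} "
        f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}"
    )
    if show_us:
        # Hardcoded microsecond precision, as in the "%S.%f" branch
        res += f".{dt.microsecond:06d}"
    return res


dt = datetime(2022, 3, 7, 9, 5, 3, 42)
print(basic_format(dt))                # 2022-03-07 09:05:03
print(basic_format(dt, show_us=True))  # 2022-03-07 09:05:03.000042
```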
@@ -167,10 +198,36 @@ def format_array_from_datetime(
 
             result[i] = res
 
+        elif fast_strftime:
+
+            if tz is None:
+                dt64_to_dtstruct(val, &dts)
+
+                # Use string formatting for faster strftime
+                y = dts.year
+                h = dts.hour
+                result[i] = str_format % {
+                    "year": y,
+                    "shortyear": y % 100,
+                    "month": dts.month,
+                    "day": dts.day,
+                    "hour": dts.hour,
+                    "hour12": 12 if h in (0, 12) else (h % 12),
+                    "ampm": loc_s.pm if (h // 12) else loc_s.am,
+                    "min": dts.min,
+                    "sec": dts.sec,
+                    "us": dts.us,
+                }
+            else:
+                ts = Timestamp(val, tz=tz)
+
+                # Use string formatting for faster strftime
+                result[i] = ts.fast_strftime(str_format, loc_s)
         else:
 
             ts = Timestamp(val, tz=tz)
             if format is None:
+                # Use datetime.str, that returns ts.isoformat(sep=' ')
                 result[i] = str(ts)
             else:
 
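For naive values, the new branch fills the template from a dict of struct fields. Only two fields are derived: `hour12` maps hours 0 and 12 to 12 and everything else to `h % 12`, and `ampm` tests `h // 12`, which is 0 before noon and 1 from noon onwards. A minimal sketch on a stdlib `datetime`, assuming the C-locale strings "AM"/"PM" in place of the PR's `loc_s.am`/`loc_s.pm`; `fields_for` is a hypothetical helper:

```python
from datetime import datetime


def fields_for(dt: datetime, am: str = "AM", pm: str = "PM") -> dict:
    """Hypothetical helper building the named fields for a %-style template.

    `am`/`pm` stand in for the locale-specific strings the PR keeps on
    `loc_s`; the defaults assume the C locale.
    """
    h = dt.hour
    return {
        "year": dt.year,
        "shortyear": dt.year % 100,
        "month": dt.month,
        "day": dt.day,
        "hour": h,
        # 12-hour clock: midnight and noon both display as 12
        "hour12": 12 if h in (0, 12) else h % 12,
        # h // 12 is 0 before noon and 1 from noon onwards
        "ampm": pm if h // 12 else am,
        "min": dt.minute,
        "sec": dt.second,
        "us": dt.microsecond,
    }


template = "%(hour12)02d:%(min)02d %(ampm)s"
print(template % fields_for(datetime(2022, 3, 7, 0, 30)))  # 12:30 AM
print(template % fields_for(datetime(2022, 3, 7, 13, 5)))  # 01:05 PM
```

Extra keys in the dict are simply ignored by `%`, so a single dict can serve any template built from the supported directives.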
@@ -179,6 +236,7 @@ def format_array_from_datetime(
                 try:
                     result[i] = ts.strftime(format)
                 except ValueError:
+                    # Use datetime.str, that returns ts.isoformat(sep=' ')
                     result[i] = str(ts)
 
     return result
6 changes: 6 additions & 0 deletions pandas/_libs/tslibs/__init__.py
@@ -9,6 +9,8 @@
     "OutOfBoundsTimedelta",
     "IncompatibleFrequency",
     "Period",
+    "convert_strftime_format",
+    "UnsupportedStrFmtDirective",
     "Resolution",
     "Timedelta",
     "normalize_i8_timestamps",
@@ -48,6 +50,10 @@
     IncompatibleFrequency,
     Period,
 )
+from pandas._libs.tslibs.strftime import (
+    UnsupportedStrFmtDirective,
+    convert_strftime_format,
+)
 from pandas._libs.tslibs.timedeltas import (
     Timedelta,
     delta_to_nanoseconds,
4 changes: 3 additions & 1 deletion pandas/_libs/tslibs/offsets.pyx
@@ -1548,8 +1548,10 @@ cdef class BusinessHour(BusinessMixin):
 
     def _repr_attrs(self) -> str:
         out = super()._repr_attrs()
+        # Use python string formatting to be faster than strftime
+        # f'{st.strftime("%H:%M")}-{en.strftime("%H:%M")}'
         hours = ",".join(
-            f'{st.strftime("%H:%M")}-{en.strftime("%H:%M")}'
+            f'{st.hour:02d}:{st.minute:02d}-{en.hour:02d}:{en.minute:02d}'
             for st, en in zip(self.start, self.end)
         )
         attrs = [f"{self._prefix}={hours}"]
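The `BusinessHour._repr_attrs` change replaces two `strftime("%H:%M")` calls per interval with plain integer formatting. For any `datetime.time`, the two spellings produce the same string, which a small standalone check can confirm. `hours_repr` and `hours_repr_strftime` are hypothetical helpers that mirror the new and old generator expressions in the diff:

```python
from datetime import time


def hours_repr(starts, ends) -> str:
    """Hypothetical helper mirroring the new f-string in _repr_attrs."""
    return ",".join(
        f"{st.hour:02d}:{st.minute:02d}-{en.hour:02d}:{en.minute:02d}"
        for st, en in zip(starts, ends)
    )


def hours_repr_strftime(starts, ends) -> str:
    """The previous strftime-based spelling, kept for comparison."""
    return ",".join(
        f'{st.strftime("%H:%M")}-{en.strftime("%H:%M")}'
        for st, en in zip(starts, ends)
    )


starts = [time(9, 0), time(13, 30)]
ends = [time(12, 0), time(17, 5)]
print(hours_repr(starts, ends))  # 09:00-12:00,13:30-17:05
```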
1 change: 1 addition & 0 deletions pandas/_libs/tslibs/period.pyi
@@ -76,6 +76,7 @@ class Period:
     def _from_ordinal(cls, ordinal: int, freq) -> Period: ...
     @classmethod
     def now(cls, freq: BaseOffset = ...) -> Period: ...
+    def fast_strftime(self, fmt_str: str, loc_s: object) -> str: ...
     def strftime(self, fmt: str) -> str: ...
     def to_timestamp(
         self,