Commit 99859e4

used regular expression in format_is_iso() (#50468)
* performance: used regular expression 50465
* added comments for the code
* fixed broken comment
* verbose regex
* added more separators
* working excluding format
* test passed
* one separator in a condition
* removed unnecessary OR statements
* used backreferencing
* minor bug
* modified version
* non regex excluded format
* Update pandas/_libs/tslibs/strptime.pyx
* Update pandas/_libs/tslibs/strptime.pyx
* Update pandas/_libs/tslibs/strptime.pyx
* added whatsnew

Co-authored-by: Shashwat <shashwat>
Co-authored-by: Marco Edward Gorelli <[email protected]>
1 parent 184e167 commit 99859e4

2 files changed: +19 -8 lines changed

doc/source/whatsnew/v2.0.0.rst

+1
@@ -103,6 +103,7 @@ Other enhancements
 - :meth:`DataFrame.plot.hist` now recognizes ``xlabel`` and ``ylabel`` arguments (:issue:`49793`)
 - Improved error message in :func:`to_datetime` for non-ISO8601 formats, informing users about the position of the first error (:issue:`50361`)
 - Improved error message when trying to align :class:`DataFrame` objects (for example, in :func:`DataFrame.compare`) to clarify that "identically labelled" refers to both index and columns (:issue:`50083`)
+- Performance improvement in :func:`to_datetime` when format is given or can be inferred (:issue:`50465`)
 -
 
 .. ---------------------------------------------------------------------------
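The performance claim in the whatsnew entry can be explored with a rough micro-benchmark. The sketch below uses pure-Python ports of the removed nested loop and the added regular expression from the `strptime.pyx` change in this commit. The names `old_format_is_iso` and `new_format_is_iso` are hypothetical, and timings of plain Python code are only indicative of what the compiled Cython function does.

```python
import re
import timeit

EXCLUDED_FORMATS = ["%Y%m"]

# Pattern copied from the commit; compiled once here for the sketch.
ISO_REGEX = re.compile(
    r"""
    ^                     # start of string
    %Y                    # Year
    (?:([-/ \\.]?)%m      # month with or without separators
    (?: \1%d              # day with same separator as for year-month
    (?:[ T]%H             # hour with separator
    (?:\:%M               # minute with separator
    (?:\:%S               # second with separator
    (?:%z|\.%f(?:%z)?     # timezone or fractional second
    )?)?)?)?)?)?          # optional
    $                     # end of string
    """,
    re.VERBOSE,
)


def old_format_is_iso(f: str) -> bool:
    """Port of the removed implementation: try every candidate format."""
    for date_sep in [" ", "/", "\\", "-", ".", ""]:
        for time_sep in [" ", "T"]:
            for micro_or_tz in ["", "%z", ".%f", ".%f%z"]:
                iso_fmt = f"%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}"
                if iso_fmt.startswith(f) and f not in EXCLUDED_FORMATS:
                    return True
    return False


def new_format_is_iso(f: str) -> bool:
    """Port of the added implementation: a single regex match."""
    return ISO_REGEX.match(f) is not None and f not in EXCLUDED_FORMATS


if __name__ == "__main__":
    fmt = "%d/%m/%Y"  # non-ISO format: the old loop has to try every combination
    for func in (old_format_is_iso, new_format_is_iso):
        seconds = timeit.timeit(lambda: func(fmt), number=10_000)
        print(f"{func.__name__}: {seconds:.4f}s for 10,000 calls")
```

The two ports agree on complete formats such as `%Y-%m-%d`, but they are not strictly equivalent: the old prefix check accepts a truncated string like `%Y-`, while the anchored regex does not.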

pandas/_libs/tslibs/strptime.pyx

+18 -8
@@ -14,6 +14,7 @@ from cpython.datetime cimport (
 import_datetime()
 
 from _thread import allocate_lock as _thread_allocate_lock
+import re
 
 import numpy as np
 import pytz
@@ -50,6 +51,7 @@ from pandas._libs.util cimport (
     is_float_object,
     is_integer_object,
 )
+
 from pandas._libs.tslibs.timestamps import Timestamp
 
 cnp.import_array()
@@ -60,15 +62,23 @@ cdef bint format_is_iso(f: str):
     Generally of form YYYY-MM-DDTHH:MM:SS - date separator can be different
     but must be consistent. Leading 0s in dates and times are optional.
     """
+    iso_regex = re.compile(
+        r"""
+        ^                     # start of string
+        %Y                    # Year
+        (?:([-/ \\.]?)%m      # month with or without separators
+        (?: \1%d              # day with same separator as for year-month
+        (?:[ T]%H             # hour with separator
+        (?:\:%M               # minute with separator
+        (?:\:%S               # second with separator
+        (?:%z|\.%f(?:%z)?     # timezone or fractional second
+        )?)?)?)?)?)?          # optional
+        $                     # end of string
+        """,
+        re.VERBOSE,
+    )
     excluded_formats = ["%Y%m"]
-
-    for date_sep in [" ", "/", "\\", "-", ".", ""]:
-        for time_sep in [" ", "T"]:
-            for micro_or_tz in ["", "%z", ".%f", ".%f%z"]:
-                iso_fmt = f"%Y{date_sep}%m{date_sep}%d{time_sep}%H:%M:%S{micro_or_tz}"
-                if iso_fmt.startswith(f) and f not in excluded_formats:
-                    return True
-    return False
+    return re.match(iso_regex, f) is not None and f not in excluded_formats
 
 
 def _test_format_is_iso(f: str) -> bool:
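
To see how the backreference `\1` enforces a consistent date separator, the pattern can be tried outside of Cython. This is a minimal sketch: `format_is_iso_sketch` is a hypothetical pure-Python stand-in for the `cdef` function above, and the sample formats are chosen for illustration only.

```python
import re

# Same pattern as in the diff, compiled once at module level for this sketch.
ISO_REGEX = re.compile(
    r"""
    ^                     # start of string
    %Y                    # year
    (?:([-/ \\.]?)%m      # group 1 captures the (possibly empty) separator
    (?: \1%d              # \1 requires the same separator before the day
    (?:[ T]%H             # hour, preceded by a space or "T"
    (?:\:%M               # minute
    (?:\:%S               # second
    (?:%z|\.%f(?:%z)?     # timezone, or fractional second with optional timezone
    )?)?)?)?)?)?          # every trailing component is optional
    $                     # end of string
    """,
    re.VERBOSE,
)
EXCLUDED_FORMATS = ["%Y%m"]


def format_is_iso_sketch(f: str) -> bool:
    """Hypothetical stand-in for format_is_iso, for experimentation only."""
    return ISO_REGEX.match(f) is not None and f not in EXCLUDED_FORMATS


if __name__ == "__main__":
    samples = [
        "%Y-%m-%d",        # consistent "-" separator
        "%Y/%m/%d %H:%M",  # consistent "/" separator, time truncated after minutes
        "%Y-%m/%d",        # mixed separators, rejected by the backreference
        "%Y%m",            # matches the pattern but is explicitly excluded
        "%d-%m-%Y",        # day-first, not ISO ordering
    ]
    for fmt in samples:
        print(f"{fmt!r}: {format_is_iso_sketch(fmt)}")
```

Note that whitespace inside a character class is still significant under `re.VERBOSE`, which is why `[-/ \\.]` and `[ T]` can match a literal space while the space before `\1%d` is ignored.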
