We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
In [2]: pd.to_datetime("00:01:18", format='H%:M%:S%') # raises, as expected --------------------------------------------------------------------------- KeyError Traceback (most recent call last) File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:136, in pandas._libs.tslibs.strptime.array_strptime() 135 try: --> 136 format_regex = _TimeRE_cache.compile(fmt) 137 # KeyError raised when a bad format is found; can be specified as File ~/mambaforge/envs/pandas-dev/lib/python3.8/_strptime.py:263, in TimeRE.compile(self, format) 262 """Return a compiled re object for the format string.""" --> 263 return re_compile(self.pattern(format), IGNORECASE) File ~/mambaforge/envs/pandas-dev/lib/python3.8/_strptime.py:257, in TimeRE.pattern(self, format) 254 directive_index = format.index('%')+1 255 processed_format = "%s%s%s" % (processed_format, 256 format[:directive_index-1], --> 257 self[format[directive_index]]) 258 format = format[directive_index+1:] File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:461, in pandas._libs.tslibs.strptime.TimeRE.__getitem__() 460 return self._Z --> 461 return super().__getitem__(key) 462 KeyError: ':' During handling of the above exception, another exception occurred: ValueError Traceback (most recent call last) Cell In[2], line 1 ----> 1 pd.to_datetime("00:01:18", format='H%:M%:S%') File ~/pandas-dev/pandas/core/tools/datetimes.py:1100, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache) 1098 result = convert_listlike(argc, format) 1099 else: -> 1100 result = convert_listlike(np.array([arg]), format)[0] 1101 if isinstance(arg, bool) and isinstance(result, np.bool_): 1102 result = bool(result) # TODO: avoid this kludge. File ~/pandas-dev/pandas/core/tools/datetimes.py:442, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact) 439 require_iso8601 = format is not None and format_is_iso(format) 441 if format is not None and not require_iso8601: --> 442 return _to_datetime_with_format( 443 arg, 444 orig_arg, 445 name, 446 utc, 447 format, 448 exact, 449 errors, 450 ) 452 result, tz_parsed = objects_to_datetime64ns( 453 arg, 454 dayfirst=dayfirst, (...) 461 exact=exact, 462 ) 464 if tz_parsed is not None: 465 # We can take a shortcut since the datetime64 numpy array 466 # is in UTC File ~/pandas-dev/pandas/core/tools/datetimes.py:543, in _to_datetime_with_format(arg, orig_arg, name, utc, fmt, exact, errors) 540 return _box_as_indexlike(result, utc=utc, name=name) 542 # fallback --> 543 res = _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors) 544 return res File ~/pandas-dev/pandas/core/tools/datetimes.py:485, in _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors) 481 """ 482 Call array_strptime, with fallback behavior depending on 'errors'. 483 """ 484 try: --> 485 result, timezones = array_strptime( 486 arg, fmt, exact=exact, errors=errors, utc=utc 487 ) 488 except OutOfBoundsDatetime: 489 if errors == "raise": File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:126, in pandas._libs.tslibs.strptime.array_strptime() 124 125 global _TimeRE_cache, _regex_cache --> 126 with _cache_lock: 127 if _getlang() != _TimeRE_cache.locale_time.lang: 128 _TimeRE_cache = TimeRE() File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:144, in pandas._libs.tslibs.strptime.array_strptime() 142 bad_directive = "%" 143 del err --> 144 raise ValueError(f"'{bad_directive}' is a bad directive " 145 f"in format '{fmt}'") 146 # IndexError only occurs when the format string is "%" ValueError: ':' is a bad directive in format 'H%:M%:S%' In [3]: pd.to_datetime("00:01:18", format='H%:M%:S%', errors='coerce') # doesn't raise Out[3]: NaT
The issue with this overly broad try-except:
pandas/pandas/core/tools/datetimes.py
Lines 484 to 505 in 1d5f05c
The try-except should happen within the loop, as is done in tslib.pyx:
tslib.pyx
pandas/pandas/_libs/tslib.pyx
Lines 528 to 531 in 6598797
Related: #24763, but I really disagree that this is desirable behaviour
pd.to_datetime("00:01:18", format='H%:M%:S%', errors='coerce') should still raise
pd.to_datetime("00:01:18", format='H%:M%:S%', errors='coerce')
errors='coerce' controls what happens during parsing - here, however, the error happens before the parsing even begins, because the format is invalid
errors='coerce'
commit : a5cbd1e python : 3.8.15.final.0 python-bits : 64 OS : Linux OS-release : 5.10.102.1-microsoft-standard-WSL2 Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_GB.UTF-8 LOCALE : en_GB.UTF-8
pandas : 2.0.0.dev0+923.ga5cbd1e52e numpy : 1.23.5 pytz : 2022.6 dateutil : 2.8.2 setuptools : 65.5.1 pip : 22.3.1 Cython : 0.29.32 pytest : 7.2.0 hypothesis : 6.61.0 sphinx : 4.5.0 blosc : None feather : None xlsxwriter : 3.0.3 lxml.etree : 4.9.1 html5lib : 1.1 pymysql : 1.0.2 psycopg2 : 2.9.3 jinja2 : 3.1.2 IPython : 8.7.0 pandas_datareader: None bs4 : 4.11.1 bottleneck : 1.3.5 brotli : fastparquet : 2022.12.0 fsspec : 2021.11.0 gcsfs : 2021.11.0 matplotlib : 3.6.2 numba : 0.56.4 numexpr : 2.8.3 odfpy : None openpyxl : 3.0.10 pandas_gbq : None pyarrow : 9.0.0 pyreadstat : 1.2.0 pyxlsb : 1.0.10 s3fs : 2021.11.0 scipy : 1.9.3 snappy : sqlalchemy : 1.4.45 tables : 3.7.0 tabulate : 0.9.0 xarray : 2022.12.0 xlrd : 2.0.1 zstandard : 0.19.0 tzdata : None qtpy : None pyqt5 : None None
The text was updated successfully, but these errors were encountered:
Successfully merging a pull request may close this issue.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The issue with this overly broad try-except:
pandas/pandas/core/tools/datetimes.py
Lines 484 to 505 in 1d5f05c
The try-except should happen within the loop, as is done in
tslib.pyx
:pandas/pandas/_libs/tslib.pyx
Lines 528 to 531 in 6598797
Related: #24763, but I really disagree that this is desirable behaviour
Expected Behavior
pd.to_datetime("00:01:18", format='H%:M%:S%', errors='coerce')
should still raiseerrors='coerce'
controls what happens during parsing - here, however, the error happens before the parsing even begins, because the format is invalidInstalled Versions
INSTALLED VERSIONS
commit : a5cbd1e
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 2.0.0.dev0+923.ga5cbd1e52e
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.45
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None
None
The text was updated successfully, but these errors were encountered: