Skip to content

BUG: Invalid format to to_datetime with errors='coerce' doesn't raise #50266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
MarcoGorelli opened this issue Dec 15, 2022 · 0 comments · Fixed by #50265
Closed
3 tasks done

BUG: Invalid format to to_datetime with errors='coerce' doesn't raise #50266

MarcoGorelli opened this issue Dec 15, 2022 · 0 comments · Fixed by #50265
Labels
Bug Datetime Datetime data dtype

Comments

@MarcoGorelli
Copy link
Member

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [2]: pd.to_datetime("00:01:18", format='H%:M%:S%')  # raises, as expected
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:136, in pandas._libs.tslibs.strptime.array_strptime()
    135 try:
--> 136     format_regex = _TimeRE_cache.compile(fmt)
    137 # KeyError raised when a bad format is found; can be specified as

File ~/mambaforge/envs/pandas-dev/lib/python3.8/_strptime.py:263, in TimeRE.compile(self, format)
    262 """Return a compiled re object for the format string."""
--> 263 return re_compile(self.pattern(format), IGNORECASE)

File ~/mambaforge/envs/pandas-dev/lib/python3.8/_strptime.py:257, in TimeRE.pattern(self, format)
    254 directive_index = format.index('%')+1
    255 processed_format = "%s%s%s" % (processed_format,
    256                                format[:directive_index-1],
--> 257                                self[format[directive_index]])
    258 format = format[directive_index+1:]

File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:461, in pandas._libs.tslibs.strptime.TimeRE.__getitem__()
    460     return self._Z
--> 461 return super().__getitem__(key)
    462 

KeyError: ':'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[2], line 1
----> 1 pd.to_datetime("00:01:18", format='H%:M%:S%')

File ~/pandas-dev/pandas/core/tools/datetimes.py:1100, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
   1098         result = convert_listlike(argc, format)
   1099 else:
-> 1100     result = convert_listlike(np.array([arg]), format)[0]
   1101     if isinstance(arg, bool) and isinstance(result, np.bool_):
   1102         result = bool(result)  # TODO: avoid this kludge.

File ~/pandas-dev/pandas/core/tools/datetimes.py:442, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
    439 require_iso8601 = format is not None and format_is_iso(format)
    441 if format is not None and not require_iso8601:
--> 442     return _to_datetime_with_format(
    443         arg,
    444         orig_arg,
    445         name,
    446         utc,
    447         format,
    448         exact,
    449         errors,
    450     )
    452 result, tz_parsed = objects_to_datetime64ns(
    453     arg,
    454     dayfirst=dayfirst,
   (...)
    461     exact=exact,
    462 )
    464 if tz_parsed is not None:
    465     # We can take a shortcut since the datetime64 numpy array
    466     # is in UTC

File ~/pandas-dev/pandas/core/tools/datetimes.py:543, in _to_datetime_with_format(arg, orig_arg, name, utc, fmt, exact, errors)
    540         return _box_as_indexlike(result, utc=utc, name=name)
    542 # fallback
--> 543 res = _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors)
    544 return res

File ~/pandas-dev/pandas/core/tools/datetimes.py:485, in _array_strptime_with_fallback(arg, name, utc, fmt, exact, errors)
    481 """
    482 Call array_strptime, with fallback behavior depending on 'errors'.
    483 """
    484 try:
--> 485     result, timezones = array_strptime(
    486         arg, fmt, exact=exact, errors=errors, utc=utc
    487     )
    488 except OutOfBoundsDatetime:
    489     if errors == "raise":

File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:126, in pandas._libs.tslibs.strptime.array_strptime()
    124 
    125     global _TimeRE_cache, _regex_cache
--> 126     with _cache_lock:
    127         if _getlang() != _TimeRE_cache.locale_time.lang:
    128             _TimeRE_cache = TimeRE()

File ~/pandas-dev/pandas/_libs/tslibs/strptime.pyx:144, in pandas._libs.tslibs.strptime.array_strptime()
    142         bad_directive = "%"
    143     del err
--> 144     raise ValueError(f"'{bad_directive}' is a bad directive "
    145                      f"in format '{fmt}'")
    146 # IndexError only occurs when the format string is "%"

ValueError: ':' is a bad directive in format 'H%:M%:S%'

In [3]: pd.to_datetime("00:01:18", format='H%:M%:S%', errors='coerce')  # doesn't raise
Out[3]: NaT

Issue Description

The issue with this overly broad try-except:

try:
result, timezones = array_strptime(
arg, fmt, exact=exact, errors=errors, utc=utc
)
except OutOfBoundsDatetime:
if errors == "raise":
raise
if errors == "coerce":
result = np.empty(arg.shape, dtype="M8[ns]")
iresult = result.view("i8")
iresult.fill(iNaT)
else:
result = arg
except ValueError:
if errors == "raise":
raise
if errors == "coerce":
result = np.empty(arg.shape, dtype="M8[ns]")
iresult = result.view("i8")
iresult.fill(iNaT)
else:
result = arg

The try-except should happen within the loop, as is done in tslib.pyx:

for i in range(n):
val = values[i]
try:

Related: #24763, but I really disagree that this is desirable behaviour

Expected Behavior

pd.to_datetime("00:01:18", format='H%:M%:S%', errors='coerce') should still raise

errors='coerce' controls what happens during parsing - here, however, the error happens before the parsing even begins, because the format is invalid

Installed Versions

INSTALLED VERSIONS

commit : a5cbd1e
python : 3.8.15.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.102.1-microsoft-standard-WSL2
Version : #1 SMP Wed Mar 2 00:30:59 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 2.0.0.dev0+923.ga5cbd1e52e
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.61.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.45
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None
None

@MarcoGorelli MarcoGorelli added Bug Needs Triage Issue that has not been reviewed by a pandas team member Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype
Projects
None yet
1 participant