to_datetime(foo, errors='coerce') does not swallow all errors #28299
Comments
Yea that does seem buggy. I think it should return NaT as well. Care to investigate?
@WillAyd thanks - I'm happy to dig into the project to investigate, though I'm quite unfamiliar with the inner workings of pandas. Specifically, the error is raised at the bottom of this block, from tslib.array_to_datetime:

    # pandas/core/arrays/datetimes.py from line 1965, in objects_to_datetime64ns:
    # if str-dtype, convert
    data = np.array(data, copy=False, dtype=np.object_)
    try:
        result, tz_parsed = tslib.array_to_datetime(
            data,
            errors=errors,
            utc=utc,
            dayfirst=dayfirst,
            yearfirst=yearfirst,
            require_iso8601=require_iso8601,
        )
    except ValueError as e:
        try:
            values, tz_parsed = conversion.datetime_to_datetime64(data)
            # If tzaware, these values represent unix timestamps, so we
            # return them as i8 to distinguish from wall times
            return values.view("i8"), tz_parsed
        except (ValueError, TypeError):
            raise e

Simple reproducible example:

    import numpy as np
    from pandas._libs import tslib  # private module that provides array_to_datetime

    data = np.array(['200622-12-31'], copy=False, dtype=np.object_)
    tslib.array_to_datetime(data, errors='coerce', utc=False, dayfirst=False, yearfirst=False, require_iso8601=False)

It looks like the outer

EDIT: corrected the reproducible example to include errors='coerce'

(happy to raise a PR if the above suggestion seems sensible)

EDIT: the above suggestion would not work at all. The error needs to be handled somewhere in the Cython internals (in
After further investigation it looks like this is caused by this open issue in the dateutil project:

To reproduce:

    In [6]: import dateutil.parser
    In [7]: timestamp_value = '200622-12-31'
    In [8]: timestamp = dateutil.parser.parse(timestamp_value)
    In [10]: timestamp.utcoffset()
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-11-a2bb0af2bddf> in <module>
    ----> 1 timestamp.utcoffset()
    ValueError: offset must be a timedelta strictly between -timedelta(hours=24) and timedelta(hours=24).

The error isn't raised on parsing, but only when retrieving the offset parsed from the string. This means that the block which attempts a try/except around parsing datetime strings in

    # from line 608
    try:
        py_dt = parse_datetime_string(val,
                                      dayfirst=dayfirst,
                                      yearfirst=yearfirst)
    except Exception:
        if is_coerce:
            iresult[i] = NPY_NAT
            continue
        raise TypeError("invalid string coercion to "
                        "datetime")
    # If the dateutil parser returned tzinfo, capture it
    # to check if all arguments have the same tzinfo
    tz = py_dt.utcoffset()

Accessing of the
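To illustrate the failure mode without pandas or dateutil, here is a minimal standard-library sketch. The `OutOfRangeOffset` class and the `parse_or_none` helper are hypothetical names made up for this example. The class is only a stand-in for a tzinfo carrying an invalid offset, not what dateutil actually constructs. The sketch shows that building the datetime succeeds, that the `ValueError` only appears when `utcoffset()` is called, and that keeping the offset access inside the `try` lets a coerce-style branch swallow it.

```python
from datetime import datetime, timedelta, tzinfo


class OutOfRangeOffset(tzinfo):
    """Hypothetical tzinfo whose offset lies outside the allowed (-24h, 24h) range."""

    def utcoffset(self, dt):
        return timedelta(hours=-31)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return "bad"


def parse_or_none(make_dt, is_coerce):
    """Return (dt, offset); when coercing, return None if any step fails."""
    try:
        dt = make_dt()
        # datetime validates the offset on access, not at construction time,
        # so this call has to sit inside the try block as well.
        offset = dt.utcoffset()
    except Exception:
        if is_coerce:
            return None
        raise
    return dt, offset


def make_bad():
    # Construction succeeds even though the tzinfo is unusable.
    return datetime(2022, 6, 20, tzinfo=OutOfRangeOffset())


coerced = parse_or_none(make_bad, is_coerce=True)
```

With `is_coerce=False` the same call re-raises the `ValueError`, which mirrors the behaviour reported in this issue.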
Code Sample
Problem description
I have some text files with malformed dates, which at one point I will process with the above code. While trying to migrate my code from pandas 0.23.4 to 0.25.1 I got the following:
Expected Output
The main expectation is that an exception is not raised.
I would probably expect
pandas.to_datetime('200622-12-31', errors='coerce')
to return NaT, but pandas 0.23.4 seems to parse it into Timestamp('2022-06-21 19:00:00')
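As a stopgap on an affected version, a caller could guard each conversion themselves. The sketch below is an assumption on my part and is not part of pandas: `coerce_each` and `strict_parse` are hypothetical helpers, and the demo uses `datetime.strptime` as a stand-in parser so it runs with the standard library alone. With pandas one would presumably pass something like `lambda v: pd.to_datetime(v, errors="coerce")` as `parse` and `pd.NaT` as `missing`.

```python
from datetime import datetime


def coerce_each(values, parse, missing=None):
    """Apply `parse` to each value, substituting `missing` when it raises."""
    out = []
    for value in values:
        try:
            out.append(parse(value))
        except Exception:
            # Swallow every parser error, which is what errors='coerce' is expected to do.
            out.append(missing)
    return out


def strict_parse(value):
    # Stand-in parser for the demo; rejects anything that is not YYYY-MM-DD.
    return datetime.strptime(value, "%Y-%m-%d")


result = coerce_each(["2019-09-05", "200622-12-31"], strict_parse)
```

Converting element by element is slower than a vectorised call, so this only makes sense for small inputs or as a fallback after the bulk conversion has raised.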
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-58-generic
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.10
pytest : 4.6.3
hypothesis : 4.7.3
sphinx : None
blosc : None
feather : 0.4.0
xlsxwriter : 1.1.8
lxml.etree : 4.3.3
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : 2.7.7 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : 0.3.0
lxml.etree : 4.3.3
matplotlib : 3.1.1
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.2.14
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8