
DOC/ERR: better error message on unsuccessful datetime parsing #10720

Open
jreback opened this issue Aug 1, 2015 · 14 comments
Labels
Datetime Datetime data dtype Enhancement Error Reporting Incorrect or improved errors from pandas

Comments

@jreback
Contributor

jreback commented Aug 1, 2015

Exceptions raised on unsuccessful datetime/timedelta parsing should append this:

you can coerce to NaT by passing errors='coerce'

See the comment at the end of #10674.

@jreback jreback added Datetime Datetime data dtype Difficulty Novice Error Reporting Incorrect or improved errors from pandas Timedelta Timedelta data type labels Aug 1, 2015
@jreback jreback added this to the Next Major Release milestone Aug 1, 2015
@springcoil
Contributor

I was looking at this today. What actually needs to be changed in the code? I'm having trouble understanding which logic would need this error message. Does anyone have an example?

@jorisvandenbossche
Member

The idea is that when to_datetime raises an error (in all the different cases where the string cannot be parsed), you get an additional message saying that you can pass errors='coerce' to coerce to NaT and thereby suppress the error.

E.g.:

In [6]: pd.to_datetime('something', errors='raise')

ValueError: Unknown string format

It could say something like: "ValueError: Unknown string format. You can coerce errors to NaT by passing errors='coerce'"
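To make the two modes concrete, here is a small sketch using the public `pd.to_datetime` API: with `errors='raise'` the parse failure propagates as a `ValueError`, while `errors='coerce'` substitutes `NaT`. The exact wording of the error depends on the pandas version, so no expected message is shown.

```python
import pandas as pd

# errors='raise' (the default) propagates the parsing failure.
raised_type = None
try:
    pd.to_datetime("something", errors="raise")
except ValueError as err:
    raised_type = type(err)
    print(f"{raised_type.__name__}: {err}")

# errors='coerce' returns NaT for a scalar that cannot be parsed ...
scalar_result = pd.to_datetime("something", errors="coerce")
print(scalar_result)  # NaT

# ... and for list-like input only the unparseable entries become NaT.
list_result = pd.to_datetime(["2015-08-01", "something"], errors="coerce")
print(list_result)
```

The appended hint proposed in this issue would simply point users from the first case to the second.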

@jreback
Contributor Author

jreback commented Aug 11, 2015

@springcoil there are lots of tests that assert errors from to_datetime. Ideally, go through those, see what messages they produce, and fix any that are not context-sensitive (e.g. where a more informative message could be given); also add that you can pass errors='coerce' to get NaT if desired.

Most of these tests are in tseries/tests/test_timeseries.py.
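A rough way to go through the current messages is to call the parsers with a few bad inputs and print what they raise. The inputs below are illustrative assumptions, not the actual cases in the test suite, and `audit_messages` is a hypothetical helper rather than anything in pandas.

```python
import pandas as pd

# Hypothetical sample of unparseable inputs; the real cases are the ones
# asserted in the pandas test suite.
BAD_INPUTS = [
    (pd.to_datetime, "some_nonsense"),
    (pd.to_datetime, "2015-02-30"),
    (pd.to_timedelta, "not a timedelta"),
]


def audit_messages(cases):
    """Call each parser with errors='raise' and record what, if anything, it raises."""
    report = []
    for func, value in cases:
        try:
            func(value, errors="raise")
        except (ValueError, TypeError) as err:
            report.append((func.__name__, value, type(err).__name__, str(err)))
        else:
            # The input parsed successfully, so there is no message to review.
            report.append((func.__name__, value, None, None))
    return report


report = audit_messages(BAD_INPUTS)
for name, value, exc_name, message in report:
    has_hint = "errors='coerce'" in (message or "")
    print(f"{name}({value!r}) -> {exc_name}: {message} [coerce hint: {has_hint}]")
```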

@baevpetr
Contributor

Hello, can I participate here?

@jbrockmendel
Member

@baevpetr go for it

@baevpetr
Contributor

take

@baevpetr
Contributor

I want to clarify: after calling pd.to_datetime('some_nonsense', errors='raise') we get:

Traceback (most recent call last):
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1979, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
    raise TypeError(f'Unrecognized value type: {type(val)}')
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-18-3148d274c504>", line 1, in <module>
    pd.to_datetime('some_nonsense', errors='raise')
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
    return func(*args, **kwargs)
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 796, in to_datetime
    result = convert_listlike(np.array([arg]), box, format)[0]
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 463, in _convert_listlike_datetimes
    allow_object=True,
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1984, in objects_to_datetime64ns
    raise e
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1975, in objects_to_datetime64ns
    require_iso8601=require_iso8601,
  File "pandas/_libs/tslib.pyx", line 465, in pandas._libs.tslib.array_to_datetime
    1) datetime64[ns] data
  File "pandas/_libs/tslib.pyx", line 688, in pandas._libs.tslib.array_to_datetime
    if is_coerce:
  File "pandas/_libs/tslib.pyx", line 822, in pandas._libs.tslib.array_to_datetime_object
    return oresult, None
  File "pandas/_libs/tslib.pyx", line 813, in pandas._libs.tslib.array_to_datetime_object
    oresult[i] = <object>NaT
  File "pandas/_libs/tslibs/parsing.pyx", line 225, in pandas._libs.tslibs.parsing.parse_datetime_string
    try:
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1358, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', 'some_nonsense')
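The last frame of the traceback shows the error originating in dateutil, and it can be reproduced without pandas by calling the parser directly. This is a sketch: `try_dateutil` is a hypothetical helper, and newer dateutil releases raise `ParserError`, a `ValueError` subclass, instead of a bare `ValueError`, so the printed class name and message may differ from the traceback.

```python
from dateutil import parser


def try_dateutil(text):
    """Return the exception dateutil raises for `text`, or None if it parses."""
    try:
        parser.parse(text)
    except ValueError as err:  # ParserError in newer dateutil is a ValueError subclass
        return err
    return None


err = try_dateutil("some_nonsense")
print(type(err).__name__, err)
```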

ValueError raised in dateutil/parser/_parser.py (line 649):

        if res is None:
            raise ValueError("Unknown string format:", timestr)

        if len(res) == 0:
            raise ValueError("String does not contain a date:", timestr)

and we catch it in pandas/core/arrays/datetimes.py (line 1968):

    try:
        result, tz_parsed = tslib.array_to_datetime(
            data,
            errors=errors,
            utc=utc,
            dayfirst=dayfirst,
            yearfirst=yearfirst,
            require_iso8601=require_iso8601,
        )
    except ValueError as e:
        try:
            values, tz_parsed = conversion.datetime_to_datetime64(data)
            # If tzaware, these values represent unix timestamps, so we
            #  return them as i8 to distinguish from wall times
            return values.view("i8"), tz_parsed
        except (ValueError, TypeError):
            raise e
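The try/fallback/re-raise shape above can be imitated outside pandas to show where an appended hint could go: at the final `raise e`. In this sketch, dateutil stands in for `tslib.array_to_datetime` and a plain `isinstance` check stands in for `conversion.datetime_to_datetime64`; `convert` and `COERCE_HINT` are hypothetical names, not pandas internals.

```python
from datetime import datetime

from dateutil import parser

# Hypothetical hint text, taken from the wording proposed in this issue.
COERCE_HINT = "You can coerce to NaT by passing errors='coerce'"


def convert(value):
    try:
        # Primary path: string parsing (stand-in for tslib.array_to_datetime).
        if not isinstance(value, str):
            raise ValueError(f"Expected a string, got {type(value).__name__}")
        return parser.parse(value)
    except ValueError as e:
        try:
            # Fallback path: accept objects that are already datetimes
            # (stand-in for conversion.datetime_to_datetime64).
            if isinstance(value, datetime):
                return value
            raise TypeError(f"Unrecognized value type: {type(value)}")
        except (ValueError, TypeError):
            # Where pandas does `raise e`, re-raise the original error's
            # message with the hint appended, chained to the original.
            raise ValueError(f"{e}. {COERCE_HINT}") from e
```

Raising a fresh `ValueError` keeps `except ValueError` handlers working but changes the exact class (e.g. dateutil's `ParserError`), which could matter to callers that catch a more specific subclass.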

For now I see these variants:

  1. Append "you can coerce to NaT by passing errors='coerce'" to both ValueErrors.
  2. Distinguish between them based on the message.
  3. Correct my assumptions if I'm talking nonsense.
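Variants 1 and 2 differ only in when the hint is appended, so both fit in one small function. `add_coerce_hint` and its `only_for` parameter are hypothetical names; the message fragments are the two dateutil messages quoted above. Matching on substrings rather than exact text also covers older dateutil, whose `ValueError` carries the message and the input as two separate args.

```python
COERCE_HINT = "You can coerce to NaT by passing errors='coerce'"

# The two messages dateutil raises in the snippet quoted above (used for variant 2).
DATEUTIL_MESSAGES = ("Unknown string format", "String does not contain a date")


def add_coerce_hint(err, only_for=None):
    """Return the message to raise in place of `err`.

    Variant 1: only_for=None appends the hint to every ValueError.
    Variant 2: only_for is a tuple of message fragments; the hint is appended
    only when the original message contains one of them.
    """
    message = str(err)
    if only_for is None or any(fragment in message for fragment in only_for):
        return f"{message}. {COERCE_HINT}"
    return message


try:
    raise ValueError("String does not contain a date:", "")
except ValueError as e:
    print(add_coerce_hint(e, only_for=DATEUTIL_MESSAGES))
```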

@jbrockmendel
Member

@baevpetr without looking at it too closely, I'm tentatively ruling out variant 3.

As the person volunteering to put in the time to improve this, you get to choose your preferred approach.

@baevpetr
Contributor

31ead07

@baevpetr
Contributor

Just pinging @jbrockmendel, @jreback, or @jorisvandenbossche.

@baevpetr
Contributor

Pinging you guys one more time :)
@jbrockmendel, @jreback, or @jorisvandenbossche.

@jreback
Contributor Author

jreback commented Dec 27, 2019

@baevpetr if you want to put up a PR, it's much easier to see and comment on.

@ShaharNaveh
Member

take

@ShaharNaveh ShaharNaveh removed their assignment Mar 18, 2020
@mroeschke mroeschke added Enhancement and removed Timedelta Timedelta data type labels May 3, 2020
@TomAugspurger TomAugspurger modified the milestones: 1.1, Contributions Welcome Jun 18, 2020
@erfannariman
Member

erfannariman commented Jul 2, 2020

@MomIsBestFriend for my understanding, why did you close your PR? It looked like a decent set of changes for this ticket. Was anything unclear? If time is the issue for you, I can help out.

@baevpetr baevpetr removed their assignment Feb 15, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
9 participants