Skip to content

API: combine 'coerce' and 'errors' keywords of to_datetime #10653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Jul 22, 2015 · 11 comments
Closed

API: combine 'coerce' and 'errors' keywords of to_datetime #10653

jorisvandenbossche opened this issue Jul 22, 2015 · 11 comments
Labels
API Design Datetime Datetime data dtype
Milestone

Comments

@jorisvandenbossche
Copy link
Member

From #10636

When parsing an invalid date, there are some possible different actions to take:

  • ignore -> just returning the original value unparsed (errors='ignore')
  • raise an error (errors='raise')
  • coerce / convert to NaN (coerce=True)

AFAIK, these three options are just three different options and mutually exclusive (or is there somewhere in the codebase an interaction between both?), so they could be controlled by a single keyword.
Having two keyword can make it even confusing what you should expect when combining those two, eg the example of coerce=True and error=raise.

Proposal: add another 'coerce' option to the errors keyword that is equivalent to coerce=True

-> this gives the errors keyword three options: 'ignore'|'raise'|'coerce'

@kawochen
Copy link
Contributor

You are probably aware but let me just point out that coerce=True is a stronger guarantee because not everything can be returned unparsed.

@jorisvandenbossche
Copy link
Member Author

@kawochen What do you mean exactly?

@jreback
Copy link
Contributor

jreback commented Jul 23, 2015

I am +1 on this

@kawochen
Copy link
Contributor

@jorisvandenbossche I meant that when you want more exception coverage you use coerce=True

>>> to_datetime(11111111111111111111111111, errors='ignore', coerce=False)
Traceback (most recent call last):
...
OverflowError: long too big to convert
>>> to_datetime(11111111111111111111111111, errors='ignore', coerce=True)
NaT

@jorisvandenbossche
Copy link
Member Author

ah, I see, but I would maybe rather consider that a bug (or undocumented issue) in that it should not raise an error but give the original value back? (that's what errors='ignore'/'raise' are for)

Are there cases than too big integers where something like this happens?

@kawochen
Copy link
Contributor

There are cases when even coerce=True doesn't return NaT upon error.

>>> to_datetime([[],1], errors='ignore', coerce=True)
DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None, tz=None)
>>> to_datetime([[]], errors='ignore', coerce=True)
Traceback (most recent call last):
...
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

And cases when coerce=True returns NaT but coerce=False returns a Timestamp

>>> to_datetime(-1, errors='ignore', coerce=True)
NaT
>>> to_datetime(-1, errors='ignore', coerce=False)
Timestamp('1969-12-31 23:59:59.999999999')

@jreback
Copy link
Contributor

jreback commented Jul 23, 2015

@kawochen those are almost certainly errors, because of the confusion on the cython side. E.g. the errors='ignore', coerce=True is just odd. No real need for it. Either you get an exception (default), or you want coercion in which case you always get NaT for invalid parseables.

@kawochen
Copy link
Contributor

Sure. I just wanted to see what breaks. I have no legit use cases for them

@jorisvandenbossche
Copy link
Member Author

@kawochen yes, and that is very useful! But indeed, as @jreback says, it are more bugs in the current implementation. Do you want to open a separate issue for those?

@jreback the only reason I would be a bit hesitant for this change, is that we just changed it in convert_objects the other way around (from string convert_dates='coerce') to aboolean coerce=True), which is not really consistent..
But OK, I think for to_datetime this is the best option (it would be simpler when the 'ignore' option did not exist, you could simply have coerce=True/False and raise error if False, but nothing to do about that anymore).

@jreback
Copy link
Contributor

jreback commented Jul 27, 2015

@jorisvandenbossche
cc @bashtage

I think that it would be ok to change .convert_objects() to errors='ignore'|'coerce' for consistency. (was just change in master to ok on this)

@jreback
Copy link
Contributor

jreback commented Aug 26, 2015

this looks like a dupe issue of #10636 and hence is closed by #10674

@kawochen if you see some bugs pls open a new issue for them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Datetime Datetime data dtype
Projects
None yet
Development

No branches or pull requests

3 participants