to_datetime, inconsistent behavior with invalid dates. #10154

Closed
vincentdavis opened this issue May 16, 2015 · 12 comments
Labels
Bug Datetime Datetime data dtype

Comments

@vincentdavis
Contributor

Consider Feb 29, 1991 (2291991) and March 32, 1991 (3321991). Both are invalid dates.

pd.to_datetime(2291991, format="%m%d%Y", coerce=True, exact=True)

Raises ValueError: day is out of range for month

While

pd.to_datetime(3321991, format="%m%d%Y", coerce=True, exact=True)

Returns NaT, which is what I would expect.

In any case, both calls should behave the same way: either both raise the same error or both return the same value.

Joris Van den Bossche pointed out on the mailing list:
"It has something to do with the number of the day, only values above 31 convert to NaT, 31 or lower raises the error (eg also 31 April raise error instead of giving NaT)"

@jorisvandenbossche
Member

To show it with a slightly simpler example (the format you used can be ambiguous depending on the month/day):

In [3]: pd.to_datetime('2015-02-29', coerce=True)
Out[3]: NaT

In [4]: pd.to_datetime('2015-02-29', format="%Y-%m-%d", coerce=True)
---------------------------------------------------------------------------
ValueError: day is out of range for month

In [5]: pd.to_datetime('2015-03-32', format="%Y-%m-%d", coerce=True)
Out[5]: NaT

In [6]: pd.to_datetime('2015-02-32', format="%Y-%m-%d", coerce=True)
Out[6]: NaT

In [7]: pd.to_datetime('2015-04-31', format="%Y-%m-%d", coerce=True)
---------------------------------------------------------------------------
ValueError: day is out of range for month

So the difference is in the code path that uses a specified format: out-of-range days > 31 are coerced to NaT, while out-of-range days <= 31 raise.
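The split at 31 is consistent with a two-stage parse: a pattern match that only admits day numbers 01-31, followed by construction of the actual calendar date. The sketch below is a simplified, hypothetical model under that assumption, not the pandas implementation; `DATE_RE`, `parse_two_stage` and `NAT` are illustrative names, and `None` stands in for `pd.NaT` to keep it stdlib-only. Only the first stage honours `coerce`, which reproduces the same inconsistency.

```python
import re
from datetime import date

# Hypothetical, simplified two-stage parser for the fixed format "%Y-%m-%d".
# Stage 1: a regex checks the shape of the string; the day group only
#          accepts 01-31, so a day of "32" is rejected here.
# Stage 2: the matched fields are turned into a calendar date, which can
#          still fail (e.g. 2015-02-29 or 2015-04-31).
DATE_RE = re.compile(r"(?P<Y>\d{4})-(?P<m>1[0-2]|0[1-9])-(?P<d>3[01]|[12]\d|0[1-9])")

NAT = None  # stand-in for pandas.NaT in this sketch


def parse_two_stage(text, coerce=True):
    match = DATE_RE.fullmatch(text)
    if match is None:
        # Stage 1 failure: this branch honours coerce
        if coerce:
            return NAT
        raise ValueError(f"time data {text!r} does not match format '%Y-%m-%d'")
    # Stage 2 is NOT guarded, so date() can raise ValueError even with coerce=True
    return date(int(match["Y"]), int(match["m"]), int(match["d"]))
```

With this model, "2015-03-32" and "2015-02-32" return the NAT stand-in, while "2015-02-29" and "2015-04-31" pass the pattern and then raise ValueError from `date()`, mirroring the In [4] to In [7] session.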

jreback added this to the Next Major Release milestone May 18, 2015
@jreback
Contributor

jreback commented May 18, 2015

@vincentdavis this is pretty straightforward: just need to catch the exception and, if coerce=True, return `NaT`. This is done in `tslib.pyx/array_strptime`.
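A minimal sketch of the catch-and-coerce idea, written against the standard library rather than the Cython code in `tslib.pyx`; `parse_with_format` and `NAT` are hypothetical names, and `None` stands in for `pd.NaT`. Because `datetime.strptime` raises `ValueError` both for a string that does not fit the format and for a calendar-invalid date, a single `except` clause treats both kinds of invalid input the same way.

```python
from datetime import datetime

NAT = None  # stand-in for pandas.NaT, to keep the sketch stdlib-only


def parse_with_format(text, fmt, coerce=False):
    """Parse text with an explicit format; return NAT on failure if coerce."""
    try:
        # strptime raises ValueError both when the string does not match fmt
        # (e.g. day "32") and when the resulting date is invalid
        # (e.g. 2015-02-29), so one except clause covers both cases.
        return datetime.strptime(text, fmt)
    except ValueError:
        if coerce:
            return NAT
        raise  # coerce=False: propagate the original error unchanged
```

For example, `parse_with_format("2015-02-29", "%Y-%m-%d", coerce=True)` and `parse_with_format("2015-03-32", "%Y-%m-%d", coerce=True)` both return the NAT stand-in, while the same calls with `coerce=False` both raise.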

@vincentdavis
Contributor Author

I would like to help, but I am having problems getting started. How do you recommend setting up a development environment for pandas? Are there docs on how to do this? What needs to be installed?
I am kind of getting lost in the code and might need some help.

@jreback
Contributor

jreback commented May 26, 2015

http://pandas.pydata.org/pandas-docs/stable/contributing.html

the actual code to look at is in pandas/tslib.pyx; look in array_strptime

vincentdavis changed the title from "to_datetime, Inconsistant behaviour with invalid dates." to "to_datetime, inconsistent behavior with invalid dates." May 27, 2015
@vincentdavis
Contributor Author

Do you want a test for to_datetime() or array_strptime()? My vote is to_datetime(), as it is really about obeying coerce=True.

@jreback
Contributor

jreback commented May 27, 2015

your example cases can serve as tests

jreback modified the milestones: 0.17.0, Next Major Release May 27, 2015
@vincentdavis
Contributor Author

In this case I was expecting a ValueError:
In [4]: pd.to_datetime('2015-02-29', coerce=False)
Out[4]: '2015-02-29'

@jorisvandenbossche
Member

@vincentdavis the default for errors is 'ignore'. If you do pd.to_datetime('2015-02-29', coerce=False, errors='raise'), this will raise a ValueError.

@vincentdavis
Contributor Author

@jorisvandenbossche
This looks correct

In [10]: pd.to_datetime('2015-02-29', errors='ignore',  coerce=False)
Out[10]: '2015-02-29'

I would expect the same from this but get a ValueError.

In [12]: pd.to_datetime('2015-02-29', errors='ignore', format="%Y-%m-%d", coerce=False)
....
ValueError: day is out of range for month

Adding format="%Y-%m-%d" should not change the output.
Is this correct, and does it also need to be fixed?
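The expectation that adding a format should not change the output amounts to applying the error policy in one place, after whichever parser runs. A minimal stdlib-only sketch, assuming the three modes behave as described in this thread ('raise' re-raises, 'coerce' returns NaT, 'ignore' returns the input unchanged); `convert` and `NAT` are hypothetical names, `None` stands in for `pd.NaT`, and `datetime.fromisoformat` stands in for the format-less parser.

```python
from datetime import datetime

NAT = None  # stand-in for pandas.NaT, to keep the sketch stdlib-only


def convert(value, fmt=None, errors="raise"):
    """Hypothetical scalar converter mimicking three `errors` modes.

    The same failure handling applies whether or not fmt is given,
    so passing a format never changes how errors are treated.
    """
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"invalid errors value: {errors!r}")
    try:
        if fmt is None:
            # No format: fall back to ISO 8601 parsing
            return datetime.fromisoformat(value)
        return datetime.strptime(value, fmt)
    except ValueError:
        if errors == "coerce":
            return NAT
        if errors == "ignore":
            return value  # hand back the original input unchanged
        raise
```

Because the `try`/`except` wraps both parsing branches, `convert("2015-02-29", errors="ignore")` and `convert("2015-02-29", fmt="%Y-%m-%d", errors="ignore")` both return the original string, which is the consistency asked for here.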

@jorisvandenbossche
Member

@vincentdavis yes, I think you are correct, although I don't really like this default of not raising but returning the original string; I prefer the behaviour when a format is provided, which raises. But for consistency it should probably be changed... (@jreback?)

@jreback
Contributor

jreback commented Jun 1, 2015

with errors='ignore' you won't get errors, so this is correct. This was briefly looked at in #8894.

I think it would be nice to change the default (let's make a separate issue for that though).

@jreback
Contributor

jreback commented Jul 7, 2015

closed by #10520

3 participants