API: handling of invalid datetimes in constructors #26853
Comments
I agree with your suggestion that |
Which means handling out-of-bounds datetime64 differently from out-of-bounds datetime.datetime. (I don't have a strong opinion here; we just need to make a choice.) |
Yes, this is the rationale for my opinion. Any thoughts, @jbrockmendel? |
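For context on what "out of bound" means in this thread, here is a minimal standard-library sketch. It assumes the usual datetime64[ns] layout (a signed 64-bit count of nanoseconds since the Unix epoch, with the smallest value reserved for NaT). The helper names `to_ns_since_epoch` and `fits_in_ns` are hypothetical and not part of pandas.

```python
from datetime import datetime, timedelta

# datetime64[ns] stores a signed 64-bit count of nanoseconds since the epoch.
# The smallest int64 value is reserved for NaT, so the usable range is symmetric.
EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1


def to_ns_since_epoch(value: datetime) -> int:
    """Exact integer nanoseconds between the epoch and a naive datetime."""
    delta = value - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1_000


def fits_in_ns(value: datetime) -> bool:
    """Hypothetical helper: True if value is representable as datetime64[ns]."""
    return -MAX_NS <= to_ns_since_epoch(value) <= MAX_NS


print(fits_in_ns(datetime(2019, 6, 14)))  # True
print(fits_in_ns(datetime(3000, 1, 1)))   # False
print(fits_in_ns(datetime(1500, 1, 1)))   # False

# Upper bound, truncated to the microsecond precision of datetime.datetime.
print(EPOCH + timedelta(microseconds=MAX_NS // 1000))  # 2262-04-11 23:47:16.854775
```

Anything outside roughly the years 1677 to 2262 is therefore a valid datetime.datetime or coarser-unit datetime64 value that nanosecond resolution cannot hold, which is why the constructors have to choose between raising and falling back to object dtype.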
I agree that this is the best we can do under the circumstances. |
@jreback, thoughts here? On the PR I think you argued for converting to object, also for out-of-bounds non-ns datetime64. |
I would agree that datetime.datetime should be object, as indicated above (agree with everyone). I have a slight preference for converting np.datetime64 to datetime.datetime (and handling it as object); I don't like raising in the Series / DataFrame constructors. Bottom line, though: I'm willing to go along with the consensus (unless the above is easy). |
I am not sure either behaviour is necessarily harder to implement than the other. I think we should rather decide on what behaviour we want. Are there other numpy dtypes we cannot represent (to check what we do there)? I don't think there are (except for np.str, which we convert to object, but that's a bit of a special case). |
Another inconsistency I noticed (a different case, though: the handling of iNaT, i.e. the integer value):
|
Triggered by the regression reported in #26206 and my attempt to fix it (#26848), I looked a bit into how our different constructors handle different cases of invalid datetimes.
Out of bound datetimes (cases: a non-ns unit numpy datetime64 array, datetime.datetime objects, strings):
Some remarks here:
For to_datetime, all the OutOfBounds errors are expected, as the default is to error.
Series(.., dtype='M8[ns]') is clearly broken (gives an incorrect date for all cases).
The last item is the main question: when inferring (no dtype enforced) and the data are clearly timestamps (either numpy datetime64 or datetime.datetime), should we raise an error or return object dtype?
Note that for datetime.datetime, returning object dtype might be more logical, as it means returning the input as is, whereas for datetime64[non-ns] it actually means converting a numpy dtype to object data.
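The comparison described above can be re-run with a small probe along these lines. This is only a sketch: the sample values and the `describe` helper are assumptions made for illustration, not the inputs used in the original comparison, and the outcomes depend on the installed pandas version, so no expected output is shown.

```python
import datetime as dt

import numpy as np
import pandas as pd


def describe(fn):
    """Return the resulting dtype as a string, or the exception class name."""
    try:
        return str(fn().dtype)
    except Exception as exc:  # e.g. an out-of-bounds error
        return type(exc).__name__


# Assumed sample inputs: each lies beyond the datetime64[ns] upper bound.
inputs = {
    "datetime64[D] array": np.array(["2262-05-01"], dtype="datetime64[D]"),
    "datetime.datetime": [dt.datetime(3000, 1, 1)],
    "strings": ["3000-01-01"],
}

constructors = {
    "to_datetime": lambda values: pd.to_datetime(values),
    "Series(dtype='M8[ns]')": lambda values: pd.Series(values, dtype="M8[ns]"),
    "Series (inferred)": lambda values: pd.Series(values),
}

report = {
    (input_name, ctor_name): describe(lambda: ctor(values))
    for input_name, values in inputs.items()
    for ctor_name, ctor in constructors.items()
}

for (input_name, ctor_name), outcome in report.items():
    print(f"{input_name:22} {ctor_name:24} -> {outcome}")
```

Each row shows either the dtype the constructor produced or the name of the exception it raised, which makes it easy to see where the inferred Series path raises and where it falls back to object.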