-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: out-of-bounds pydatetime with errors='coerce' and non-ISO format coerces all elements to NaT #50265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2ac0725
to
a8eff6c
Compare
4a32933
to
d60b631
Compare
"format,warning", [(None, UserWarning), ("H%:M%:S%", None)] | ||
"format,warning", [(None, UserWarning), ("%H:%M:%S", None)] | ||
) | ||
def test_datetime_invalid_scalar(self, value, format, warning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test claims to be testing invalid scalars, but in reality it was erroring before the parsing even started because "H%:M%:S%"
is an invalid format
So, I've changed the format to reflect what it looks like this test was meant to be testing, and have made a separate test above to check that invalid formats raise
@@ -1098,7 +1107,7 @@ def test_datetime_invalid_scalar(self, value, format, warning): | |||
|
|||
@pytest.mark.parametrize("value", ["3000/12/11 00:00:00"]) | |||
@pytest.mark.parametrize( | |||
"format,warning", [(None, UserWarning), ("H%:M%:S%", None)] | |||
"format,warning", [(None, UserWarning), ("%H:%M:%S", None)] | |||
) | |||
def test_datetime_outofbounds_scalar(self, value, format, warning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above (this errors before even reaching the out-of-bounds scalar...)
found = format_regex.search(val) | ||
if not found: | ||
if is_coerce: | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the code inside this try/except just indenting whats currently there or are there other changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's basically just that, but I also removed some
if is_coerce:
iresult[i] = NPY_NAT
continue
as that already happens in the outer try-except
If you hide whitespace it'll be easier to see - you can also go to https://github.com/pandas-dev/pandas/pull/50265/files?diff=unified&w=1 (notice the ?diff=unified&w=1
at the end) to see it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I'm not a big fan of try/excepts with a huge block of code inside, but I guess since the failures modes here should be made consistent with the broader to_datetime(errors=)
keyword I suppose this is fine.
Question, sometime we are issues some ValueError
s internally in this function that will be caught by this try except e.g.
if iso_year != -1:
if iso_week == -1 or weekday == -1:
raise ValueError("ISO year directive '%G' must be used with "
"the ISO week directive '%V' and a weekday "
"directive '%A', '%a', '%w', or '%u'.")
Is this necessarily a "parsing" error that should be influenced by the errors keyword?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can rewrite it such that each check_dts_bounds
is within its own try-except - maybe that's better actually, thanks
I'll get back to you on that ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, refactored
That's a really good point about the "ISO year directive" error, thanks, good catch! So, I think that one should always raise. At least, that seems to be the current intention, there's no if is_coerce
check there like there is for some other errors, such as
pandas/pandas/_libs/tslibs/strptime.pyx
Lines 378 to 382 in a5cbd1e
except ValueError: | |
if is_coerce: | |
iresult[i] = NPY_NAT | |
continue | |
raise |
So, I've parametrised test_error_iso_week_year
too over errors
to check that this one actually raises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not actually opposed to the large try/except block if we expect ValueError and OutofBounds Errors to be affected by errors
. If other excepts in this functions can be factored out (like the iso year directive one) and others then it would help ensure errors is consistently handled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK thanks - have factored out the iso year directive check in #50309, so I've gone back to having an outer try-except block (so again, "hide-whitespace" will come in handy when reviewing)
22f4dde
to
2f9094f
Compare
@jbrockmendel @mroeschke sorry for the ping, just wanted to ask if we could get this one in, as:
That would conclude the major Thanks 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.I haven't added a whatsnew note for #50255 , as this feature (parsing pydatetime / datetime / np.datetime64) objects with non-ISO formats wasn't available to users in the previous release anyway
This is much easier to review with "hide whitespace", as many of the changes are just changing a level of indentation