-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: catch out-of-bounds datetime64 in Series/DataFrame constructor #26848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: catch out-of-bounds datetime64 in Series/DataFrame constructor #26848
Conversation
Codecov Report
@@ Coverage Diff @@
## master #26848 +/- ##
==========================================
- Coverage 91.97% 91.97% -0.01%
==========================================
Files 180 180
Lines 50756 50760 +4
==========================================
+ Hits 46685 46686 +1
- Misses 4071 4074 +3
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26848 +/- ##
==========================================
- Coverage 91.86% 91.85% -0.01%
==========================================
Files 179 179
Lines 50700 50703 +3
==========================================
- Hits 46576 46575 -1
- Misses 4124 4128 +4
Continue to review full report at Codecov.
|
Because we have been doing the opposite for some releases? Before 0.23.0, all of Index, Series and DataFrame constructors have been consistently raising an error when out-of-bounds datetime64 array was passed, even if no explicit dtype was specified. So yes, I consider converting to object an option (this is why I raised the question in the top post), but it's not necessarily the "only correct" way. For example, for Index, this would be a change in behaviour. |
this is likely a bug / regression Series constructor was designed to not raise and rather coerce |
I suppose you don't mean "coerce" here in meaning of |
def test_constructor_datetime64_outofbound(self): | ||
# GH-26206 out of bound non-ns unit | ||
with pytest.raises(pd.errors.OutOfBoundsDatetime): | ||
pd.Series(np.array(['2262-04-12'], dtype='datetime64[D]')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also test the minimum date as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's relying on a general routine that does this bound checking and which is already tested thoroughly through other means, here I mainly want to test to ensure this routine is used. Therefore, I decided to keep the tests here a bit simpler (adding is not that hard, but it just complicates the test further, and there are already several cases). Is that OK?
…datetime-unit-out-of-bounds
Hello @jorisvandenbossche! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-06-21 06:16:22 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code change looks good (ideally we want to simplify that code, but that is non-trivial / out-of scope for this). can you add a whatsnew note (I think single line is ok, not sure we need a sub-section to explain, though ok for that too)
…datetime-unit-out-of-bounds
One question about the tests: I want to add the exact same for DataFrame and Index constructors. I can copy an edited version to the respective testing files for DataFrame and Index, but on the other hand, I think there is also benefit in keeping the test together (less duplication of the boiler plate around it, could do a parametrize). Do we have a place to put general constructor tests that check for consistency? (because doing it here in the series tests is also a bit out of place) |
we have 2 mechanisms for this, neither complete solutions. test_generic you can do Series/DataFrame and test_base you can do Series/Index. generally for this as its really the Series you are fixing is to put it in test_base and then copy a similar example to the appropriate palace in frame tests. |
My last commit does an example parametrize of Index/DataFrame/Series
yes, but that also fixes DataFrame (with passing a dict) as it is using the same code. For Index it is indeed only adding tests to ensure consistency. |
Will move to test_base, that seems most appropriate (it indeed already has some tests that checks consistency across Series, Index) |
What both the dataframe and series fix have in common is that the code (where the fixes are applied) is now centralized in |
This should be ready. I opened a new issue for the xfail DataFrame case that I added, as this is a more general issue that we are doing unsafe astype: #26919 |
pls merge master. lgtm. merge on green. |
git diff upstream/master -u -- "*.py" | flake8 --diff
TODO: Need to add a whatsnew and test for DataFrame.
Question:
I am now letting this raise an error. But, an alternative could be to return an object array (by simply removing the
except OutOfBoundsDatetime: raise
, as the code below that fall backs to object array if nodtype
is specified.This would be more consistent with if it is not an array but list of scalars:
That is somewhat the general logic of the
_try_cast
. Of course, if you specify the required dtype explicitly, you should still get an error (which is actually also a bug on master, I see now).However, that would be inconsistent with the current
pd.Index
behaviour, which raises when passing out of bounds datetime64 (but also returns object dtype with list of datetime64 scalars)cc @jbrockmendel @mroeschke