-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Make Series[datetime64] - pd.NaT behave like DatetimeIndex - pd.NaT #18960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/ops.py
Outdated
@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name): | |||
|
|||
# if tz's must be equal (same or None) | |||
if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None): | |||
raise ValueError("Incompatible tz's on datetime subtraction " | |||
"ops") | |||
if len(rvalues) == 1 and np.isnat(rvalues[0]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never use np.isnat
, always use isna
.
why is this restricted to len(..) == 1? if they are all nat should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this restricted to len(..) == 1?
Because the bug being addressed is specific to scalar NaT subtraction. Non-scalar cases are handled elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never use np.isnat, always use isna.
I don't have a strong opinion. np.isnat is more strict, which I figure is preferable since all else equal I'd like to avoid catching extra cases by accident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I do. don't use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got it buddy. Just changed and pushed.
@@ -505,11 +509,20 @@ def _convert_to_array(self, values, name=None, other=None): | |||
inferred_type = lib.infer_dtype(values) | |||
if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or | |||
is_datetimetz(inferred_type)): | |||
|
|||
if ovalues is pd.NaT and name == '__sub__': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's wrong this this code that is existing? (obviously the dtype is incorrect if other is dt64 but otherwise this is fine)
# if we have a other of timedelta, but use pd.NaT here we
# we are in the wrong path
if (supplied_dtype is None and other is not None and
(other.dtype in ('timedelta64[ns]', 'datetime64[ns]')) and
isna(values).all()):
values = np.empty(values.shape, dtype='timedelta64[ns]')
values[:] = iNaT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For starters the other.dtype condition excludes datetime64s with timezones. More generally, this PR is specifically about scalar subtraction of pd.NaT and I would rather give that its own block than try to shoe-horn it into an existing block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplify this like the above code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. But in my head I'm going to put sarcasm quotes around the word "simplify".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue. you are repeating code. you only need to select the dtype if other.dtype == 'timedelta64[ns]'
but the construction is regardless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're suggesting would catch np.array([np.datetime64('NaT')])
which is unambiguously datetimelike, or consider np.array([pd.NaT, datetime64('NaT')])
. _TimeOp
is a mess in part because it is not clear about what cases it is trying to handle where. This PR is trying to fix a bug tied to one very specific case. The fix should not catch any cases other than that specific intended case.
Codecov Report
@@ Coverage Diff @@
## master #18960 +/- ##
==========================================
+ Coverage 91.55% 91.57% +0.02%
==========================================
Files 150 150
Lines 48939 48945 +6
==========================================
+ Hits 44805 44821 +16
+ Misses 4134 4124 -10
Continue to review full report at Codecov.
|
Alright! We're down to 3 inconsistencies between Series[datetime64] and DatetimeIndex arithmetic. This, one where DatetimeIndex behavior is wrong, and one where a ruling is needed (but I strongly suspect Series behavior is wrong) #18850. |
Appveyor error is complaining that the assert_produces_warning(PerformanceWarning) are instead giving off DeprecationWarning. |
@jbrockmendel yes I see that. trying to figure out why that is happening. we are not catching a deprecation warning on windows somewhere. not sure if you have a windows box? |
pandas/core/ops.py
Outdated
@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name): | |||
|
|||
# if tz's must be equal (same or None) | |||
if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None): | |||
raise ValueError("Incompatible tz's on datetime subtraction " | |||
"ops") | |||
if len(rvalues) == 1 and isna(rvalues[0]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isna(rvalues).all()
@@ -505,11 +509,20 @@ def _convert_to_array(self, values, name=None, other=None): | |||
inferred_type = lib.infer_dtype(values) | |||
if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or | |||
is_datetimetz(inferred_type)): | |||
|
|||
if ovalues is pd.NaT and name == '__sub__': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplify this like the above code
# GH#18808 | ||
ser = pd.Series([pd.NaT, pd.Timedelta('1s')]) | ||
res = ser - pd.NaT | ||
expected = pd.Series([pd.NaT, pd.NaT], dtype='timedelta64[ns]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convention is not to use pd.
@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name): | |||
|
|||
# if tz's must be equal (same or None) | |||
if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None): | |||
raise ValueError("Incompatible tz's on datetime subtraction " | |||
"ops") | |||
if len(rvalues) == 1 and isna(rvalues).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this matter if it is exactly len == 1, seems odd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue at hand is that pd.NaT
is special because it can play the role of a datetime
or a timedelta
depending on context. _TimeOp
wraps scalar inputs in an array that becomes rvalues
. But at a basic level here what we are interested in is "was the original right
argument pd.NaT
?" What this is doing (and the same you're asking me to do below) is to undo wrapping done by _TimeOp
. It is a much less clear way of checking "is the argument pd.NaT
?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't answer the question, whether I have 1 NaT or all NaT is immaterial, I cann't use the rhs dtype as its ambiguous and must use the left.
pandas/core/ops.py
Outdated
isna(values).all()): | ||
values = np.empty(values.shape, dtype='timedelta64[ns]') | ||
if len(values) == 1 and other.dtype == 'timedelta64[ns]': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not simpler at all.
values = np.empty(values.shape, dtye=other.dtype)
values[:] = iNaT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, hence the sarcasm quotes and the simpler original implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, even then this is wrong because it isn't conditioning on the op being sub. Gonna revert back to how it was before this commit.
@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name): | |||
|
|||
# if tz's must be equal (same or None) | |||
if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None): | |||
raise ValueError("Incompatible tz's on datetime subtraction " | |||
"ops") | |||
if len(rvalues) == 1 and isna(rvalues).all(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't answer the question, whether I have 1 NaT or all NaT is immaterial, I cann't use the rhs dtype as its ambiguous and must use the left.
@@ -505,11 +509,20 @@ def _convert_to_array(self, values, name=None, other=None): | |||
inferred_type = lib.infer_dtype(values) | |||
if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or | |||
is_datetimetz(inferred_type)): | |||
|
|||
if ovalues is pd.NaT and name == '__sub__': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue. you are repeating code. you only need to select the dtype if other.dtype == 'timedelta64[ns]'
but the construction is regardless
I'll get started on the new crop of comments. In the meantime, #18831 may be ready. |
closing in favor of #19024 |
git diff upstream/master -u -- "*.py" | flake8 --diff