Make Series[datetime64] - pd.NaT behave like DatetimeIndex - pd.NaT #18960

jbrockmendel · 2017-12-27T19:27:02Z

closes Series.__sub__(NaT) vs DatetimeIndex.__sub__(NaT) #18808
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
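
For readers skimming the thread, the behavior being targeted can be sketched roughly as follows. This is a minimal illustration, not code from the PR; the sample dates and variable names are made up, and the comments describe what recent pandas releases do rather than the state of master when this PR was opened.

```python
import pandas as pd

# Hypothetical sample data; any datetime64 values would do.
dti = pd.DatetimeIndex(["2017-12-27", "2017-12-28"])
ser = pd.Series(dti)

# Index and Series should agree: an all-NaT result with a timedelta dtype.
index_result = dti - pd.NaT
series_result = ser - pd.NaT

# The tz-aware case is the one that used to hit the
# "Incompatible tz's on datetime subtraction ops" error for Series.
ser_tz = pd.Series(dti.tz_localize("UTC"))
series_tz_result = ser_tz - pd.NaT

print(index_result.dtype, series_result.dtype, series_tz_result.dtype)
```

The point is only that all three results should share a timedelta dtype and be entirely NaT.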

jreback · 2017-12-27T19:50:57Z

pandas/core/ops.py

@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name):

            # if tz's must be equal (same or None)
            if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None):
-                raise ValueError("Incompatible tz's on datetime subtraction "
-                                 "ops")
+                if len(rvalues) == 1 and np.isnat(rvalues[0]):


we never use np.isnat, always use isna.

why is this restricted to len(..) == 1? if they are all nat should be enough

why is this restricted to len(..) == 1?

Because the bug being addressed is specific to scalar NaT subtraction. Non-scalar cases are handled elsewhere.

we never use np.isnat, always use isna.

I don't have a strong opinion. np.isnat is more strict, which I figure is preferable since all else equal I'd like to avoid catching extra cases by accident.

well I do. don't use it.

You got it buddy. Just changed and pushed.
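
For context on the "more strict" remark, here is a small sketch of how the two checks differ. `strict_is_nat` is a hypothetical wrapper written for this illustration; only `np.isnat` and `pd.isna` are real library calls.

```python
import numpy as np
import pandas as pd


def strict_is_nat(value):
    """Hypothetical wrapper: True only for datetime64/timedelta64 NaT."""
    try:
        return bool(np.isnat(value))
    except TypeError:
        # np.isnat is defined only for datetime64 and timedelta64 inputs,
        # so floats such as np.nan end up here.
        return False


dt_nat = np.datetime64("NaT")
td_nat = np.timedelta64("NaT")

# np.isnat accepts only datetime-like inputs...
print(strict_is_nat(dt_nat), strict_is_nat(td_nat), strict_is_nat(np.nan))

# ...while pd.isna treats every missing-value marker alike.
print(pd.isna(dt_nat), pd.isna(np.nan), pd.isna(None))
```

The project convention requested in review is `isna`, which is what the follow-up commit switched to.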

jreback · 2017-12-27T19:51:33Z

pandas/core/ops.py

@@ -505,11 +509,20 @@ def _convert_to_array(self, values, name=None, other=None):
        inferred_type = lib.infer_dtype(values)
        if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or
                is_datetimetz(inferred_type)):
+
+            if ovalues is pd.NaT and name == '__sub__':


what's wrong with the existing code? (obviously the dtype is incorrect if other is dt64, but otherwise this is fine)

            # if we have a other of timedelta, but use pd.NaT here we
            # we are in the wrong path
            if (supplied_dtype is None and other is not None and
                    (other.dtype in ('timedelta64[ns]', 'datetime64[ns]')) and
                    isna(values).all()):
                values = np.empty(values.shape, dtype='timedelta64[ns]')
                values[:] = iNaT

For starters the other.dtype condition excludes datetime64s with timezones. More generally, this PR is specifically about scalar subtraction of pd.NaT and I would rather give that its own block than try to shoe-horn it into an existing block.
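
The timezone point can be checked with a short sketch. The tuple mirrors the condition quoted above; the sample data and variable names are assumptions made for illustration.

```python
import pandas as pd

# Dtype strings from the existing condition.
accepted = ("timedelta64[ns]", "datetime64[ns]")

# Hypothetical sample data, pinned to nanosecond resolution.
naive = pd.Series(pd.to_datetime(["2017-12-27"])).astype("datetime64[ns]")
aware = naive.dt.tz_localize("UTC")

naive_matches = naive.dtype in accepted
aware_matches = aware.dtype in accepted  # tz-aware dtype is not in the tuple

print(naive.dtype, naive_matches)
print(aware.dtype, aware_matches)
```

A tz-aware Series carries a `DatetimeTZDtype`, which does not compare equal to the plain `'datetime64[ns]'` string, so the existing branch would skip it.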

simplify this like the above code

OK. But in my head I'm going to put sarcasm quotes around the word "simplify".

same issue. you are repeating code. you only need to select the dtype if other.dtype == 'timedelta64[ns]'
but the construction is regardless

What you're suggesting would catch np.array([np.datetime64('NaT')]) which is unambiguously datetimelike, or consider np.array([pd.NaT, datetime64('NaT')]). _TimeOp is a mess in part because it is not clear about what cases it is trying to handle where. This PR is trying to fix a bug tied to one very specific case. The fix should not catch any cases other than that specific intended case.
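
To make the disagreement concrete, here is a rough sketch of what each style of check would catch. Both helper names are hypothetical and exist only for this illustration; they are not functions in `pandas/core/ops.py`.

```python
import numpy as np
import pandas as pd


def is_scalar_nat(obj):
    """Hypothetical helper: True only for the pd.NaT singleton itself."""
    return obj is pd.NaT


def is_all_na(obj):
    """Hypothetical helper: True when every element is missing."""
    return bool(np.all(pd.isna(obj)))


candidates = [
    pd.NaT,                                  # the case this PR targets
    np.array([np.datetime64("NaT")]),        # unambiguously datetime-like
    np.array([pd.NaT, np.datetime64("NaT")], dtype=object),
]

for obj in candidates:
    print(repr(obj), is_scalar_nat(obj), is_all_na(obj))
```

The identity check fires only for the first candidate, while the all-NA check fires for all three, which is the extra reach the author wants to avoid.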

codecov · 2017-12-27T20:15:35Z

Codecov Report

Merging #18960 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18960      +/-   ##
==========================================
+ Coverage   91.55%   91.57%   +0.02%     
==========================================
  Files         150      150              
  Lines       48939    48945       +6     
==========================================
+ Hits        44805    44821      +16     
+ Misses       4134     4124      -10

Flag	Coverage Δ
#multiple	`89.93% <100%> (+0.02%)`	⬆️
#single	`41.75% <25%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/ops.py	`90.33% <100%> (+0.09%)`	⬆️
pandas/util/testing.py	`84.74% <0%> (-0.22%)`	⬇️
pandas/plotting/_converter.py	`66.95% <0%> (+1.73%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0e3c797...9ac970d.

jbrockmendel · 2017-12-29T19:11:33Z

Alright, the CI turnaround is going to keep me off your back for a while. In the interim, this and #18959 are ready to within the margin of error. If the stars align #17746 may be there too.

jbrockmendel · 2017-12-30T17:00:48Z

Alright! We're down to 3 inconsistencies between Series[datetime64] and DatetimeIndex arithmetic. This, one where DatetimeIndex behavior is wrong, and one where a ruling is needed (but I strongly suspect Series behavior is wrong) #18850.

jbrockmendel · 2017-12-30T17:18:47Z

The Appveyor failure is complaining that the assert_produces_warning(PerformanceWarning) checks are seeing DeprecationWarning instead.

jreback · 2017-12-30T17:44:05Z

@jbrockmendel yes I see that. trying to figure out why that is happening. we are not catching a deprecation warning on windows somewhere. not sure if you have a windows box?
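
One way to see which warning categories a code path actually emits, without a Windows box, is to record them with the standard library. This is a generic debugging sketch, not what the pandas test helper does internally; `PerformanceWarning` here is a local stand-in class and `emit_both` is a hypothetical function.

```python
import warnings


class PerformanceWarning(Warning):
    """Local stand-in for the pandas warning class (illustration only)."""


def emit_both():
    """Hypothetical code path that emits two different warnings."""
    warnings.warn("deprecated helper", DeprecationWarning)
    warnings.warn("slow path", PerformanceWarning)


def recorded_categories(func):
    """Run func and return the categories of all warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide anything
        func()
    return [w.category for w in caught]


categories = recorded_categories(emit_both)
print(categories)
```

If an unexpected `DeprecationWarning` shows up ahead of the expected category, a check that looks only at the first recorded warning would report the wrong one.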

jreback · 2017-12-31T00:49:17Z

pandas/core/ops.py

@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name):

            # if tz's must be equal (same or None)
            if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None):
-                raise ValueError("Incompatible tz's on datetime subtraction "
-                                 "ops")
+                if len(rvalues) == 1 and isna(rvalues[0]):


isna(rvalues).all()

jreback · 2017-12-31T00:50:05Z

pandas/core/ops.py

@@ -505,11 +509,20 @@ def _convert_to_array(self, values, name=None, other=None):
        inferred_type = lib.infer_dtype(values)
        if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or
                is_datetimetz(inferred_type)):
+
+            if ovalues is pd.NaT and name == '__sub__':


simplify this like the above code

jreback · 2017-12-31T00:50:35Z

pandas/tests/series/test_operators.py

+        # GH#18808
+        ser = pd.Series([pd.NaT, pd.Timedelta('1s')])
+        res = ser - pd.NaT
+        expected = pd.Series([pd.NaT, pd.NaT], dtype='timedelta64[ns]')


convention is not to use pd.
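
A sketch of the same test written with direct imports, as the convention asks. The explicit dtype on `ser` and the use of the public `pandas.testing` module are assumptions made here so the snippet stands alone; the PR's test file has its own imports.

```python
from pandas import NaT, Series, Timedelta
from pandas.testing import assert_series_equal

# GH#18808
# dtype is pinned explicitly here (an addition for this sketch).
ser = Series([NaT, Timedelta('1s')], dtype='timedelta64[ns]')
res = ser - NaT
expected = Series([NaT, NaT], dtype='timedelta64[ns]')

# Raises AssertionError if the two Series differ.
assert_series_equal(res, expected)
```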

jreback · 2017-12-31T01:44:59Z

pandas/core/ops.py

@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name):

            # if tz's must be equal (same or None)
            if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None):
-                raise ValueError("Incompatible tz's on datetime subtraction "
-                                 "ops")
+                if len(rvalues) == 1 and isna(rvalues).all():


why does this matter if it is exactly len == 1, seems odd

The issue at hand is that pd.NaT is special because it can play the role of a datetime or a timedelta depending on context. _TimeOp wraps scalar inputs in an array that becomes rvalues. But at a basic level here what we are interested in is "was the original right argument pd.NaT?" What this is doing (and the same you're asking me to do below) is to undo wrapping done by _TimeOp. It is a much less clear way of checking "is the argument pd.NaT?"

you didn't answer the question, whether I have 1 NaT or all NaT is immaterial, I can't use the rhs dtype as it's ambiguous and must use the left.
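
Both sides of this exchange rest on the same fact: pd.NaT carries no dtype of its own, so the left operand has to decide the result. A small sketch, assuming recent pandas behavior and using made-up sample values:

```python
import pandas as pd

# Scalar level: NaT can stand in for a datetime or for a timedelta.
ts_result = pd.Timestamp("2017-12-27") - pd.NaT
td_result = pd.Timedelta("1s") - pd.NaT

# Series level: the left-hand dtype determines how NaT is interpreted.
dt_ser = pd.Series(pd.to_datetime(["2017-12-27", None]))
td_ser = pd.Series(pd.to_timedelta(["1s", None]))

dt_minus_nat = dt_ser - pd.NaT  # NaT treated as a datetime
td_minus_nat = td_ser - pd.NaT  # NaT treated as a timedelta

print(ts_result, td_result)
print(dt_minus_nat.dtype, td_minus_nat.dtype)
```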

jreback · 2017-12-31T01:46:25Z

pandas/core/ops.py

                    isna(values).all()):
-                values = np.empty(values.shape, dtype='timedelta64[ns]')
+                if len(values) == 1 and other.dtype == 'timedelta64[ns]':


this is not simpler at all.

values = np.empty(values.shape, dtype=other.dtype)
values[:] = iNaT

I agree, hence the sarcasm quotes and the simpler original implementation.

Yah, even then this is wrong because it isn't conditioning on the op being sub. Gonna revert back to how it was before this commit.
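
For reference, the construction under discussion (an all-NaT array whose dtype is chosen by the caller) can be sketched with NumPy alone. `all_nat_like` and `INT64_NAT` are hypothetical names for this illustration; the sentinel is the minimum int64 value, which datetime64 and timedelta64 both interpret as NaT.

```python
import numpy as np

# Integer sentinel that datetime64/timedelta64 read as NaT.
INT64_NAT = np.iinfo(np.int64).min


def all_nat_like(shape, dtype):
    """Hypothetical helper: build an all-NaT array of the given dtype."""
    ints = np.full(shape, INT64_NAT, dtype=np.int64)
    # Reinterpret the int64 buffer as the requested datetime-like dtype.
    return ints.view(dtype)


td_nat = all_nat_like((2,), "timedelta64[ns]")
dt_nat = all_nat_like((2,), "datetime64[ns]")

print(td_nat.dtype, np.isnat(td_nat).all())
print(dt_nat.dtype, np.isnat(dt_nat).all())
```

Which dtype to pass in, and whether to do so only for subtraction, is exactly what the thread is arguing about; the sketch takes no position on that.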

jreback · 2017-12-31T14:55:39Z

pandas/core/ops.py

@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name):

            # if tz's must be equal (same or None)
            if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None):
-                raise ValueError("Incompatible tz's on datetime subtraction "
-                                 "ops")
+                if len(rvalues) == 1 and isna(rvalues).all():


you didn't answer the question, whether I have 1 NaT or all NaT is immaterial, I can't use the rhs dtype as it's ambiguous and must use the left.

jreback · 2017-12-31T14:56:23Z

pandas/core/ops.py

@@ -505,11 +509,20 @@ def _convert_to_array(self, values, name=None, other=None):
        inferred_type = lib.infer_dtype(values)
        if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or
                is_datetimetz(inferred_type)):
+
+            if ovalues is pd.NaT and name == '__sub__':


same issue. you are repeating code. you only need to select the dtype if other.dtype == 'timedelta64[ns]'
but the construction is regardless

jbrockmendel · 2017-12-31T17:23:45Z

I'll get started on the new crop of comments. In the meantime, #18831 may be ready.

jreback · 2018-01-01T17:55:23Z

closing in favor of #19024

jbrockmendel added 2 commits December 19, 2017 10:16

handle tzs, remove tests for old behavior

d4b799a

Merge branch 'master' of https://github.com/pandas-dev/pandas into su…

c07f49a

…bnat

jreback requested changes Dec 27, 2017

jreback added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and Datetime (Datetime data dtype) labels Dec 27, 2017

use isna instead of np.isnat

49ed387

Merge branch 'master' of https://github.com/pandas-dev/pandas into su…

782ea74

…bnat

Merge branch 'master' of https://github.com/pandas-dev/pandas into su…

5540a4b

…bnat

Merge branch 'master' of https://github.com/pandas-dev/pandas into su…

e99b5e1

…bnat

jreback requested changes Dec 31, 2017

jbrockmendel added 2 commits December 30, 2017 17:39

adhere to convention

f0da8ec

make code 'simpler'

f275a6c

jreback requested changes Dec 31, 2017

revert simplification

9ac970d

jreback requested changes Dec 31, 2017

This was referenced Dec 31, 2017

Series[datetime64] - PeriodIndex vs DatetimeIndex - PeriodIndex #18850

Closed

dispatch Series[datetime64] ops to DatetimeIndex #19024

Merged

jreback closed this Jan 1, 2018

jreback added this to the No action milestone Jan 4, 2018

jbrockmendel deleted the subnat branch January 23, 2018 04:40
