Skip to content

Fix DTI comparison with None, datetime.date #19301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Feb 2, 2018
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
b100f59
Fix DTI comparison with None, datetime.date
jbrockmendel Jan 18, 2018
59ac32c
add GH reference, whatsnew note
jbrockmendel Jan 18, 2018
79d9a02
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 20, 2018
6720de9
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 20, 2018
c2c1da5
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 21, 2018
69f527d
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 21, 2018
f2dbb7e
change return order per request
jbrockmendel Jan 21, 2018
0184b39
de-duplicate tests by giving proper names
jbrockmendel Jan 21, 2018
5dcc09c
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 22, 2018
78d4eb1
add nan to invalid test
jbrockmendel Jan 22, 2018
39ad04d
rename test_dti_cmp_invalid
jbrockmendel Jan 23, 2018
d578aba
add tests for comparisons against datetimelikes
jbrockmendel Jan 24, 2018
6ba5d99
requested name change
jbrockmendel Jan 24, 2018
578a4a3
requested edits
jbrockmendel Jan 25, 2018
b57d15d
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 26, 2018
0e0886a
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Jan 27, 2018
2ea1205
Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…
jbrockmendel Feb 1, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -417,6 +417,7 @@ Datetimelike
- Bug in ``.astype()`` to non-ns timedelta units would hold the incorrect dtype (:issue:`19176`, :issue:`19223`, :issue:`12425`)
- Bug in subtracting :class:`Series` from ``NaT`` incorrectly returning ``NaT`` (:issue:`19158`)
- Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
- Bug in comparison of :class:`DatetimeIndex` against ``None`` or ``datetime.date`` objects raising ``TypeError`` for ``==`` and ``!=`` comparisons instead of all-``False`` and all-``True``, respectively (:issue:`19301`)

Timezones
^^^^^^^^^
Expand Down Expand Up @@ -447,8 +448,6 @@ Numeric
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)

-


Indexing
^^^^^^^^
Expand Down
18 changes: 10 additions & 8 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,8 +118,16 @@ def wrapper(self, other):
else:
if isinstance(other, list):
other = DatetimeIndex(other)
elif not isinstance(other, (np.ndarray, Index, ABCSeries)):
other = _ensure_datetime64(other)
elif not isinstance(other, (np.datetime64, np.ndarray,
Index, ABCSeries)):
# Following Timestamp convention, __eq__ is all-False
# and __ne__ is all True, others raise TypeError.
if opname == '__eq__':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer to return in the if/elif, then just raise instead of an else

return np.zeros(shape=self.shape, dtype=bool)
elif opname == '__ne__':
return np.ones(shape=self.shape, dtype=bool)
raise TypeError('%s type object %s' %
(type(other), str(other)))

if is_datetimelike(other):
self._assert_tzawareness_compat(other)
Expand All @@ -146,12 +154,6 @@ def wrapper(self, other):
return compat.set_function_name(wrapper, opname, cls)


def _ensure_datetime64(other):
if isinstance(other, np.datetime64):
return other
raise TypeError('%s type object %s' % (type(other), str(other)))


_midnight = time(0, 0)


Expand Down
199 changes: 135 additions & 64 deletions pandas/tests/indexes/datetimes/test_arithmetic.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
from pandas import (Timestamp, Timedelta, Series,
DatetimeIndex, TimedeltaIndex,
date_range)
from pandas._libs import tslib


@pytest.fixture(params=[None, 'UTC', 'Asia/Tokyo',
Expand Down Expand Up @@ -44,7 +45,74 @@ def addend(request):


class TestDatetimeIndexComparisons(object):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you lump the None with the NaT comparisons. Do we have the same testing of these comparisons for scalars?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the comparison tests are scattered. I had planned to consolidate them here in a follow-up to keep narrow focus here.

can you lump the None with the NaT comparisons

They would need to be separate tests since the behavior is different, but I can put the tests adjacent to each other in this class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Centralizing the DTI comparison tests is now a part of #19317. After that I'll make a pass to make sure all the relevant cases are covered.

# TODO: De-duplicate with test_comparisons_nat below
@pytest.mark.parametrize('other', [datetime(2016, 1, 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move all of these comparisons (that you are moving anyway) to a new test_compare.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? The other test_arithmetic.py files we've refactored out recently mostly include a TestComparison class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to separate them, a followup is ok.

Timestamp('2016-01-01'),
np.datetime64('2016-01-01')])
def test_dti_cmp_datetimelike(self, other, tz):
dti = pd.date_range('2016-01-01', periods=2, tz=tz)
if tz is not None:
if isinstance(other, np.datetime64):
# no tzaware version available
return
elif isinstance(other, Timestamp):
other = other.tz_localize(dti.tzinfo)
else:
other = tslib._localize_pydatetime(other, dti.tzinfo)

result = dti == other
expected = np.array([True, False])
tm.assert_numpy_array_equal(result, expected)

result = dti > other
expected = np.array([False, True])
tm.assert_numpy_array_equal(result, expected)

result = dti >= other
expected = np.array([True, True])
tm.assert_numpy_array_equal(result, expected)

result = dti < other
expected = np.array([False, False])
tm.assert_numpy_array_equal(result, expected)

result = dti <= other
expected = np.array([True, False])
tm.assert_numpy_array_equal(result, expected)

def dti_cmp_non_datetime(self, tz):
# GH#19301 by convention datetime.date is not considered comparable
# to Timestamp or DatetimeIndex. This may change in the future.
dti = pd.date_range('2016-01-01', periods=2, tz=tz)

other = datetime(2016, 1, 1).date()
assert not (dti == other).any()
assert (dti != other).all()
with pytest.raises(TypeError):
dti < other
with pytest.raises(TypeError):
dti <= other
with pytest.raises(TypeError):
dti > other
with pytest.raises(TypeError):
dti >= other

@pytest.mark.parametrize('other', [None,
np.nan])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add pd.NaT here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, pd.NaT behaves differently from None or np.nan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? that would be troubling, why is that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The __eq__ and __ne__ behave the same, but None and np.nan raise TypeError for the inequalities, whereas pd.NaT returns False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. ok then split the tests to reflect this, testing all nulls at once for eq/ne make the test easy to grok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the test easy to grok.

I'll do this, but object to the notions that fixtures and easy-to-grok can go in the same file. If I can't import the test and run it in the interpreter, then I'm in a Strange Land.

def test_dti_cmp_non_datetime(self, tz, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dti_cmp_null_scalar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# GH#19301
dti = pd.date_range('2016-01-01', periods=2, tz=tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this saying that

DatetimeIndex([None, '2016-01-01']) == [None, datetime.date(2016, 1, 1)])

is [False, False]? I thought that in #18188 we decided that DatetimeIndex compared with datetime.date would coerce the date to a datetime at midnight?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I think part of the confusion is over Timestamp comparison vs DatetimeIndex comparison vs Series[datetime64] comparison (e.g. the whatsnew note in #18188 talks about Series (but puts the tests in index tests...)). This PR and a bunch of other recent ones have been focused on making the Index/Series behavior more consistent.

Following this, #19288 can be de-kludged to make Series[datetime64] comparison dispatch to DatetimeIndex comparison, ensuring internal consistency. But you're right that this would mean a change in the behavior of Series[datetime64] comparisons.

For the moment I'm taking Timestamp behavior as canonical and making DatetimeIndex match that.

ts = pd.Timestamp('2016-01-01')
>>> ts == ts.to_pydatetime().date()
False
>>> ts < ts.to_pydatetime().date()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__
TypeError: Cannot compare type 'Timestamp' with type 'date'

I recall a discussion of whether date should be treated as datetime-at-midnight for Timestamp comparison purposes, my thought being it should be treated as Period(..., freq='D').

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In [1]: ts = pd.Timestamp('2016-01-01')

In [2]: ts
Out[2]: Timestamp('2016-01-01 00:00:00')

In [3]: ts.date()
Out[3]: datetime.date(2016, 1, 1)

In [4]: ts.to_pydatetime()
Out[4]: datetime.datetime(2016, 1, 1, 0, 0)

In [5]: ts.to_pydatetime() == ts.date()
Out[5]: False

In [6]: ts.to_pydatetime().date() == ts.date()
Out[6]: True

I find [5] a bit odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find [5] a bit odd.

The behavior is analogous to comparing Timestamp vs Period. eq and ne return False and True, respectively, and all others raise TypeError. Its odd if you interpret date as "datetime implicitly at midnight", but pretty intuitive if you interpret it as "less specific than a timestamp"


assert not (dti == other).any()
assert (dti != other).all()
with pytest.raises(TypeError):
dti < other
with pytest.raises(TypeError):
dti <= other
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should also test pd.NaT here (and I think we have a tests for np.nan that should test raising).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added nan to the params for this test. pd.NaT test is immediately below this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the datetime.date part of this needs to move to where we test comparisions with Timestamp, datetime.datetime and np.datetime64

with pytest.raises(TypeError):
dti > other
with pytest.raises(TypeError):
dti >= other

def test_dti_cmp_nat(self):
left = pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.NaT,
pd.Timestamp('2011-01-03')])
Expand Down Expand Up @@ -72,69 +140,7 @@ def test_dti_cmp_nat(self):
tm.assert_numpy_array_equal(lhs < pd.NaT, expected)
tm.assert_numpy_array_equal(pd.NaT > lhs, expected)

@pytest.mark.parametrize('op', [operator.eq, operator.ne,
operator.gt, operator.ge,
operator.lt, operator.le])
def test_comparison_tzawareness_compat(self, op):
# GH#18162
dr = pd.date_range('2016-01-01', periods=6)
dz = dr.tz_localize('US/Pacific')

with pytest.raises(TypeError):
op(dr, dz)
with pytest.raises(TypeError):
op(dr, list(dz))
with pytest.raises(TypeError):
op(dz, dr)
with pytest.raises(TypeError):
op(dz, list(dr))

# Check that there isn't a problem aware-aware and naive-naive do not
# raise
assert (dr == dr).all()
assert (dr == list(dr)).all()
assert (dz == dz).all()
assert (dz == list(dz)).all()

# Check comparisons against scalar Timestamps
ts = pd.Timestamp('2000-03-14 01:59')
ts_tz = pd.Timestamp('2000-03-14 01:59', tz='Europe/Amsterdam')

assert (dr > ts).all()
with pytest.raises(TypeError):
op(dr, ts_tz)

assert (dz > ts_tz).all()
with pytest.raises(TypeError):
op(dz, ts)

@pytest.mark.parametrize('op', [operator.eq, operator.ne,
operator.gt, operator.ge,
operator.lt, operator.le])
def test_nat_comparison_tzawareness(self, op):
# GH#19276
# tzaware DatetimeIndex should not raise when compared to NaT
dti = pd.DatetimeIndex(['2014-01-01', pd.NaT, '2014-03-01', pd.NaT,
'2014-05-01', '2014-07-01'])
expected = np.array([op == operator.ne] * len(dti))
result = op(dti, pd.NaT)
tm.assert_numpy_array_equal(result, expected)

result = op(dti.tz_localize('US/Pacific'), pd.NaT)
tm.assert_numpy_array_equal(result, expected)

def test_comparisons_coverage(self):
rng = date_range('1/1/2000', periods=10)

# raise TypeError for now
pytest.raises(TypeError, rng.__lt__, rng[3].value)

result = rng == list(rng)
exp = rng == rng
tm.assert_numpy_array_equal(result, exp)

def test_comparisons_nat(self):

def test_dti_cmp_nat_behaves_like_float_cmp_nan(self):
fidx1 = pd.Index([1.0, np.nan, 3.0, np.nan, 5.0, 7.0])
fidx2 = pd.Index([2.0, 3.0, np.nan, np.nan, 6.0, 7.0])

Expand Down Expand Up @@ -223,6 +229,71 @@ def test_comparisons_nat(self):
expected = np.array([True, True, False, True, True, True])
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize('op', [operator.eq, operator.ne,
operator.gt, operator.ge,
operator.lt, operator.le])
def test_comparison_tzawareness_compat(self, op):
# GH#18162
dr = pd.date_range('2016-01-01', periods=6)
dz = dr.tz_localize('US/Pacific')

with pytest.raises(TypeError):
op(dr, dz)
with pytest.raises(TypeError):
op(dr, list(dz))
with pytest.raises(TypeError):
op(dz, dr)
with pytest.raises(TypeError):
op(dz, list(dr))

# Check that there isn't a problem aware-aware and naive-naive do not
# raise
assert (dr == dr).all()
assert (dr == list(dr)).all()
assert (dz == dz).all()
assert (dz == list(dz)).all()

# Check comparisons against scalar Timestamps
ts = pd.Timestamp('2000-03-14 01:59')
ts_tz = pd.Timestamp('2000-03-14 01:59', tz='Europe/Amsterdam')

assert (dr > ts).all()
with pytest.raises(TypeError):
op(dr, ts_tz)

assert (dz > ts_tz).all()
with pytest.raises(TypeError):
op(dz, ts)

@pytest.mark.parametrize('op', [operator.eq, operator.ne,
operator.gt, operator.ge,
operator.lt, operator.le])
def test_nat_comparison_tzawareness(self, op):
# GH#19276
# tzaware DatetimeIndex should not raise when compared to NaT
dti = pd.DatetimeIndex(['2014-01-01', pd.NaT, '2014-03-01', pd.NaT,
'2014-05-01', '2014-07-01'])
expected = np.array([op == operator.ne] * len(dti))
result = op(dti, pd.NaT)
tm.assert_numpy_array_equal(result, expected)

result = op(dti.tz_localize('US/Pacific'), pd.NaT)
tm.assert_numpy_array_equal(result, expected)

def test_dti_cmp_int_raises(self):
rng = date_range('1/1/2000', periods=10)

# raise TypeError for now
with pytest.raises(TypeError):
rng < rng[3].value

def test_dti_cmp_list(self):
rng = date_range('1/1/2000', periods=10)

result = rng == list(rng)
expected = rng == rng
tm.assert_numpy_array_equal(result, expected)


class TestDatetimeIndexArithmetic(object):

Expand Down