WIP: Dispatch Series comparison methods to Index implementations #19288

jbrockmendel · 2018-01-17T18:32:06Z

Same thing we've been doing for arithmetic operations, now moving on to comparison methods. Based on some trial-and-error, it seems like there are some inconsistencies to be worked out for the categorical dtype case. This PR:

Dispatches timedelta and datetime Series comparisons to their respective Index subclass implementations
- Patches DatetimeIndex._assert_tz_awareness_compat, which has already been broken out into Fix tzawareness_compat for DatetimeIndex comparisons with NaT #19276.
- Kludges datetime.date and None case that Series treats differently from DatetimeIndex; decisions should be made about those so these kludges can be removed.
Removes a # temp workaround until fixing GH 13637 for comparison with PeriodIndex that appears to no longer be needed.
Fixes names of the results for comparisons between Series and Index classes. (probably needs tests)
Notes some small inconsistencies in how _constructor is called, also discussed in Series comparison ops __finalize__ inconsistent #19271.

…ries_cmp

jreback · 2018-01-18T00:45:21Z

pandas/core/ops.py

+            # kludge; DatetimeIndex refuses to compare against None or
+            # datetime.date; until the "right" behavior is resolved, we cast
+            # these types here to types that DatetimeIndex understand.
+            if type(other) is datetime.date:


use isinstance.

what do you mean refuses to compare?

use isinstance.

Will change, but note the isinstance version is pretty verbose: if isinstance(other, datetime.date) and not isinstance(other, datetime.datetime).

what do you mean refuses to compare?

DatetimeIndex comparison with a) datetime.date or b) None raises TypeError; Series casts to datetime and NaT, respectively.

dti = pd.date_range('2016-01-01', periods=2) ser = pd.Series(dti) >>> dti == datetime.date(2016, 1, 1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "pandas/core/indexes/datetimes.py", line 119, in wrapper other = _ensure_datetime64(other) File "pandas/core/indexes/datetimes.py", line 145, in _ensure_datetime64 raise TypeError('%s type object %s' % (type(other), str(other))) TypeError: <type 'datetime.date'> type object 2016-01-01 >>> ser == datetime.date(2016, 1, 1) 0 True 1 False dtype: bool >>> dti != None Traceback (most recent call last): File "<stdin>", line 1, in <module> File "pandas/core/indexes/datetimes.py", line 119, in wrapper other = _ensure_datetime64(other) File "pandas/core/indexes/datetimes.py", line 145, in _ensure_datetime64 raise TypeError('%s type object %s' % (type(other), str(other))) TypeError: <type 'NoneType'> type object None >>> ser != None 0 True 1 True dtype: bool

So really one of these two behaviors should be chosen as canonical and the other changed (and tested), at which point these shims will be unnecessary.

these comparison should work for both DTI and None. These work for scalars as well already.

I am totally on board with the "DTI comparison should behave like a vectorized Timestamp comparison" approach, but it isn't 100% straightforward:

For datetime.date comparison, Timestamp.__eq__ is always False and Timestamp.__ne__ is always True; lt, le, gt, ge all raise TypeError.

dti = pd.date_range('2016-01-01', periods=2) ts = dti[0] dt = ts.to_pydatetime() d = dt.date() >>> ts == d False >>> ts < d File "pandas/_libs/tslibs/timestamps.pyx", line 121, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__ raise TypeError('Cannot compare type %r with type %r' % TypeError: Cannot compare type 'Timestamp' with type 'date'

Same behavior for None, i.e. it is not treated being equivalent to pd.NaT.

hmm, I would prefer a patch independetly of this one to fix these 2 cases, these already work with DTI

#19301 is intended to do that. Did you have something else in mind?

yes saw that after

jreback · 2018-01-18T00:46:04Z

pandas/core/ops.py

+            # these types here to types that DatetimeIndex understand.
+            if type(other) is datetime.date:
+                other = datetime.datetime(other.year, other.month, other.day)
+            elif other is None:


why is this None check needed? do we not do this in the op comparison directly?

and I don't like this code path at all, meaning you are special casing things, which IMHO should be handle inside the DTI comparisons.

Yes; the next step is to put this on the back-burner and fix the DTI comparisons.

…ries_cmp

codecov · 2018-01-18T07:37:43Z

Codecov Report

Merging #19288 into master will increase coverage by <.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##           master   #19288      +/-   ##
==========================================
+ Coverage   91.56%   91.56%   +<.01%     
==========================================
  Files         148      148              
  Lines       48888    48881       -7     
==========================================
- Hits        44765    44760       -5     
+ Misses       4123     4121       -2

Flag	Coverage Δ
#multiple	`89.94% <94.73%> (ø)`	⬆️
#single	`41.69% <52.63%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/datetimes.py	`95.33% <100%> (+0.09%)`	⬆️
pandas/core/ops.py	`91.73% <94.28%> (-0.44%)`	⬇️
pandas/io/parquet.py	`69.9% <0%> (-1.66%)`	⬇️
pandas/io/s3.py	`85% <0%> (-1.37%)`	⬇️
pandas/io/parsers.py	`95.48% <0%> (-0.02%)`	⬇️
pandas/core/generic.py	`95.9% <0%> (-0.01%)`	⬇️
pandas/io/common.py	`69.06% <0%> (ø)`	⬆️
pandas/core/internals.py	`94.5% <0%> (ø)`	⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7a2e94...2a77d63. Read the comment docs.

jbrockmendel · 2018-01-18T10:21:35Z

I can’t replicate appveyor error locally. Something windows specific?

jbrockmendel · 2018-01-23T18:51:53Z

Closing until #19301

jbrockmendel added 4 commits January 16, 2018 22:41

dispatch some Series comparison methods to Index subclasses

53b4067

Move ndarray and Index cases to the end

e93e879

revert unrelated edit

f8c3605

Merge branch 'master' of https://github.com/pandas-dev/pandas into se…

7494c0f

…ries_cmp

jreback requested changes Jan 18, 2018

View reviewed changes

jbrockmendel added 2 commits January 17, 2018 22:50

flake8 unused imports

947c81a

Merge branch 'master' of https://github.com/pandas-dev/pandas into se…

2a77d63

…ries_cmp

jbrockmendel mentioned this pull request Jan 18, 2018

Fix DTI comparison with None, datetime.date #19301

Merged

4 tasks

jbrockmendel closed this Jan 23, 2018

jbrockmendel deleted the series_cmp branch June 22, 2018 03:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Dispatch Series comparison methods to Index implementations #19288

WIP: Dispatch Series comparison methods to Index implementations #19288

jbrockmendel commented Jan 17, 2018

jreback Jan 18, 2018

jbrockmendel Jan 18, 2018

jreback Jan 18, 2018

jbrockmendel Jan 18, 2018

jreback Jan 18, 2018

jbrockmendel Jan 19, 2018

jreback Jan 19, 2018

jreback Jan 18, 2018

jbrockmendel Jan 18, 2018

jreback Jan 18, 2018

jbrockmendel Jan 18, 2018

codecov bot commented Jan 18, 2018

jbrockmendel commented Jan 18, 2018

jbrockmendel commented Jan 23, 2018

WIP: Dispatch Series comparison methods to Index implementations #19288

WIP: Dispatch Series comparison methods to Index implementations #19288

Conversation

jbrockmendel commented Jan 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Jan 18, 2018

Codecov Report

jbrockmendel commented Jan 18, 2018

jbrockmendel commented Jan 23, 2018