Skip to content

WIP: Dispatch Series comparison methods to Index implementations #19288

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

jbrockmendel
Copy link
Member

Same thing we've been doing for arithmetic operations, now moving on to comparison methods. Based on some trial-and-error, it seems like there are some inconsistencies to be worked out for the categorical dtype case. This PR:

  • Dispatches timedelta and datetime Series comparisons to their respective Index subclass implementations
  • Removes a # temp workaround until fixing GH 13637 for comparison with PeriodIndex that appears to no longer be needed.
  • Fixes names of the results for comparisons between Series and Index classes. (probably needs tests)
  • Notes some small inconsistencies in how _constructor is called, also discussed in Series comparison ops __finalize__ inconsistent #19271.

# kludge; DatetimeIndex refuses to compare against None or
# datetime.date; until the "right" behavior is resolved, we cast
# these types here to types that DatetimeIndex understand.
if type(other) is datetime.date:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use isinstance.

what do you mean refuses to compare?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use isinstance.

Will change, but note the isinstance version is pretty verbose: if isinstance(other, datetime.date) and not isinstance(other, datetime.datetime).

what do you mean refuses to compare?

DatetimeIndex comparison with a) datetime.date or b) None raises TypeError; Series casts to datetime and NaT, respectively.

dti = pd.date_range('2016-01-01', periods=2)
ser = pd.Series(dti)

>>> dti ==  datetime.date(2016, 1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/indexes/datetimes.py", line 119, in wrapper
    other = _ensure_datetime64(other)
  File "pandas/core/indexes/datetimes.py", line 145, in _ensure_datetime64
    raise TypeError('%s type object %s' % (type(other), str(other)))
TypeError: <type 'datetime.date'> type object 2016-01-01

>>> ser == datetime.date(2016, 1, 1)
0     True
1    False
dtype: bool

>>> dti != None
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/indexes/datetimes.py", line 119, in wrapper
    other = _ensure_datetime64(other)
  File "pandas/core/indexes/datetimes.py", line 145, in _ensure_datetime64
    raise TypeError('%s type object %s' % (type(other), str(other)))
TypeError: <type 'NoneType'> type object None

>>> ser != None
0    True
1    True
dtype: bool

So really one of these two behaviors should be chosen as canonical and the other changed (and tested), at which point these shims will be unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these comparison should work for both DTI and None. These work for scalars as well already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally on board with the "DTI comparison should behave like a vectorized Timestamp comparison" approach, but it isn't 100% straightforward:

For datetime.date comparison, Timestamp.__eq__ is always False and Timestamp.__ne__ is always True; lt, le, gt, ge all raise TypeError.

dti = pd.date_range('2016-01-01', periods=2)
ts = dti[0]
dt = ts.to_pydatetime()
d = dt.date()

>>> ts == d
False
>>> ts < d
  File "pandas/_libs/tslibs/timestamps.pyx", line 121, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
    raise TypeError('Cannot compare type %r with type %r' %
TypeError: Cannot compare type 'Timestamp' with type 'date'

Same behavior for None, i.e. it is not treated being equivalent to pd.NaT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I would prefer a patch independetly of this one to fix these 2 cases, these already work with DTI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#19301 is intended to do that. Did you have something else in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes saw that after

# these types here to types that DatetimeIndex understand.
if type(other) is datetime.date:
other = datetime.datetime(other.year, other.month, other.day)
elif other is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this None check needed? do we not do this in the op comparison directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I don't like this code path at all, meaning you are special casing things, which IMHO should be handle inside the DTI comparisons.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; the next step is to put this on the back-burner and fix the DTI comparisons.

@codecov
Copy link

codecov bot commented Jan 18, 2018

Codecov Report

Merging #19288 into master will increase coverage by <.01%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19288      +/-   ##
==========================================
+ Coverage   91.56%   91.56%   +<.01%     
==========================================
  Files         148      148              
  Lines       48888    48881       -7     
==========================================
- Hits        44765    44760       -5     
+ Misses       4123     4121       -2
Flag Coverage Δ
#multiple 89.94% <94.73%> (ø) ⬆️
#single 41.69% <52.63%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.33% <100%> (+0.09%) ⬆️
pandas/core/ops.py 91.73% <94.28%> (-0.44%) ⬇️
pandas/io/parquet.py 69.9% <0%> (-1.66%) ⬇️
pandas/io/s3.py 85% <0%> (-1.37%) ⬇️
pandas/io/parsers.py 95.48% <0%> (-0.02%) ⬇️
pandas/core/generic.py 95.9% <0%> (-0.01%) ⬇️
pandas/io/common.py 69.06% <0%> (ø) ⬆️
pandas/core/internals.py 94.5% <0%> (ø) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7a2e94...2a77d63. Read the comment docs.

@jbrockmendel
Copy link
Member Author

I can’t replicate appveyor error locally. Something windows specific?

@jbrockmendel
Copy link
Member Author

Closing until #19301

@jbrockmendel jbrockmendel deleted the series_cmp branch June 22, 2018 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants