Possible regression in comparison operation for interval dtypes #28981

Closed
dsaxton opened this issue Oct 14, 2019 · 4 comments · Fixed by #34443
Labels
good first issue Interval Interval data type Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@dsaxton
Member

dsaxton commented Oct 14, 2019

It seems that this comparison is now failing on master when it was working in 0.25.1. Need to look a bit more, but I don't think it's specific to this operation.

import pandas as pd

s = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype="interval")  
s == "a"

0.25.1

0    False
1    False
dtype: bool

master

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-3a654234a428> in <module>
----> 1 s == "a"

~/pandas/pandas/core/ops/__init__.py in wrapper(self, other)
    527         rvalues = extract_array(other, extract_numpy=True)
    528 
--> 529         res_values = comparison_op(lvalues, rvalues, op)
    530 
    531         return _construct_result(self, res_values, index=self.index, name=res_name)

~/pandas/pandas/core/ops/array_ops.py in comparison_op(left, right, op)
    253 
    254     if should_extension_dispatch(lvalues, rvalues):
--> 255         res_values = dispatch_to_extension_op(op, lvalues, rvalues)
    256 
    257     elif is_scalar(rvalues) and isna(rvalues):

~/pandas/pandas/core/ops/dispatch.py in dispatch_to_extension_op(op, left, right, keep_null_freq)
    124     #  a Series or Index.
    125 
--> 126     if left.dtype.kind in "mM" and isinstance(left, np.ndarray):
    127         # We need to cast datetime64 and timedelta64 ndarrays to
    128         #  DatetimeArray/TimedeltaArray.  But we avoid wrapping others in

TypeError: 'in <string>' requires string as left operand, not NoneType

@jbrockmendel Is the fix for this as simple as (say) setting the kind to "O" here https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/dtypes.py#L975? It looks like we're assuming the dtype has a kind attribute when it doesn't.

@dsaxton dsaxton changed the title Possible regression in comparison between string and interval dtypes Possible regression in comparison operation for interval dtypes Oct 15, 2019
@jbrockmendel jbrockmendel added Interval Interval data type Numeric Operations Arithmetic, Comparison, and Logical operations labels Oct 16, 2019
@jbrockmendel
Member

I think I've seen this before and found that changing in "mM" to in ["m", "M"] fixes it. Can you confirm whether this fixes it here?
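
For readers following along, the difference between the two membership checks can be reproduced without pandas. This is a minimal sketch, assuming only what the traceback shows: that the dtype's kind is None at this point. The variable names are illustrative and are not taken from the pandas source.

```python
# Stand-in for a dtype whose `kind` is None, as the traceback suggests.
kind = None

# Substring test against a str: the left operand must itself be a str,
# so None raises TypeError instead of returning False.
try:
    kind in "mM"
    substring_check = "no error"
except TypeError as exc:
    substring_check = f"TypeError: {exc}"

# Membership test against a list: elements are compared with ==,
# so None is simply reported as absent.
list_check = kind in ["m", "M"]

print(substring_check)
print(list_check)
```

The list form tolerates a None kind because it never attempts a substring search, which is why the suggested change avoids the error.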

@dsaxton
Member Author

dsaxton commented Oct 22, 2019

I think I've seen this before and found that changing in "mM" to in ["m", "M"] fixes it. Can you confirm whether this fixes it here?

Yep, that seems to fix it. Incidentally I also noticed this, which feels like it shouldn't be an error:

[ins] In [14]: pd.Interval(1, 2) < pd.Interval(3, 4)
Out[14]: True

[ins] In [15]: s = pd.Series([pd.Interval(1, 2)], dtype="interval")

[ins] In [16]: s < pd.Interval(3, 4)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-91c971a412c6> in <module>
----> 1 s < pd.Interval(3, 4)

~/pandas/pandas/core/ops/__init__.py in wrapper(self, other)
    527         rvalues = extract_array(other, extract_numpy=True)
    528 
--> 529         res_values = comparison_op(lvalues, rvalues, op)
    530 
    531         return _construct_result(self, res_values, index=self.index, name=res_name)

~/pandas/pandas/core/ops/array_ops.py in comparison_op(left, right, op)
    253 
    254     if should_extension_dispatch(lvalues, rvalues):
--> 255         res_values = dispatch_to_extension_op(op, lvalues, rvalues)
    256 
    257     elif is_scalar(rvalues) and isna(rvalues):

~/pandas/pandas/core/ops/dispatch.py in dispatch_to_extension_op(op, left, right, keep_null_freq)
    134 
    135     try:
--> 136         res_values = op(left, right)
    137     except NullFrequencyError:
    138         # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError

TypeError: '<' not supported between instances of 'IntervalArray' and 'pandas._libs.interval.Interval'

It works if you don't explicitly set dtype="interval":

[ins] In [17]: s = pd.Series([pd.Interval(1, 2)])

[ins] In [18]: s < pd.Interval(3, 4)
Out[18]: 
0    True
dtype: bool

@jbrockmendel
Member

It looks like IntervalArray comparison ops haven't been defined at all. That would be really nice to see fixed. PR welcome.

@mroeschke
Member

This looks fixed in master. Could use a test

In [83]: pd.__version__
Out[83]: '1.1.0.dev0+1390.gf3fdab389'

In [84]: import pandas as pd
    ...:
    ...: s = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype="interval")
    ...: s == "a"
Out[84]:
0    False
1    False
dtype: bool
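
A regression test for this could look roughly like the following. It is only a sketch built from the reproducer in this issue: the test name is hypothetical, and it is not the test that was actually merged.

```python
import pandas as pd
import pandas.testing as tm


def test_interval_series_eq_string():
    # Hypothetical test name; the body mirrors the reproducer from this issue.
    s = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype="interval")

    # Comparing against an unrelated scalar should not raise; every
    # element should simply compare unequal.
    result = s == "a"
    expected = pd.Series([False, False])
    tm.assert_series_equal(result, expected)


test_interval_series_eq_string()
```

It only covers the equality case shown above; the ordering comparison against a pd.Interval scalar would need its own test.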

@mroeschke mroeschke added good first issue Needs Info Clarification about behavior needed to assess issue Needs Tests Unit test(s) needed to prevent regressions and removed Interval Interval data type Numeric Operations Arithmetic, Comparison, and Logical operations Needs Info Clarification about behavior needed to assess issue labels Apr 27, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue May 28, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue May 29, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue Jun 3, 2020
@jreback jreback added this to the 1.1 milestone Jun 3, 2020
@jreback jreback added the Interval Interval data type label Jun 3, 2020
4 participants