Skip to content

Certain comparison operations misbehaving for period dtype #28980

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dsaxton opened this issue Oct 14, 2019 · 3 comments · Fixed by #34417
Closed

Certain comparison operations misbehaving for period dtype #28980

dsaxton opened this issue Oct 14, 2019 · 3 comments · Fixed by #34417
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Period Period data type
Milestone

Comments

@dsaxton
Copy link
Member

dsaxton commented Oct 14, 2019

This is closely related to #28930.

import pandas as pd

s = pd.Series([pd.Period("2019"), pd.Period("2020")], dtype="period[A-DEC]") 
s == "a"                                                                                                                                                     

yields

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-3a654234a428> in <module>
----> 1 s == "a"

~/pandas/pandas/core/ops/__init__.py in wrapper(self, other)
    527         rvalues = extract_array(other, extract_numpy=True)
    528 
--> 529         res_values = comparison_op(lvalues, rvalues, op)
    530 
    531         return _construct_result(self, res_values, index=self.index, name=res_name)

~/pandas/pandas/core/ops/array_ops.py in comparison_op(left, right, op)
    253 
    254     if should_extension_dispatch(lvalues, rvalues):
--> 255         res_values = dispatch_to_extension_op(op, lvalues, rvalues)
    256 
    257     elif is_scalar(rvalues) and isna(rvalues):

~/pandas/pandas/core/ops/dispatch.py in dispatch_to_extension_op(op, left, right, keep_null_freq)
    134 
    135     try:
--> 136         res_values = op(left, right)
    137     except NullFrequencyError:
    138         # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError

~/pandas/pandas/core/arrays/period.py in wrapper(self, other)
     98             result.fill(nat_result)
     99         else:
--> 100             other = Period(other, freq=self.freq)
    101             result = ordinal_op(other.ordinal)
    102 

~/pandas/pandas/_libs/tslibs/period.pyx in pandas._libs.tslibs.period.Period.__new__()
   2459                 value = str(value)
   2460             value = value.upper()
-> 2461             dt, _, reso = parse_time_string(value, freq)
   2462             if dt is NaT:
   2463                 ordinal = NPY_NAT

~/pandas/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_time_string()
    265             yearfirst = get_option("display.date_yearfirst")
    266 
--> 267     res = parse_datetime_string_with_reso(arg, freq=freq,
    268                                           dayfirst=dayfirst,
    269                                           yearfirst=yearfirst)

~/pandas/pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string_with_reso()
    291 
    292     if not _does_string_look_like_datetime(date_string):
--> 293         raise ValueError('Given date string not likely a datetime.')
    294 
    295     parsed, reso = _parse_delimited_date(date_string, dayfirst)

ValueError: Given date string not likely a datetime.

but the output should be Series([False, False]). The other comparison operators <, <=, >, and >= yield the same error, which also seems incorrect (the error message should probably be different, and the error should be a TypeError rather than a ValueError). This is happening on 0.25.1 and master.

@jbrockmendel
Copy link
Member

xref #24576 suggesting that _unbox_scalar can be used to unify some of the logic in these comparison ops

@dsaxton
Copy link
Member Author

dsaxton commented Oct 16, 2019

Would it make sense to have the parsing be optional since some comparisons may not require it? The comparison itself could then handle any failed casting.

@jbrockmendel jbrockmendel added Numeric Operations Arithmetic, Comparison, and Logical operations Period Period data type labels Oct 16, 2019
@mroeschke
Copy link
Member

Looks to work on master now. Could use a test

In [10]: import pandas as pd
    ...:
    ...: s = pd.Series([pd.Period("2019"), pd.Period("2020")], dtype="period[A-DEC]")
    ...: s == "a"
Out[10]:
0    False
1    False
dtype: bool

In [11]: pd.__version__
Out[11]: '1.1.0.dev0+1536.ga7fb88fd5'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Numeric Operations Arithmetic, Comparison, and Logical operations Period Period data type labels May 11, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue May 27, 2020
@jreback jreback added this to the 1.1 milestone May 28, 2020
@jreback jreback added the Period Period data type label May 28, 2020
OlivierLuG pushed a commit to OlivierLuG/pandas that referenced this issue May 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Period Period data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants