Series.searchsorted with different timezones #30086

kenahoo · 2019-12-05T18:40:29Z

Code Sample, a copy-pastable example if possible

DatetimeIndex.searchsorted works fine with mixed timezones:

>>> series_dt = pd.date_range('2015-02-19', periods=2, freq='1d', tz='UTC')
>>> dt = pd.to_datetime('2015-02-19T12:34:56').tz_localize('America/New_York')
>>> series_dt.searchsorted(dt)
1

However, Series.searchsorted does not:

>>> from io import StringIO
>>> df = pd.read_csv(StringIO("datetime\n2015-02-19T00:00:00Z\n2015-02-20T00:00:00Z"), parse_dates=['datetime'])
>>> df.datetime.searchsorted(dt)
Traceback (most recent call last):
  File "/Users/kwilliams/Library/Application Support/IntelliJIdea2019.2/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "venv/lib/python3.7/site-packages/pandas/core/series.py", line 2694, in searchsorted
    return algorithms.searchsorted(self._values, value, side=side, sorter=sorter)
  File "venv/lib/python3.7/site-packages/pandas/core/algorithms.py", line 1887, in searchsorted
    result = arr.searchsorted(value, side=side, sorter=sorter)
  File "venv/lib/python3.7/site-packages/pandas/core/arrays/datetimelike.py", line 666, in searchsorted
    self._check_compatible_with(value)
  File "venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 591, in _check_compatible_with
    own=self.tz, other=other.tz
ValueError: Timezones don't match. 'UTC != America/New_York'

Yet both objects report the same dtype, so the underlying data should be equivalent:

>>> series_dt.dtype
datetime64[ns, UTC]
>>> df.datetime.dtype
datetime64[ns, UTC]

Problem description

I believe the DatetimeIndex behavior is correct (or at least more useful): even when the timezones differ, the underlying instants are well-ordered and compare fine. In fact, both the index and the Series compare fine with a simple > comparison:

>>> dt > series_dt
array([ True, False])
>>> dt > df.datetime
0     True
1    False
Name: datetime, dtype: bool
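
To illustrate the ordering argument without pandas, here is a minimal sketch using only the standard library. It assumes a fixed UTC-5 offset as a stand-in for America/New_York in February (an assumption made for this example, not a real timezone database lookup), and the variable names are hypothetical. Timezone-aware datetimes compare by instant, so a binary search across timezones is well defined:

```python
import bisect
from datetime import datetime, timedelta, timezone

# Sorted timestamps in UTC, mirroring the two-row column from the report.
utc_points = [
    datetime(2015, 2, 19, tzinfo=timezone.utc),
    datetime(2015, 2, 20, tzinfo=timezone.utc),
]

# Assumption: a fixed UTC-5 offset stands in for America/New_York in February.
eastern = timezone(timedelta(hours=-5))
dt = datetime(2015, 2, 19, 12, 34, 56, tzinfo=eastern)

# Aware datetimes are compared by their UTC instant, so bisect works
# even though the needle and the haystack carry different tzinfo objects.
position = bisect.bisect_left(utc_points, dt)
print(position)  # 1
```

The needle is 17:34:56 UTC on 2015-02-19, which falls between the two entries, hence position 1.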

Expected Output

>>> df.datetime.searchsorted(dt)
1

A workaround is to wrap the column using pd.DatetimeIndex():

>>> pd.DatetimeIndex(df.datetime).searchsorted(dt)
1

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 18.0.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : en_US.UTF-8
pandas           : 0.25.1
numpy            : 1.17.2
pytz             : 2019.3
dateutil         : 2.8.0
pip              : 19.3.1
setuptools       : 41.4.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : None
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : 0.3.5
scipy            : 1.3.1
sqlalchemy       : None
tables           : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None


jbrockmendel · 2020-01-26T04:23:54Z

This now works on master for me. Can you confirm?

mroeschke · 2020-03-30T03:18:12Z

Confirmed. Could use a test
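
A regression test for this could look roughly like the sketch below. This is not the test that was actually merged; the function name is hypothetical, and it assumes a pandas version where the behavior is already fixed:

```python
import pandas as pd


def test_searchsorted_scalar_mixed_timezones():
    # Hypothetical test name; checks that Series and DatetimeIndex agree
    # when the scalar's timezone differs from the data's timezone.
    index = pd.date_range("2015-02-19", periods=2, freq="D", tz="UTC")
    series = pd.Series(index)
    value = pd.Timestamp("2015-02-19T12:34:56", tz="America/New_York")

    # 12:34:56 in New York is 17:34:56 UTC, between the two entries.
    assert index.searchsorted(value) == 1
    assert series.searchsorted(value) == 1


test_searchsorted_scalar_mixed_timezones()
```

Asserting against both the index and the Series keeps the two code paths from drifting apart again.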

jbrockmendel added the Timezones label Dec 18, 2019

mroeschke added the good first issue and Needs Tests labels and removed the Timezones label Mar 30, 2020

jamescobonkerr mentioned this issue Mar 31, 2020

TST: cover search_sorted scalar mixed timezones case #33185

Merged


jreback added this to the 1.1 milestone Mar 31, 2020

jreback closed this as completed in #33185 Mar 31, 2020
