
BUG: Series.combine_first raises ValueError on mixed-timezone datetime-index #26283


Closed
DomKennedy opened this issue May 5, 2019 · 3 comments · Fixed by #52294
Labels
Bug Datetime Datetime data dtype Needs Tests Unit test(s) needed to prevent regressions Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype

Comments

@DomKennedy

import pytz
from pandas import Timestamp, Series

uniform_tz = Series({Timestamp("2019-05-01", tz=pytz.UTC): 1})

multi_tz = Series(
    {
        Timestamp("2019-05-01 01:00:00+0100", tz=pytz.timezone("Europe/London")): 2,
        Timestamp("2019-05-02", tz=pytz.UTC): 3,
    }
)

multi_tz.combine_first(uniform_tz)  # works fine
uniform_tz.combine_first(multi_tz)  # raises ValueError

Problem description

left.combine_first(right) unexpectedly raises a ValueError if:

  • left has a timezone-aware datetime index, with the same timezone throughout the index.
  • right has a timezone-aware datetime index, with a mix of different timezones.
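Both conditions come down to the index types pandas infers for the two series. The following is a minimal sketch for inspecting them. It assumes a reasonably recent pandas and uses string zone names ("UTC", "Europe/London") instead of the pytz objects from the report; that substitution is an assumption made so the sketch does not need pytz installed.

```python
import pandas as pd
from pandas import Series, Timestamp

uniform_tz = Series({Timestamp("2019-05-01", tz="UTC"): 1})
multi_tz = Series(
    {
        # 01:00 BST on 1 May 2019 is the same instant as 00:00 UTC
        Timestamp("2019-05-01 01:00:00", tz="Europe/London"): 2,
        Timestamp("2019-05-02", tz="UTC"): 3,
    }
)

# One timezone throughout: pandas stores it as a tz-aware DatetimeIndex.
print(type(uniform_tz.index).__name__, uniform_tz.index.dtype)

# Mixed timezones do not fit a single datetime64[ns, tz] dtype,
# so the index falls back to object dtype holding Timestamp objects.
print(type(multi_tz.index).__name__, multi_tz.index.dtype)
```

In the 0.24.2 traceback, DatetimeIndex.union then calls DatetimeIndex(other) on that object-dtype index, which is where the ValueError comes from.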

The traceback is as follows:

Traceback (most recent call last):
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas_test.py", line 57, in <module>
    uniform_tz.combine_first(multi_tz)  # raises ValueError
  File "~/.venv/lib/python3.7/site-packages/pandas/core/series.py", line 2605, in combine_first
    new_index = self.index.union(other.index)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 484, in union
    other = DatetimeIndex(other)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py", line 303, in __new__
    int_as_wall_time=True)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 376, in _from_sequence
    ambiguous=ambiguous, int_as_wall_time=int_as_wall_time)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1757, in sequence_to_dt64ns
    data, dayfirst=dayfirst, yearfirst=yearfirst)
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1866, in objects_to_datetime64ns
    raise e
  File "~/.venv/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1857, in objects_to_datetime64ns
    require_iso8601=require_iso8601
  File "pandas/_libs/tslib.pyx", line 460, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

If the same arguments are swapped, i.e. right.combine_first(left), no error is raised, and the output is as expected (timestamps which are equal but in different timezones are identified, with the left argument's timezone propagating into the output).
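Those timestamps can be identified because tz-aware Timestamps compare by the instant they represent, not by wall-clock time. A small check, again assuming string zone names are accepted in place of pytz objects:

```python
import datetime

from pandas import Timestamp

london = Timestamp("2019-05-01 01:00:00", tz="Europe/London")
utc = Timestamp("2019-05-01", tz="UTC")

# London is on BST (UTC+01:00) in May, so 01:00 local time is 00:00 UTC.
print(london.utcoffset())  # 1:00:00
print(london == utc)  # True: same instant, different timezones
print(london.tz_convert("UTC") == utc)
```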

The error also doesn't occur at all with the same setup on DataFrames.

It doesn't seem to me that there's any semantic reason that the operation should fail, so this appears to be a bug.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.3
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@gfyoung gfyoung added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype labels May 6, 2019
@gfyoung
Member

gfyoung commented May 6, 2019

cc @jreback @mroeschke

@mroeschke
Member

While this works if both arguments are DataFrames, the resulting index gets converted to UTC:

In [3]: pd.DataFrame(uniform_tz).combine_first(pd.DataFrame(multi_tz))
Out[3]:
                             0
2019-05-01 00:00:00+00:00  1.0
2019-05-02 00:00:00+00:00  3.0

The main issue is in DatetimeIndex.union. The following patch makes the Series case pass, if anyone would like to put up a PR:

diff --git a/pandas/core/indexes/datetimes.py b/pandas/core/indexes/datetimes.py
index 6b9806f4d..b91b936ad 100644
--- a/pandas/core/indexes/datetimes.py
+++ b/pandas/core/indexes/datetimes.py
@@ -489,6 +489,9 @@ class DatetimeIndex(DatetimeIndexOpsMixin, Int64Index, DatetimeDelegateMixin):
                 other = DatetimeIndex(other)
             except TypeError:
                 pass
+            except ValueError:
+                from pandas import to_datetime
+                other = to_datetime(other, utc=True)

         this, other = self._maybe_utc_convert(other)
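The same fallback can be applied on the caller's side on an affected version: convert the mixed-timezone index with to_datetime(..., utc=True) before calling combine_first. This is a sketch of a workaround, not the patch itself. It assumes the series from the report, with string zone names in place of pytz objects. The resulting index is UTC throughout, as in the DataFrame output above.

```python
import pandas as pd
from pandas import Series, Timestamp

uniform_tz = Series({Timestamp("2019-05-01", tz="UTC"): 1})
multi_tz = Series(
    {
        Timestamp("2019-05-01 01:00:00", tz="Europe/London"): 2,
        Timestamp("2019-05-02", tz="UTC"): 3,
    }
)

# Caller-side version of the patch's fallback: normalise the mixed-timezone
# object index to a single UTC DatetimeIndex before combining.
multi_utc = multi_tz.set_axis(pd.to_datetime(multi_tz.index, utc=True))

# Both indexes are now UTC DatetimeIndexes, so the union inside
# combine_first no longer has to convert an object-dtype index.
result = uniform_tz.combine_first(multi_utc)
print(result)
```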

@mroeschke mroeschke added the Timezones Timezone data dtype label May 10, 2019
@mroeschke mroeschke added the Bug label Apr 2, 2020
@jbrockmendel
Member

This looks right on main: it casts to object instead of UTC. Could use a test.
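A regression test along these lines could cover the report. This is a hypothetical pytest-style sketch. The function name is made up, and normalising the result index to UTC before comparing is an assumption chosen so the check holds whether the result index is object dtype (as described above) or UTC.

```python
import pandas as pd
from pandas import Series, Timestamp


def test_combine_first_mixed_tz_index():  # hypothetical test name
    uniform_tz = Series({Timestamp("2019-05-01", tz="UTC"): 1})
    multi_tz = Series(
        {
            Timestamp("2019-05-01 01:00:00", tz="Europe/London"): 2,
            Timestamp("2019-05-02", tz="UTC"): 3,
        }
    )

    # The call from the report; on pandas 0.24.2 this raised ValueError.
    result = uniform_tz.combine_first(multi_tz)

    # Compare in UTC so the check does not depend on the index dtype.
    result_utc = result.set_axis(pd.to_datetime(result.index, utc=True))
    expected_index = [
        Timestamp("2019-05-01", tz="UTC"),
        Timestamp("2019-05-02", tz="UTC"),
    ]
    # Equal instants are merged, and the caller's value wins where both exist.
    assert list(result_utc.index) == expected_index
    assert result_utc.tolist() == [1, 3]
```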

@jbrockmendel jbrockmendel added the Needs Tests Unit test(s) needed to prevent regressions label Mar 3, 2023
4 participants