Skip to content

BUG: Series.combine_first with mixed timezones #26379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -423,6 +423,7 @@ Reshaping
- Bug in :func:`pandas.cut` where large bins could incorrectly raise an error due to an integer overflow (:issue:`26045`)
- Bug in :func:`DataFrame.sort_index` where an error is thrown when a multi-indexed DataFrame is sorted on all levels with the initial level sorted last (:issue:`26053`)
- Bug in :meth:`Series.nlargest` treats ``True`` as smaller than ``False`` (:issue:`26154`)
- Bug in :meth:`Series.combine_first` where combining a :class:`Series` with an :class:`Index` with mixed timezones would raise a ``ValueError`` (:issue:`26283`)

Sparse
^^^^^^
Expand Down
3 changes: 3 additions & 0 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -489,6 +489,9 @@ def union(self, other, sort=None):
other = DatetimeIndex(other)
except TypeError:
pass
except ValueError:
# GH 26283: Convert indexes with mixed tz to UTC
other = tools.to_datetime(other, utc=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct; rather you should call the super.union() as these must be combined as Index rather than DTI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So super().union actually returns the same result because these evaluate as equal:

>>> Timestamp('2019-05-01 00:00:00', tz='UTC') == Timestamp('2019-05-01 01:00:00', tz='Europe/London')
True

While an Index with all Timestamps feels a little more correct, a DTI with UTC timestamps is technically correct too since the above is True. (And less code to handle this special casing is a nice plus)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree, we are striving never to coerce to UTC unless its explicit, this should stay as object dtyped.


this, other = self._maybe_utc_convert(other)

Expand Down
22 changes: 21 additions & 1 deletion pandas/tests/series/test_combine_concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
import pytest

import pandas as pd
from pandas import DataFrame, DatetimeIndex, Series, date_range
from pandas import DataFrame, DatetimeIndex, Series, Timestamp, date_range
import pandas.util.testing as tm
from pandas.util.testing import assert_frame_equal, assert_series_equal

Expand Down Expand Up @@ -368,3 +368,23 @@ def test_append_concat_tz_dateutil(self):

appended = rng.append(rng2)
tm.assert_index_equal(appended, rng3)

def test_combine_first_with_mixed_tz(self):
# GH 26283
uniform_tz = Series({Timestamp("2019-05-01", tz='UTC'): 1.0})

multi_tz = Series(
{
Timestamp("2019-05-01 01:00:00+0100", tz='Europe/London'): 2.0,
Timestamp("2019-05-02", tz='UTC'): 3.0
}
)

result = uniform_tz.combine_first(multi_tz)
expected = Series(
{
Timestamp("2019-05-01", tz='UTC'): 1.0,
Timestamp("2019-05-02", tz='UTC'): 3.0
}
)
tm.assert_series_equal(result, expected)