Skip to content

BUG: fix to_datetime failing to raise on mixed tznaive/tzaware datetimes #24663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 9, 2019
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1549,6 +1549,7 @@ Datetimelike
- Bug in :class:`PeriodIndex` where comparisons against an array-like object with length 1 failed to raise ``ValueError`` (:issue:`23078`)
- Bug in :meth:`DatetimeIndex.astype`, :meth:`PeriodIndex.astype` and :meth:`TimedeltaIndex.astype` ignoring the sign of the ``dtype`` for unsigned integer dtypes (:issue:`24405`).
- Fixed bug in :meth:`Series.max` with ``datetime64[ns]``-dtype failing to return ``NaT`` when nulls are present and ``skipna=False`` is passed (:issue:`24265`)
- Bug in :func:`to_datetime` where arrays of ``datetime`` objects containing both timezone-aware and timezone-naive ``datetimes`` would fail to raise ``ValueError`` (:issue:`24569`)

Timedelta
^^^^^^^^^
Expand Down
5 changes: 5 additions & 0 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,7 @@ def datetime_to_datetime64(values: object[:]):
int64_t[:] iresult
npy_datetimestruct dts
_TSObject _ts
bint found_naive = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel is there a reason we cannot remove this routine entirely in favor of using array_to_datetime? can you create an issue about this (though maybe you already did)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#24006 does this, but is on hold ATM


result = np.empty(n, dtype='M8[ns]')
iresult = result.view('i8')
Expand All @@ -176,6 +177,9 @@ def datetime_to_datetime64(values: object[:]):
iresult[i] = NPY_NAT
elif PyDateTime_Check(val):
if val.tzinfo is not None:
if found_naive:
raise ValueError('Cannot mix tz-aware with '
'tz-naive values')
if inferred_tz is not None:
if not tz_compare(val.tzinfo, inferred_tz):
raise ValueError('Array must be all same time zone')
Expand All @@ -186,6 +190,7 @@ def datetime_to_datetime64(values: object[:]):
iresult[i] = _ts.value
check_dts_bounds(&_ts.dts)
else:
found_naive = True
if inferred_tz is not None:
raise ValueError('Cannot mix tz-aware with '
'tz-naive values')
Expand Down
6 changes: 1 addition & 5 deletions pandas/tests/arrays/test_array.py
Original file line number Diff line number Diff line change
Expand Up @@ -151,11 +151,7 @@ def test_array_inference(data, expected):
[pd.Timestamp("2000", tz="CET"), pd.Timestamp("2000", tz="UTC")],
# Mix of tz-aware and tz-naive
[pd.Timestamp("2000", tz="CET"), pd.Timestamp("2000")],
# GH-24569
pytest.param(
np.array([pd.Timestamp('2000'), pd.Timestamp('2000', tz='CET')]),
marks=pytest.mark.xfail(reason="bug in DTA._from_sequence")
),
np.array([pd.Timestamp('2000'), pd.Timestamp('2000', tz='CET')]),
])
def test_array_inference_fails(data):
result = pd.array(data)
Expand Down
19 changes: 19 additions & 0 deletions pandas/tests/arrays/test_datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,25 @@


class TestDatetimeArrayConstructor(object):
def test_mixing_naive_tzaware_raises(self):
# GH#24569
arr = np.array([pd.Timestamp('2000'), pd.Timestamp('2000', tz='CET')])
Copy link
Member

@jschendel jschendel Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that verifies mixed tz example from the issue raises? I didn't see one quickly glancing over this file, but could be missing it or it could be elsewhere. If not, you could maybe parametrize over a data variable that does both orders for aware/naive and the aware/aware case and get ride of the for obj loop, i.e.

@pytest.mark.parametrize('data', [
    [pd.Timestamp('2000', tz='CET'), pd.Timestamp('2000')],
    [pd.Timestamp('2000'), pd.Timestamp('2000', tz='CET')],
    [pd.Timestamp('2000', tz="US/Central"), pd.Timestamp('2000', tz='CET')]])
def test_mixing_naive_tzaware_raises(self):
    arr = np.array(data)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or alternatively parametrizing over timezone pairs if it seems cleaner, e.g.

@pytest.mark.parametrize('tz1, 'tz2', [("CET", None), (None, "CET"), ("US/Central", "CET")])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does does hit the case from this original issue, in the DatetimeArray._from_sequence case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed that the DatetimeArray._from_sequence case hits the original issue.

I think my comment was a bit unclear; what I was asking is if we have coverage for the working mixed tz example you also provided in the issue: mix = np.array([pd.Timestamp('2000', tz="US/Central"), pd.Timestamp('2000', tz='CET')]). I didn't see a test for that anywhere, and figured we could shoehorn it into this test if it's indeed not covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a pressing issue or anything though, so if it's not covered it we could just make it a follow-up item; don't want this to block other work.


msg = ('Cannot mix tz-aware with tz-naive values|'
'Tz-aware datetime.datetime cannot be converted '
'to datetime64 unless utc=True')

for obj in [arr, arr[::-1]]:
# check that we raise regardless of whether naive is found
# before aware or vice-versa
for meth in [DatetimeArray._from_sequence,
sequence_to_dt64ns,
pd.to_datetime,
pd.DatetimeIndex]:

with pytest.raises(ValueError, match=msg):
meth(obj)

def test_from_pandas_array(self):
arr = pd.array(np.arange(5, dtype=np.int64)) * 3600 * 10**9

Expand Down
15 changes: 3 additions & 12 deletions pandas/tests/indexes/datetimes/test_construction.py
Original file line number Diff line number Diff line change
Expand Up @@ -306,24 +306,15 @@ def test_construction_dti_with_mixed_timezones(self):
tm.assert_index_equal(result, exp, exact=True)
assert isinstance(result, DatetimeIndex)

# different tz coerces tz-naive to tz-awareIndex(dtype=object)
result = DatetimeIndex([Timestamp('2011-01-01 10:00'),
Timestamp('2011-01-02 10:00',
tz='US/Eastern')], name='idx')
exp = DatetimeIndex([Timestamp('2011-01-01 05:00'),
Timestamp('2011-01-02 10:00')],
tz='US/Eastern', name='idx')
tm.assert_index_equal(result, exp, exact=True)
assert isinstance(result, DatetimeIndex)

# tz mismatch affecting to tz-aware raises TypeError/ValueError

with pytest.raises(ValueError):
DatetimeIndex([Timestamp('2011-01-01 10:00', tz='Asia/Tokyo'),
Timestamp('2011-01-02 10:00', tz='US/Eastern')],
name='idx')

with pytest.raises(TypeError, match='data is already tz-aware'):
msg = 'cannot be converted to datetime64'
with pytest.raises(ValueError, match=msg):
DatetimeIndex([Timestamp('2011-01-01 10:00'),
Timestamp('2011-01-02 10:00', tz='US/Eastern')],
tz='Asia/Tokyo', name='idx')
Expand All @@ -333,7 +324,7 @@ def test_construction_dti_with_mixed_timezones(self):
Timestamp('2011-01-02 10:00', tz='US/Eastern')],
tz='US/Eastern', name='idx')

with pytest.raises(TypeError, match='data is already tz-aware'):
with pytest.raises(ValueError, match=msg):
# passing tz should results in DatetimeIndex, then mismatch raises
# TypeError
Index([pd.NaT, Timestamp('2011-01-01 10:00'),
Expand Down