
Commit 0e8c730

DEPR: passing mixed offsets with utc=False into to_datetime (#54014)
* add raising FutureWarning in _return_parsed_timezone_results and in array_to_datetime, fix tests
* correct the definition of _return_parsed_timezone_results, added a test for FutureWarning, fix tests
* fix tests in pandas/tests/extension/test_arrow.py
* correct the definition of _return_parsed_timezone_results
* fix an example in docs: Parsing a CSV with mixed timezones
* correct def _array_to_datetime_object, add a test for mixed format, fix errors in docs
* correct example in whatsnew/v0.24.0.rst and fix pylint failures
* correct str for message in FutureWarning in the test with format mixed
* fix an error in an example in whatsnew/v0.24.0.rst
* correct examples and the description of the param utc in docstring of to_datetime, correct an example in whatsnew/v1.1.0.rst
* update whatsnew/v2.1.0.rst
* correct docstring for to_datetime, example in whatsnew/v0.24.0.rst, rename test functions
* refactor tests for to_datetime
* refactor test for to_datetime
* add example to whatsnew/v2.1.0.rst
* correct the example
* correct the example in whatsnew/v2.1.0.rst
* correct def _array_to_datetime_object and fix test for read_json
* add catch_warnings to filter the warning in test_read_datetime
* correct msg in catch_warnings to filter the warning in test_read_datetime
* catch the warning in test_from_csv_with_mixed_offsets
* reword whatsnew
* add catch_warnings to converter
* describe how to maintain the old behavior
* add an example of how to get the old behavior and correct the warning message

---------

Co-authored-by: MarcoGorelli <[email protected]>
1 parent b8e14ec commit 0e8c730


11 files changed (+348 / -119 lines changed)


doc/source/user_guide/io.rst

+2-8
@@ -931,6 +931,8 @@ Parsing a CSV with mixed timezones
931931
pandas cannot natively represent a column or index with mixed timezones. If your CSV
932932
file contains columns with a mixture of timezones, the default result will be
933933
an object-dtype column with strings, even with ``parse_dates``.
934+
To parse the mixed-timezone values as a datetime column, read in as ``object`` dtype and
935+
then call :func:`to_datetime` with ``utc=True``.
934936

935937

936938
.. ipython:: python
@@ -939,14 +941,6 @@ an object-dtype column with strings, even with ``parse_dates``.
939941
a
940942
2000-01-01T00:00:00+05:00
941943
2000-01-01T00:00:00+06:00"""
942-
df = pd.read_csv(StringIO(content), parse_dates=["a"])
943-
df["a"]
944-
945-
To parse the mixed-timezone values as a datetime column, read in as ``object`` dtype and
946-
then call :func:`to_datetime` with ``utc=True``.
947-
948-
.. ipython:: python
949-
950944
df = pd.read_csv(StringIO(content))
951945
df["a"] = pd.to_datetime(df["a"], utc=True)
952946
df["a"]
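The revised user-guide advice (read the mixed-offset column as strings, then normalize everything to UTC) can be illustrated without pandas. The following is a minimal standard-library sketch, assuming ISO 8601 strings with `+HH:MM` offsets; it mimics the effect of `utc=True` rather than pandas' implementation, and `parse_column_as_utc` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime, timezone

content = """\
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""


def parse_column_as_utc(text, column):
    """Hypothetical helper: read one CSV column as strings, then normalize to UTC."""
    reader = csv.DictReader(io.StringIO(text))
    # After parsing, each value still carries its own fixed offset...
    parsed = [datetime.fromisoformat(row[column]) for row in reader]
    # ...so convert every value to one common time zone (UTC).
    return [value.astimezone(timezone.utc) for value in parsed]


values = parse_column_as_utc(content, "a")
```

Both values end up with a zero UTC offset (1999-12-31 19:00 and 18:00 UTC), which is why a single time-zone-aware column becomes possible.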

doc/source/whatsnew/v0.24.0.rst

+27-14
@@ -632,13 +632,19 @@ Parsing datetime strings with the same UTC offset will preserve the UTC offset i
632632
Parsing datetime strings with different UTC offsets will now create an Index of
633633
``datetime.datetime`` objects with different UTC offsets
634634

635-
.. ipython:: python
635+
.. code-block:: ipython
636+
637+
In [59]: idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
638+
"2015-11-18 16:30:00+06:30"])
639+
640+
In[60]: idx
641+
Out[60]: Index([2015-11-18 15:30:00+05:30, 2015-11-18 16:30:00+06:30], dtype='object')
642+
643+
In[61]: idx[0]
644+
Out[61]: Timestamp('2015-11-18 15:30:00+0530', tz='UTC+05:30')
636645
637-
idx = pd.to_datetime(["2015-11-18 15:30:00+05:30",
638-
"2015-11-18 16:30:00+06:30"])
639-
idx
640-
idx[0]
641-
idx[1]
646+
In[62]: idx[1]
647+
Out[62]: Timestamp('2015-11-18 16:30:00+0630', tz='UTC+06:30')
642648
643649
Passing ``utc=True`` will mimic the previous behavior but will correctly indicate
644650
that the dates have been converted to UTC
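Keeping per-element offsets loses no information: the two strings in the example above denote the same instant written with different offsets. A minimal standard-library sketch (plain `datetime` objects, not pandas `Timestamp`s) shows that the offsets differ, the instants compare equal, and converting to UTC, which is what `utc=True` does, collapses them to one representation.

```python
from datetime import datetime, timezone

# The two inputs from the whatsnew example, parsed as offset-aware datetimes.
first = datetime.fromisoformat("2015-11-18 15:30:00+05:30")
second = datetime.fromisoformat("2015-11-18 16:30:00+06:30")

# Each value keeps its own fixed UTC offset...
offsets = (first.utcoffset(), second.utcoffset())

# ...but aware datetimes compare by absolute instant, so these are equal.
same_instant = first == second

# Converting to UTC (the utc=True route) yields identical representations.
first_utc = first.astimezone(timezone.utc)
second_utc = second.astimezone(timezone.utc)
```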
@@ -673,15 +679,22 @@ Parsing mixed-timezones with :func:`read_csv`
673679
674680
*New behavior*
675681

676-
.. ipython:: python
682+
.. code-block:: ipython
683+
684+
In[64]: import io
685+
686+
In[65]: content = """\
687+
...: a
688+
...: 2000-01-01T00:00:00+05:00
689+
...: 2000-01-01T00:00:00+06:00"""
690+
691+
In[66]: df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
677692
678-
import io
679-
content = """\
680-
a
681-
2000-01-01T00:00:00+05:00
682-
2000-01-01T00:00:00+06:00"""
683-
df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
684-
df.a
693+
In[67]: df.a
694+
Out[67]:
695+
0 2000-01-01 00:00:00+05:00
696+
1 2000-01-01 00:00:00+06:00
697+
Name: a, Length: 2, dtype: object
685698
686699
As can be seen, the ``dtype`` is object; each value in the column is a string.
687700
To convert the strings to an array of datetimes, the ``date_parser`` argument

doc/source/whatsnew/v1.1.0.rst

+8-1
@@ -208,7 +208,14 @@ For example:
208208
tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
209209
"2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
210210
pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
211-
pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
211+
212+
.. code-block:: ipython
213+
214+
In[37]: pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')
215+
Out[37]:
216+
Index([2010-01-01 12:00:00+01:00, 2010-01-01 12:00:00-01:00,
217+
2010-01-01 12:00:00+03:00, 2010-01-01 12:00:00+04:00],
218+
dtype='object')
212219
213220
.. _whatsnew_110.grouper_resample_origin:
214221
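The `%z` directive in the `format` string above is also understood by the standard library's `datetime.strptime`, which makes it easy to see why the call without `utc=True` cannot produce a single-time-zone result: the four inputs carry four distinct offsets. A minimal sketch, assuming Python 3.7+:

```python
from datetime import datetime, timezone

tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
           "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
fmt = "%Y-%m-%d %H:%M:%S %z"

# strptime with %z returns offset-aware datetimes, each with its own offset.
parsed = [datetime.strptime(s, fmt) for s in tz_strs]
distinct_offsets = {value.utcoffset() for value in parsed}

# Normalizing to UTC (the utc=True analogue) leaves a single offset.
as_utc = [value.astimezone(timezone.utc) for value in parsed]
distinct_after_utc = {value.utcoffset() for value in as_utc}
```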

doc/source/whatsnew/v2.1.0.rst

+47-2
@@ -295,8 +295,53 @@ Other API changes
295295
.. ---------------------------------------------------------------------------
296296
.. _whatsnew_210.deprecations:
297297

298-
Deprecations
299-
~~~~~~~~~~~~
298+
Deprecate parsing datetimes with mixed time zones
299+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
300+
301+
Parsing datetimes with mixed time zones is deprecated and shows a warning unless the user passes ``utc=True`` to :func:`to_datetime` (:issue:`50887`)
302+
303+
*Previous behavior*:
304+
305+
.. code-block:: ipython
306+
307+
In [7]: data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
308+
309+
In [8]: pd.to_datetime(data, utc=False)
310+
Out[8]:
311+
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
312+
313+
*New behavior*:
314+
315+
.. code-block:: ipython
316+
317+
In [9]: pd.to_datetime(data, utc=False)
318+
FutureWarning:
319+
In a future version of pandas, parsing datetimes with mixed time zones will raise
320+
a warning unless `utc=True`. Please specify `utc=True` to opt in to the new behaviour
321+
and silence this warning. To create a `Series` with mixed offsets and `object` dtype,
322+
please use `apply` and `datetime.datetime.strptime`.
323+
Index([2020-01-01 00:00:00+06:00, 2020-01-01 00:00:00+01:00], dtype='object')
324+
325+
In order to silence this warning and avoid an error in a future version of pandas,
326+
please specify ``utc=True``:
327+
328+
.. ipython:: python
329+
330+
data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
331+
pd.to_datetime(data, utc=True)
332+
333+
To create a ``Series`` with mixed offsets and ``object`` dtype, please use ``apply``
334+
and ``datetime.datetime.strptime``:
335+
336+
.. ipython:: python
337+
338+
import datetime as dt
339+
340+
data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]
341+
pd.Series(data).apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d %H:%M:%S%z'))
342+
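The ``apply``/``strptime`` route above parses element by element, so each value keeps its own offset. The same per-element parse can be tried with the standard library alone; this sketch assumes Python 3.7+, where ``%z`` accepts offsets written with a colon such as ``+06:00``.

```python
import datetime as dt

data = ["2020-01-01 00:00:00+06:00", "2020-01-01 00:00:00+01:00"]

# Parse each string independently, as the lambda passed to apply does.
parsed = [dt.datetime.strptime(x, "%Y-%m-%d %H:%M:%S%z") for x in data]

# Offsets are preserved per element rather than unified.
offsets_hours = [value.utcoffset().total_seconds() / 3600 for value in parsed]
```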
343+
Other Deprecations
344+
~~~~~~~~~~~~~~~~~~
300345
- Deprecated 'broadcast_axis' keyword in :meth:`Series.align` and :meth:`DataFrame.align`, upcast before calling ``align`` with ``left = DataFrame({col: left for col in right.columns}, index=right.index)`` (:issue:`51856`)
301346
- Deprecated 'downcast' keyword in :meth:`Index.fillna` (:issue:`53956`)
302347
- Deprecated 'fill_method' and 'limit' keywords in :meth:`DataFrame.pct_change`, :meth:`Series.pct_change`, :meth:`DataFrameGroupBy.pct_change`, and :meth:`SeriesGroupBy.pct_change`, explicitly call ``ffill`` or ``bfill`` before calling ``pct_change`` instead (:issue:`53491`)

pandas/_libs/tslib.pyx

+12
@@ -620,6 +620,7 @@ cdef _array_to_datetime_object(
620620
# 1) NaT or NaT-like values
621621
# 2) datetime strings, which we return as datetime.datetime
622622
# 3) special strings - "now" & "today"
623+
unique_timezones = set()
623624
for i in range(n):
624625
# Analogous to: val = values[i]
625626
val = <object>(<PyObject**>cnp.PyArray_MultiIter_DATA(mi, 1))[0]
@@ -649,6 +650,7 @@ cdef _array_to_datetime_object(
649650
tzinfo=tsobj.tzinfo,
650651
fold=tsobj.fold,
651652
)
653+
unique_timezones.add(tsobj.tzinfo)
652654

653655
except (ValueError, OverflowError) as ex:
654656
ex.args = (f"{ex}, at position {i}", )
@@ -666,6 +668,16 @@ cdef _array_to_datetime_object(
666668

667669
cnp.PyArray_MultiIter_NEXT(mi)
668670

671+
if len(unique_timezones) > 1:
672+
warnings.warn(
673+
"In a future version of pandas, parsing datetimes with mixed time "
674+
"zones will raise a warning unless `utc=True`. "
675+
"Please specify `utc=True` to opt in to the new behaviour "
676+
"and silence this warning. To create a `Series` with mixed offsets and "
677+
"`object` dtype, please use `apply` and `datetime.datetime.strptime`",
678+
FutureWarning,
679+
stacklevel=find_stack_level(),
680+
)
669681
return oresult_nd, None
670682

671683
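The new check in `_array_to_datetime_object` amounts to collecting the `tzinfo` of each parsed string in a set and warning once if more than one distinct value was seen. A pure-Python sketch of that idea follows; `parse_objects` is a hypothetical name, `datetime.fromisoformat` stands in for pandas' own string parser, NaT-like inputs are modelled as `None`, and the warning text is illustrative only.

```python
import warnings
from datetime import datetime


def parse_objects(values):
    """Hypothetical sketch: parse strings, warn once if their tzinfo differs."""
    result = []
    unique_timezones = set()
    for val in values:
        if val is None:  # NaT-like input: keep it, but record no zone
            result.append(None)
            continue
        parsed = datetime.fromisoformat(val)
        result.append(parsed)
        # In this sketch a naive string contributes None, so naive + aware
        # strings also count as "mixed".
        unique_timezones.add(parsed.tzinfo)
    if len(unique_timezones) > 1:
        warnings.warn(
            "parsing datetimes with mixed time zones (illustrative message)",
            FutureWarning,
            stacklevel=2,
        )
    return result
```

Using a set means the warning is emitted at most once per call, after the loop, rather than once per offending element.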

pandas/core/tools/datetimes.py

+42-7
@@ -340,6 +340,7 @@ def _return_parsed_timezone_results(
340340
tz_result : Index-like of parsed dates with timezone
341341
"""
342342
tz_results = np.empty(len(result), dtype=object)
343+
non_na_timezones = set()
343344
for zone in unique(timezones):
344345
mask = timezones == zone
345346
dta = DatetimeArray(result[mask]).tz_localize(zone)
@@ -348,8 +349,20 @@ def _return_parsed_timezone_results(
348349
dta = dta.tz_localize("utc")
349350
else:
350351
dta = dta.tz_convert("utc")
352+
else:
353+
if not dta.isna().all():
354+
non_na_timezones.add(zone)
351355
tz_results[mask] = dta
352-
356+
if len(non_na_timezones) > 1:
357+
warnings.warn(
358+
"In a future version of pandas, parsing datetimes with mixed time "
359+
"zones will raise a warning unless `utc=True`. Please specify `utc=True` "
360+
"to opt in to the new behaviour and silence this warning. "
361+
"To create a `Series` with mixed offsets and `object` dtype, "
362+
"please use `apply` and `datetime.datetime.strptime`",
363+
FutureWarning,
364+
stacklevel=find_stack_level(),
365+
)
353366
return Index(tz_results, name=name)
354367

355368
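In `_return_parsed_timezone_results` the count is done per zone rather than per element, and a zone is only recorded when at least one of its values is not NA. A minimal sketch of that bookkeeping, using hypothetical names, plain lists instead of NumPy arrays, `None` for NA and arbitrary stand-in values for parsed datetimes:

```python
import warnings


def non_na_zones(values, zones):
    """Hypothetical sketch: zones that contribute at least one non-NA value."""
    found = set()
    for zone in dict.fromkeys(zones):  # unique zones, in first-seen order
        members = [v for v, z in zip(values, zones) if z == zone]
        if not all(v is None for v in members):
            found.add(zone)
    return found


def maybe_warn(values, zones):
    # Warn only when two or more zones carry real (non-NA) values.
    if len(non_na_zones(values, zones)) > 1:
        warnings.warn(
            "mixed time zones (illustrative message)", FutureWarning, stacklevel=2
        )
```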

@@ -772,6 +785,14 @@ def to_datetime(
772785
offsets (typically, daylight savings), see :ref:`Examples
773786
<to_datetime_tz_examples>` section for details.
774787
788+
.. warning::
789+
790+
In a future version of pandas, parsing datetimes with mixed time
791+
zones will raise an error unless `utc=True`.
792+
Please specify `utc=True` to opt in to the new behaviour
793+
and silence this warning. To create a `Series` with mixed offsets and
794+
`object` dtype, please use `apply` and `datetime.datetime.strptime`.
795+
775796
See also: pandas general documentation about `timezone conversion and
776797
localization
777798
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
@@ -993,19 +1014,33 @@ def to_datetime(
9931014
9941015
- However, timezone-aware inputs *with mixed time offsets* (for example
9951016
issued from a timezone with daylight savings, such as Europe/Paris)
996-
are **not successfully converted** to a :class:`DatetimeIndex`. Instead a
997-
simple :class:`Index` containing :class:`datetime.datetime` objects is
998-
returned:
999-
1000-
>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
1017+
are **not successfully converted** to a :class:`DatetimeIndex`.
1018+
Parsing datetimes with mixed time zones will show a warning unless
1019+
`utc=True`. If `utc=False` (the default), the warning below will be shown
1020+
and a simple :class:`Index` containing :class:`datetime.datetime`
1021+
objects will be returned:
1022+
1023+
>>> pd.to_datetime(['2020-10-25 02:00 +0200',
1024+
... '2020-10-25 04:00 +0100']) # doctest: +SKIP
1025+
FutureWarning: In a future version of pandas, parsing datetimes with mixed
1026+
time zones will raise a warning unless `utc=True`. Please specify `utc=True`
1027+
to opt in to the new behaviour and silence this warning. To create a `Series`
1028+
with mixed offsets and `object` dtype, please use `apply` and
1029+
`datetime.datetime.strptime`.
10011030
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
10021031
dtype='object')
10031032
10041033
- A mix of timezone-aware and timezone-naive inputs is also converted to
10051034
a simple :class:`Index` containing :class:`datetime.datetime` objects:
10061035
10071036
>>> from datetime import datetime
1008-
>>> pd.to_datetime(["2020-01-01 01:00:00-01:00", datetime(2020, 1, 1, 3, 0)])
1037+
>>> pd.to_datetime(["2020-01-01 01:00:00-01:00",
1038+
... datetime(2020, 1, 1, 3, 0)]) # doctest: +SKIP
1039+
FutureWarning: In a future version of pandas, parsing datetimes with mixed
1040+
time zones will raise a warning unless `utc=True`. Please specify `utc=True`
1041+
to opt in to the new behaviour and silence this warning. To create a `Series`
1042+
with mixed offsets and `object` dtype, please use `apply` and
1043+
`datetime.datetime.strptime`.
10091044
Index([2020-01-01 01:00:00-01:00, 2020-01-01 03:00:00], dtype='object')
10101045
10111046
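A related illustration from the standard library: aware and naive `datetime` objects cannot even be ordered against each other, which hints at why a mix of the two does not fit in a single `DatetimeIndex`. A short sketch using the inputs from the docstring example:

```python
from datetime import datetime

aware = datetime.fromisoformat("2020-01-01 01:00:00-01:00")
naive = datetime(2020, 1, 1, 3, 0)

try:
    _ = aware < naive  # ordering aware vs naive datetimes is undefined
    comparison_error = None
except TypeError as exc:
    comparison_error = str(exc)

# Equality does not raise: an aware and a naive datetime are never equal.
equal = aware == naive
```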

pandas/io/json/_json.py

+8-1
@@ -1312,7 +1312,14 @@ def _try_convert_to_date(self, data):
13121312
date_units = (self.date_unit,) if self.date_unit else self._STAMP_UNITS
13131313
for date_unit in date_units:
13141314
try:
1315-
new_data = to_datetime(new_data, errors="raise", unit=date_unit)
1315+
with warnings.catch_warnings():
1316+
warnings.filterwarnings(
1317+
"ignore",
1318+
".*parsing datetimes with mixed time "
1319+
"zones will raise a warning",
1320+
category=FutureWarning,
1321+
)
1322+
new_data = to_datetime(new_data, errors="raise", unit=date_unit)
13161323
except (ValueError, OverflowError, TypeError):
13171324
continue
13181325
return new_data, True
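The `catch_warnings` block added to `_try_convert_to_date` suppresses only `FutureWarning`s whose message matches the given pattern, and only for the duration of the `with` block. The sketch below demonstrates that scoping with a hypothetical stand-in for `to_datetime` that always emits the deprecation message. `filterwarnings` matches its pattern against the start of the message (case-insensitively), which is why the pattern begins with `.*`.

```python
import warnings

DEPRECATION = (
    "In a future version of pandas, parsing datetimes with mixed time "
    "zones will raise a warning unless `utc=True`."
)


def fake_to_datetime(values):
    """Hypothetical stand-in that always emits the deprecation message."""
    warnings.warn(DEPRECATION, FutureWarning, stacklevel=2)
    return list(values)


def convert_quietly(values):
    with warnings.catch_warnings():
        # The pattern must match from the start of the message,
        # hence the leading ".*".
        warnings.filterwarnings(
            "ignore",
            ".*parsing datetimes with mixed time zones will raise a warning",
            category=FutureWarning,
        )
        return fake_to_datetime(values)
```

Outside `convert_quietly` the filter is gone again, so callers elsewhere still see the deprecation.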

pandas/io/parsers/base_parser.py

+42-20
@@ -1144,37 +1144,59 @@ def converter(*date_cols, col: Hashable):
11441144
date_format.get(col) if isinstance(date_format, dict) else date_format
11451145
)
11461146

1147-
result = tools.to_datetime(
1148-
ensure_object(strs),
1149-
format=date_fmt,
1150-
utc=False,
1151-
dayfirst=dayfirst,
1152-
errors="ignore",
1153-
cache=cache_dates,
1154-
)
1147+
with warnings.catch_warnings():
1148+
warnings.filterwarnings(
1149+
"ignore",
1150+
".*parsing datetimes with mixed time zones will raise a warning",
1151+
category=FutureWarning,
1152+
)
1153+
result = tools.to_datetime(
1154+
ensure_object(strs),
1155+
format=date_fmt,
1156+
utc=False,
1157+
dayfirst=dayfirst,
1158+
errors="ignore",
1159+
cache=cache_dates,
1160+
)
11551161
if isinstance(result, DatetimeIndex):
11561162
arr = result.to_numpy()
11571163
arr.flags.writeable = True
11581164
return arr
11591165
return result._values
11601166
else:
11611167
try:
1162-
result = tools.to_datetime(
1163-
date_parser(*(unpack_if_single_element(arg) for arg in date_cols)),
1164-
errors="ignore",
1165-
cache=cache_dates,
1166-
)
1168+
with warnings.catch_warnings():
1169+
warnings.filterwarnings(
1170+
"ignore",
1171+
".*parsing datetimes with mixed time zones "
1172+
"will raise a warning",
1173+
category=FutureWarning,
1174+
)
1175+
result = tools.to_datetime(
1176+
date_parser(
1177+
*(unpack_if_single_element(arg) for arg in date_cols)
1178+
),
1179+
errors="ignore",
1180+
cache=cache_dates,
1181+
)
11671182
if isinstance(result, datetime.datetime):
11681183
raise Exception("scalar parser")
11691184
return result
11701185
except Exception:
1171-
return tools.to_datetime(
1172-
parsing.try_parse_dates(
1173-
parsing.concat_date_cols(date_cols),
1174-
parser=date_parser,
1175-
),
1176-
errors="ignore",
1177-
)
1186+
with warnings.catch_warnings():
1187+
warnings.filterwarnings(
1188+
"ignore",
1189+
".*parsing datetimes with mixed time zones "
1190+
"will raise a warning",
1191+
category=FutureWarning,
1192+
)
1193+
return tools.to_datetime(
1194+
parsing.try_parse_dates(
1195+
parsing.concat_date_cols(date_cols),
1196+
parser=date_parser,
1197+
),
1198+
errors="ignore",
1199+
)
11781200

11791201
return converter
11801202
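The same `catch_warnings`/`filterwarnings` pair now appears three times in `converter`. One possible way to factor it out, sketched here as a suggestion and not as what this commit does, is a small context manager; `ignore_mixed_tz_warning` is a hypothetical name.

```python
import contextlib
import warnings


@contextlib.contextmanager
def ignore_mixed_tz_warning():
    """Hypothetical helper: silence only the mixed-time-zone FutureWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            ".*parsing datetimes with mixed time zones will raise a warning",
            category=FutureWarning,
        )
        yield
```

Each call site would then read `with ignore_mixed_tz_warning(): result = tools.to_datetime(...)`. Unrelated `FutureWarning`s raised inside the block are still shown, because only the matching message is filtered.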

pandas/tests/indexes/datetimes/test_constructors.py

+3-1
@@ -300,8 +300,10 @@ def test_construction_index_with_mixed_timezones(self):
300300
assert not isinstance(result, DatetimeIndex)
301301

302302
msg = "DatetimeIndex has mixed timezones"
303+
msg_depr = "parsing datetimes with mixed time zones will raise a warning"
303304
with pytest.raises(TypeError, match=msg):
304-
DatetimeIndex(["2013-11-02 22:00-05:00", "2013-11-03 22:00-06:00"])
305+
with tm.assert_produces_warning(FutureWarning, match=msg_depr):
306+
DatetimeIndex(["2013-11-02 22:00-05:00", "2013-11-03 22:00-06:00"])
305307

306308
# length = 1
307309
result = Index([Timestamp("2011-01-01")], name="idx")
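The updated test expects a single call to both emit the `FutureWarning` and raise `TypeError`. The sketch below checks both conditions with the standard library alone, using a hypothetical `build_index` that mimics that sequence. Recording warnings in an outer `catch_warnings(record=True)` block keeps them available for inspection after the exception has been caught.

```python
import warnings


def build_index(values):
    """Hypothetical stand-in: warns while parsing, then rejects mixed zones."""
    warnings.warn(
        "parsing datetimes with mixed time zones will raise a warning",
        FutureWarning,
        stacklevel=2,
    )
    raise TypeError("DatetimeIndex has mixed timezones")


def check_warns_then_raises():
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            build_index(["2013-11-02 22:00-05:00", "2013-11-03 22:00-06:00"])
        except TypeError as exc:
            error = str(exc)
        else:
            error = None
    categories = [w.category for w in caught]
    return error, categories
```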
