Commit 72dc33f

meeseeksmachine authored and jreback committed
Backport PR #24989: DOC: Document breaking change to read_csv (#24996)
1 parent 638ac19 commit 72dc33f

3 files changed, +84 -3 lines changed

doc/source/user_guide/io.rst

+30
@@ -989,6 +989,36 @@ a single date rather than the entire array.
 
    os.remove('tmp.csv')
 
+
+.. _io.csv.mixed_timezones:
+
+Parsing a CSV with mixed Timezones
+++++++++++++++++++++++++++++++++++
+
+Pandas cannot natively represent a column or index with mixed timezones. If your CSV
+file contains columns with a mixture of timezones, the default result will be
+an object-dtype column with strings, even with ``parse_dates``.
+
+
+.. ipython:: python
+
+   content = """\
+   a
+   2000-01-01T00:00:00+05:00
+   2000-01-01T00:00:00+06:00"""
+   df = pd.read_csv(StringIO(content), parse_dates=['a'])
+   df['a']
+
+To parse the mixed-timezone values as a datetime column, pass a partially-applied
+:func:`to_datetime` with ``utc=True`` as the ``date_parser``.
+
+.. ipython:: python
+
+   df = pd.read_csv(StringIO(content), parse_dates=['a'],
+                    date_parser=lambda col: pd.to_datetime(col, utc=True))
+   df['a']
+
+
 .. _io.dayfirst:
 
 

doc/source/whatsnew/v0.24.0.rst

+46
@@ -648,6 +648,52 @@ that the dates have been converted to UTC
    pd.to_datetime(["2015-11-18 15:30:00+05:30",
                    "2015-11-18 16:30:00+06:30"], utc=True)
 
+
+.. _whatsnew_0240.api_breaking.read_csv_mixed_tz:
+
+Parsing mixed-timezones with :func:`read_csv`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`read_csv` no longer silently converts mixed-timezone columns to UTC (:issue:`24987`).
+
+*Previous Behavior*
+
+.. code-block:: python
+
+   >>> import io
+   >>> content = """\
+   ... a
+   ... 2000-01-01T00:00:00+05:00
+   ... 2000-01-01T00:00:00+06:00"""
+   >>> df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
+   >>> df.a
+   0   1999-12-31 19:00:00
+   1   1999-12-31 18:00:00
+   Name: a, dtype: datetime64[ns]
+
+*New Behavior*
+
+.. ipython:: python
+
+   import io
+   content = """\
+   a
+   2000-01-01T00:00:00+05:00
+   2000-01-01T00:00:00+06:00"""
+   df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
+   df.a
+
+As can be seen, the ``dtype`` is object; each value in the column is a string.
+To convert the strings to an array of datetimes, the ``date_parser`` argument
+
+.. ipython:: python
+
+   df = pd.read_csv(io.StringIO(content), parse_dates=['a'],
+                    date_parser=lambda col: pd.to_datetime(col, utc=True))
+   df.a
+
+See :ref:`whatsnew_0240.api.timezone_offset_parsing` for more.
+
 .. _whatsnew_0240.api_breaking.period_end_time:
 
 Time values in ``dt.end_time`` and ``to_timestamp(how='end')``

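Editor's note: the "Previous Behavior" output above (19:00 and 18:00 on 1999-12-31) is what results from normalizing the two offset-aware values to UTC. The following is a minimal standard-library sketch of that arithmetic only; it is not how pandas implements the conversion, and the variable names are illustrative.

```python
from datetime import datetime, timezone

# The two sample values from the diff, each with a different fixed UTC offset.
raw = ["2000-01-01T00:00:00+05:00", "2000-01-01T00:00:00+06:00"]

# Parse each ISO 8601 string into an offset-aware datetime.
aware = [datetime.fromisoformat(s) for s in raw]

# The values carry two distinct offsets, which is why they cannot share
# a single timezone in one column.
offsets = {dt.utcoffset() for dt in aware}

# Normalizing to UTC places both values on one common timeline.
as_utc = [dt.astimezone(timezone.utc) for dt in aware]

for dt in as_utc:
    print(dt.isoformat())
```

Midnight at +05:00 is five hours ahead of UTC, so it maps to 19:00 on the previous day; midnight at +06:00 maps to 18:00.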
pandas/io/parsers.py

+8 -3
@@ -203,9 +203,14 @@
     * dict, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call
       result 'foo'
 
-    If a column or index contains an unparseable date, the entire column or
-    index will be returned unaltered as an object data type. For non-standard
-    datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv``
+    If a column or index cannot be represented as an array of datetimes,
+    say because of an unparseable value or a mixture of timezones, the column
+    or index will be returned unaltered as an object data type. For
+    non-standard datetime parsing, use ``pd.to_datetime`` after
+    ``pd.read_csv``. To parse an index or column with a mixture of timezones,
+    specify ``date_parser`` to be a partially-applied
+    :func:`pandas.to_datetime` with ``utc=True``. See
+    :ref:`io.csv.mixed_timezones` for more.
 
     Note: A fast-path exists for iso8601-formatted dates.
 infer_datetime_format : bool, default False

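Editor's note: the updated docstring also mentions calling ``pd.to_datetime`` after ``pd.read_csv``. Below is a minimal sketch of that route for the mixed-timezone sample used in this commit. It assumes pandas is installed, reads the column as plain strings, and does not rely on ``date_parser``; the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

content = "a\n2000-01-01T00:00:00+05:00\n2000-01-01T00:00:00+06:00"

# Read the column as plain strings (no parse_dates), then convert afterwards.
df = pd.read_csv(StringIO(content))

# utc=True normalizes the mixed offsets to a single timezone-aware UTC column.
df["a"] = pd.to_datetime(df["a"], utc=True)

print(df["a"])
```

Converting after the read keeps the parsing step explicit, at the cost of a second pass over the column.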