Commit 336fcc1

Improved docstring and return type hints for to_datetime (pandas-dev#42494)
1 parent a255173 commit 336fcc1

1 file changed

+225
-74
lines changed

pandas/core/tools/datetimes.py

@@ -691,111 +691,192 @@ def to_datetime(
691691
"""
692692
Convert argument to datetime.
693693
694+
This function converts a scalar, array-like, :class:`Series` or
695+
:class:`DataFrame`/dict-like to a pandas datetime object.
696+
694697
Parameters
695698
----------
696699
arg : int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like
697-
The object to convert to a datetime. If the DataFrame is provided, the method
698-
expects minimally the following columns: "year", "month", "day".
700+
The object to convert to a datetime. If a :class:`DataFrame` is provided, the
701+
method expects minimally the following columns: :const:`"year"`,
702+
:const:`"month"`, :const:`"day"`.
699703
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
700-
- If 'raise', then invalid parsing will raise an exception.
701-
- If 'coerce', then invalid parsing will be set as NaT.
702-
- If 'ignore', then invalid parsing will return the input.
704+
- If :const:`'raise'`, then invalid parsing will raise an exception.
705+
- If :const:`'coerce'`, then invalid parsing will be set as :const:`NaT`.
706+
- If :const:`'ignore'`, then invalid parsing will return the input.
703707
dayfirst : bool, default False
704-
Specify a date parse order if `arg` is str or its list-likes.
705-
If True, parses dates with the day first, eg 10/11/12 is parsed as
706-
2012-11-10.
708+
Specify a date parse order if `arg` is str or is list-like.
709+
If :const:`True`, parses dates with the day first, e.g. :const:`"10/11/12"`
710+
is parsed as :const:`2012-11-10`.
707711
708712
.. warning::
709713
710-
dayfirst=True is not strict, but will prefer to parse
714+
``dayfirst=True`` is not strict, but will prefer to parse
711715
with day first. If a delimited date string cannot be parsed in
712716
accordance with the given `dayfirst` option, e.g.
713717
``to_datetime(['31-12-2021'])``, then a warning will be shown.
714718
715719
yearfirst : bool, default False
716-
Specify a date parse order if `arg` is str or its list-likes.
720+
Specify a date parse order if `arg` is str or is list-like.
717721
718-
- If True parses dates with the year first, eg 10/11/12 is parsed as
719-
2010-11-12.
720-
- If both dayfirst and yearfirst are True, yearfirst is preceded (same
721-
as dateutil).
722+
- If :const:`True` parses dates with the year first, e.g.
723+
:const:`"10/11/12"` is parsed as :const:`2010-11-12`.
724+
- If both `dayfirst` and `yearfirst` are :const:`True`, `yearfirst` takes
725+
precedence (same as :mod:`dateutil`).
722726
723727
.. warning::
724728
725-
yearfirst=True is not strict, but will prefer to parse
729+
``yearfirst=True`` is not strict, but will prefer to parse
726730
with year first.
727731
728732
utc : bool, default None
729-
Return UTC DatetimeIndex if True (converting any tz-aware
730-
datetime.datetime objects as well).
733+
Control timezone-related parsing, localization and conversion.
734+
735+
- If :const:`True`, the function *always* returns a timezone-aware
736+
UTC-localized :class:`Timestamp`, :class:`Series` or
737+
:class:`DatetimeIndex`. To do this, timezone-naive inputs are
738+
*localized* as UTC, while timezone-aware inputs are *converted* to UTC.
739+
740+
- If :const:`False` (default), inputs will not be coerced to UTC.
741+
Timezone-naive inputs will remain naive, while timezone-aware ones
742+
will keep their time offsets. Limitations exist for mixed
743+
offsets (typically, daylight savings), see :ref:`Examples
744+
<to_datetime_tz_examples>` section for details.
745+
746+
See also: pandas general documentation about `timezone conversion and
747+
localization
748+
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
749+
#time-zone-handling>`_.
750+
731751
format : str, default None
732-
The strftime to parse time, eg "%d/%m/%Y", note that "%f" will parse
733-
all the way up to nanoseconds.
734-
See strftime documentation for more information on choices:
735-
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior.
736-
exact : bool, True by default
737-
Behaves as:
738-
- If True, require an exact format match.
739-
- If False, allow the format to match anywhere in the target string.
752+
The strftime to parse time, e.g. :const:`"%d/%m/%Y"`. Note that
753+
:const:`"%f"` will parse all the way up to nanoseconds. See
754+
`strftime documentation
755+
<https://docs.python.org/3/library/datetime.html
756+
#strftime-and-strptime-behavior>`_ for more information on choices.
757+
exact : bool, default True
758+
Control how `format` is used:
759+
760+
- If :const:`True`, require an exact `format` match.
761+
- If :const:`False`, allow the `format` to match anywhere in the target
762+
string.
740763
741764
unit : str, default 'ns'
742765
The unit of the arg (D,s,ms,us,ns) denote the unit, which is an
743766
integer or float number. This will be based off the origin.
744-
Example, with unit='ms' and origin='unix' (the default), this
767+
Example, with ``unit='ms'`` and ``origin='unix'`` (the default), this
745768
would calculate the number of milliseconds to the unix epoch start.
746769
infer_datetime_format : bool, default False
747-
If True and no `format` is given, attempt to infer the format of the
748-
datetime strings based on the first non-NaN element,
770+
If :const:`True` and no `format` is given, attempt to infer the format
771+
of the datetime strings based on the first non-NaN element,
749772
and if it can be inferred, switch to a faster method of parsing them.
750773
In some cases this can increase the parsing speed by ~5-10x.
751774
origin : scalar, default 'unix'
752775
Define the reference date. The numeric values would be parsed as number
753776
of units (defined by `unit`) since this reference date.
754777
755-
- If 'unix' (or POSIX) time; origin is set to 1970-01-01.
756-
- If 'julian', unit must be 'D', and origin is set to beginning of
757-
Julian Calendar. Julian day number 0 is assigned to the day starting
758-
at noon on January 1, 4713 BC.
778+
- If :const:`'unix'` (or POSIX) time; origin is set to 1970-01-01.
779+
- If :const:`'julian'`, unit must be :const:`'D'`, and origin is set to
780+
beginning of Julian Calendar. Julian day number :const:`0` is assigned
781+
to the day starting at noon on January 1, 4713 BC.
759782
- If Timestamp convertible, origin is set to Timestamp identified by
760783
origin.
761784
cache : bool, default True
762-
If True, use a cache of unique, converted dates to apply the datetime
763-
conversion. May produce significant speed-up when parsing duplicate
764-
date strings, especially ones with timezone offsets. The cache is only
765-
used when there are at least 50 values. The presence of out-of-bounds
766-
values will render the cache unusable and may slow down parsing.
785+
If :const:`True`, use a cache of unique, converted dates to apply the
786+
datetime conversion. May produce significant speed-up when parsing
787+
duplicate date strings, especially ones with timezone offsets. The cache
788+
is only used when there are at least 50 values. The presence of
789+
out-of-bounds values will render the cache unusable and may slow down
790+
parsing.
767791
768792
.. versionchanged:: 0.25.0
769-
- changed default value from False to True.
793+
changed default value from :const:`False` to :const:`True`.
770794
771795
Returns
772796
-------
773797
datetime
774798
If parsing succeeded.
775-
Return type depends on input:
776-
777-
- list-like:
778-
- DatetimeIndex, if timezone naive or aware with the same timezone
779-
- Index of object dtype, if timezone aware with mixed time offsets
780-
- Series: Series of datetime64 dtype
781-
- DataFrame: Series of datetime64 dtype
782-
- scalar: Timestamp
783-
784-
In case when it is not possible to return designated types (e.g. when
785-
any element of input is before Timestamp.min or after Timestamp.max)
786-
return will have datetime.datetime type (or corresponding
787-
array/Series).
799+
Return type depends on input (types in parentheses correspond to
800+
fallback in case of unsuccessful timezone or out-of-range timestamp
801+
parsing):
802+
803+
- scalar: :class:`Timestamp` (or :class:`datetime.datetime`)
804+
- array-like: :class:`DatetimeIndex` (or :class:`Series` with
805+
:class:`object` dtype containing :class:`datetime.datetime`)
806+
- Series: :class:`Series` of :class:`datetime64` dtype (or
807+
:class:`Series` of :class:`object` dtype containing
808+
:class:`datetime.datetime`)
809+
- DataFrame: :class:`Series` of :class:`datetime64` dtype (or
810+
:class:`Series` of :class:`object` dtype containing
811+
:class:`datetime.datetime`)
812+
813+
Raises
814+
------
815+
ParserError
816+
When parsing a date from string fails.
817+
ValueError
818+
When another datetime conversion error happens. For example when one
819+
of the 'year', 'month', 'day' columns is missing in a :class:`DataFrame`, or
820+
when a timezone-aware :class:`datetime.datetime` is found in an array-like
821+
of mixed time offsets, and ``utc=False``.
788822
789823
See Also
790824
--------
791825
DataFrame.astype : Cast argument to a specified dtype.
792826
to_timedelta : Convert argument to timedelta.
793827
convert_dtypes : Convert dtypes.
794828
829+
Notes
830+
-----
831+
832+
Many input types are supported, and lead to different output types:
833+
834+
- **scalars** can be int, float, str, datetime object (from stdlib :mod:`datetime`
835+
module or :mod:`numpy`). They are converted to :class:`Timestamp` when
836+
possible, otherwise they are converted to :class:`datetime.datetime`.
837+
None/NaN/null scalars are converted to :const:`NaT`.
838+
839+
- **array-like** can contain int, float, str, datetime objects. They are
840+
converted to :class:`DatetimeIndex` when possible, otherwise they are
841+
converted to :class:`Index` with :class:`object` dtype, containing
842+
:class:`datetime.datetime`. None/NaN/null entries are converted to
843+
:const:`NaT` in both cases.
844+
845+
- **Series** are converted to :class:`Series` with :class:`datetime64`
846+
dtype when possible, otherwise they are converted to :class:`Series` with
847+
:class:`object` dtype, containing :class:`datetime.datetime`. None/NaN/null
848+
entries are converted to :const:`NaT` in both cases.
849+
850+
- **DataFrame/dict-like** are converted to :class:`Series` with
851+
:class:`datetime64` dtype. For each row a datetime is created from assembling
852+
the various dataframe columns. Column keys can be common abbreviations
853+
like [‘year’, ‘month’, ‘day’, ‘minute’, ‘second’, ‘ms’, ‘us’, ‘ns’] or
854+
plurals of the same.
855+
856+
The following causes are responsible for :class:`datetime.datetime` objects
857+
being returned (possibly inside an :class:`Index` or a :class:`Series` with
858+
:class:`object` dtype) instead of a proper pandas designated type
859+
(:class:`Timestamp`, :class:`DatetimeIndex` or :class:`Series`
860+
with :class:`datetime64` dtype):
861+
862+
- when any input element is before :const:`Timestamp.min` or after
863+
:const:`Timestamp.max`, see `timestamp limitations
864+
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
865+
#timeseries-timestamp-limits>`_.
866+
867+
- when ``utc=False`` (default) and the input is an array-like or
868+
:class:`Series` containing mixed naive/aware datetime, or aware with mixed
869+
time offsets. Note that this happens in the (quite frequent) situation when
870+
the timezone has a daylight savings policy. In that case you may wish to
871+
use ``utc=True``.
872+
795873
Examples
796874
--------
797-
Assembling a datetime from multiple columns of a DataFrame. The keys can be
798-
common abbreviations like ['year', 'month', 'day', 'minute', 'second',
875+
876+
**Handling various input formats**
877+
878+
Assembling a datetime from multiple columns of a :class:`DataFrame`. The keys
879+
can be common abbreviations like ['year', 'month', 'day', 'minute', 'second',
799880
'ms', 'us', 'ns'] or plurals of the same.
800881
801882
>>> df = pd.DataFrame({'year': [2015, 2016],
@@ -806,20 +887,7 @@ def to_datetime(
806887
1 2016-03-05
807888
dtype: datetime64[ns]
808889
809-
If a date does not meet the `timestamp limitations
810-
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
811-
#timeseries-timestamp-limits>`_, passing errors='ignore'
812-
will return the original input instead of raising any exception.
813-
814-
Passing errors='coerce' will force an out-of-bounds date to NaT,
815-
in addition to forcing non-dates (or non-parseable dates) to NaT.
816-
817-
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
818-
datetime.datetime(1300, 1, 1, 0, 0)
819-
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
820-
NaT
821-
822-
Passing infer_datetime_format=True can often-times speedup a parsing
890+
Passing ``infer_datetime_format=True`` can often speed up parsing
823891
if the input is not exactly in ISO 8601 format but is in a regular format.
824892
825893
>>> s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)
@@ -854,16 +922,99 @@ def to_datetime(
854922
DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'],
855923
dtype='datetime64[ns]', freq=None)
856924
857-
In case input is list-like and the elements of input are of mixed
858-
timezones, return will have object type Index if utc=False.
925+
**Non-convertible date/times**
926+
927+
If a date does not meet the `timestamp limitations
928+
<https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
929+
#timeseries-timestamp-limits>`_, passing ``errors='ignore'``
930+
will return the original input instead of raising any exception.
931+
932+
Passing ``errors='coerce'`` will force an out-of-bounds date to :const:`NaT`,
933+
in addition to forcing non-dates (or non-parseable dates) to :const:`NaT`.
934+
935+
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='ignore')
936+
datetime.datetime(1300, 1, 1, 0, 0)
937+
>>> pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
938+
NaT
939+
940+
.. _to_datetime_tz_examples:
941+
942+
**Timezones and time offsets**
943+
944+
The default behaviour (``utc=False``) is as follows:
945+
946+
- Timezone-naive inputs are converted to timezone-naive :class:`DatetimeIndex`:
947+
948+
>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00:15'])
949+
DatetimeIndex(['2018-10-26 12:00:00', '2018-10-26 13:00:15'],
950+
dtype='datetime64[ns]', freq=None)
951+
952+
- Timezone-aware inputs *with constant time offset* are converted to
953+
timezone-aware :class:`DatetimeIndex`:
954+
955+
>>> pd.to_datetime(['2018-10-26 12:00 -0500', '2018-10-26 13:00 -0500'])
956+
DatetimeIndex(['2018-10-26 12:00:00-05:00', '2018-10-26 13:00:00-05:00'],
957+
dtype='datetime64[ns, pytz.FixedOffset(-300)]', freq=None)
958+
959+
- However, timezone-aware inputs *with mixed time offsets* (for example
960+
issued from a timezone with daylight savings, such as Europe/Paris)
961+
are **not successfully converted** to a :class:`DatetimeIndex`. Instead a
962+
simple :class:`Index` containing :class:`datetime.datetime` objects is
963+
returned:
859964
860-
>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'])
861-
Index([2018-10-26 12:00:00-05:30, 2018-10-26 12:00:00-05:00], dtype='object')
965+
>>> pd.to_datetime(['2020-10-25 02:00 +0200', '2020-10-25 04:00 +0100'])
966+
Index([2020-10-25 02:00:00+02:00, 2020-10-25 04:00:00+01:00],
967+
dtype='object')
968+
969+
- A mix of timezone-aware and timezone-naive inputs is converted to
970+
a timezone-aware :class:`DatetimeIndex` if the offsets of the timezone-aware
971+
inputs are constant:
972+
973+
>>> from datetime import datetime
974+
>>> pd.to_datetime(["2020-01-01 01:00 -01:00", datetime(2020, 1, 1, 3, 0)])
975+
DatetimeIndex(['2020-01-01 01:00:00-01:00', '2020-01-01 02:00:00-01:00'],
976+
dtype='datetime64[ns, pytz.FixedOffset(-60)]', freq=None)
977+
978+
- Finally, mixing timezone-aware strings and :class:`datetime.datetime` always
979+
raises an error, even if the elements all have the same time offset.
980+
981+
>>> from datetime import datetime, timezone, timedelta
982+
>>> d = datetime(2020, 1, 1, 18, tzinfo=timezone(-timedelta(hours=1)))
983+
>>> pd.to_datetime(["2020-01-01 17:00 -0100", d])
984+
Traceback (most recent call last):
985+
...
986+
ValueError: Tz-aware datetime.datetime cannot be converted to datetime64
987+
unless utc=True
988+
989+
|
990+
991+
Setting ``utc=True`` solves most of the above issues:
992+
993+
- Timezone-naive inputs are *localized* as UTC
994+
995+
>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 13:00'], utc=True)
996+
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 13:00:00+00:00'],
997+
dtype='datetime64[ns, UTC]', freq=None)
998+
999+
- Timezone-aware inputs are *converted* to UTC (the output represents the
1000+
exact same datetime, but viewed from the UTC time offset `+00:00`).
8621001
8631002
>>> pd.to_datetime(['2018-10-26 12:00 -0530', '2018-10-26 12:00 -0500'],
8641003
... utc=True)
8651004
DatetimeIndex(['2018-10-26 17:30:00+00:00', '2018-10-26 17:00:00+00:00'],
8661005
dtype='datetime64[ns, UTC]', freq=None)
1006+
1007+
- Inputs can contain both naive and aware, string or datetime, the above
1008+
rules still apply
1009+
1010+
>>> pd.to_datetime(['2018-10-26 12:00', '2018-10-26 12:00 -0530',
1011+
... datetime(2020, 1, 1, 18),
1012+
... datetime(2020, 1, 1, 18,
1013+
... tzinfo=timezone(-timedelta(hours=1)))],
1014+
... utc=True)
1015+
DatetimeIndex(['2018-10-26 12:00:00+00:00', '2018-10-26 17:30:00+00:00',
1016+
'2020-01-01 18:00:00+00:00', '2020-01-01 19:00:00+00:00'],
1017+
dtype='datetime64[ns, UTC]', freq=None)
8671018
"""
8681019
if arg is None:
8691020
return None
@@ -968,9 +1119,9 @@ def _assemble_from_unit_mappings(arg, errors, tz):
9681119
arg : DataFrame
9691120
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
9701121
971-
- If 'raise', then invalid parsing will raise an exception
972-
- If 'coerce', then invalid parsing will be set as NaT
973-
- If 'ignore', then invalid parsing will return the input
1122+
- If :const:`'raise'`, then invalid parsing will raise an exception
1123+
- If :const:`'coerce'`, then invalid parsing will be set as :const:`NaT`
1124+
- If :const:`'ignore'`, then invalid parsing will return the input
9741125
tz : None or 'utc'
9751126
9761127
Returns
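
The following is not part of the commit. It is a minimal sketch of a few behaviours that the revised docstring describes: assembling dates from DataFrame columns, localizing naive inputs and converting aware inputs with ``utc=True``, and coercing an unparseable value to NaT. The variable names are illustrative, and the exact repr of the results may differ between pandas versions.

```python
import pandas as pd

# Assemble datetimes from DataFrame columns named 'year', 'month', 'day'.
df = pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]})
assembled = pd.to_datetime(df)

# utc=True: timezone-naive strings are *localized* as UTC,
# so the wall-clock time is kept and a UTC offset is attached.
localized = pd.to_datetime(["2018-10-26 12:00", "2018-10-26 13:00"], utc=True)

# utc=True: timezone-aware strings with different offsets are
# *converted* to UTC, so the wall-clock time shifts by each offset.
converted = pd.to_datetime(
    ["2018-10-26 12:00 -0530", "2018-10-26 12:00 -0500"], utc=True
)

# errors='coerce': a value that cannot be parsed becomes NaT
# instead of raising an exception.
coerced = pd.to_datetime("not a date", errors="coerce")

print(assembled)
print(localized)
print(converted)
print(coerced)
```

Passing ``utc=True`` is the option the docstring suggests when inputs carry mixed offsets, which is why the sketch uses it for both the naive and the aware lists rather than relying on the default.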
