Commit 047acde

MarcoGorelli and phofl authored
API deprecate date_parser, add date_format (#51019)
* wip
* fixup
* update user guide
* whatsnew
* gh number
* update user guide;
* whatsnew
* update whatsnew note with date_format enhancement
* make example ipython code-block
* add tests for date_format
* wip add this to read_excel too
* validate within _parser
* minor fixup
* mention other readers in whatsnew, None -> no_default
* Update v2.0.0.rst
* fixup merge conflict resolution

---------

Co-authored-by: MarcoGorelli <>
Co-authored-by: Patrick Hoefler <[email protected]>
1 parent cefc6f8 commit 047acde

9 files changed, +264 -88 lines changed

doc/source/user_guide/io.rst

+22 -33
@@ -290,6 +290,16 @@ date_parser : function, default ``None``
 values from the columns defined by parse_dates into a single array and pass
 that; and 3) call date_parser once for each row using one or more strings
 (corresponding to the columns defined by parse_dates) as arguments.
+
+.. deprecated:: 2.0.0
+Use ``date_format`` instead, or read in as ``object`` and then apply
+:func:`to_datetime` as-needed.
+date_format : str, default ``None``
+If used in conjunction with ``parse_dates``, will parse dates according to this
+format. For anything more complex (e.g. different formats for different columns),
+please read in as ``object`` and then apply :func:`to_datetime` as-needed.
+
+.. versionadded:: 2.0.0
 dayfirst : boolean, default ``False``
 DD/MM format dates, international and European format.
 cache_dates : boolean, default True
@@ -800,7 +810,7 @@ Specifying date columns
 +++++++++++++++++++++++

 To better facilitate working with datetime data, :func:`read_csv`
-uses the keyword arguments ``parse_dates`` and ``date_parser``
+uses the keyword arguments ``parse_dates`` and ``date_format``
 to allow users to specify a variety of columns and date/time formats to turn the
 input text data into ``datetime`` objects.

@@ -898,33 +908,15 @@ data columns:
 Date parsing functions
 ++++++++++++++++++++++

-Finally, the parser allows you to specify a custom ``date_parser`` function to
-take full advantage of the flexibility of the date parsing API:
-
-.. ipython:: python
-
-df = pd.read_csv(
-"tmp.csv", header=None, parse_dates=date_spec, date_parser=pd.to_datetime
-)
-df
-
-pandas will try to call the ``date_parser`` function in three different ways. If
-an exception is raised, the next one is tried:
-
-1. ``date_parser`` is first called with one or more arrays as arguments,
-as defined using ``parse_dates`` (e.g., ``date_parser(['2013', '2013'], ['1', '2'])``).
-
-2. If #1 fails, ``date_parser`` is called with all the columns
-concatenated row-wise into a single array (e.g., ``date_parser(['2013 1', '2013 2'])``).
+Finally, the parser allows you to specify a custom ``date_format``.
+Performance-wise, you should try these methods of parsing dates in order:

-Note that performance-wise, you should try these methods of parsing dates in order:
+1. If you know the format, use ``date_format``, e.g.:
+``date_format="%d/%m/%Y"``.

-1. If you know the format, use ``pd.to_datetime()``:
-``date_parser=lambda x: pd.to_datetime(x, format=...)``.
-
-2. If you have a really non-standard format, use a custom ``date_parser`` function.
-For optimal performance, this should be vectorized, i.e., it should accept arrays
-as arguments.
+2. If you different formats for different columns, or want to pass any extra options (such
+as ``utc``) to ``to_datetime``, then you should read in your data as ``object`` dtype, and
+then use ``to_datetime``.


 .. ipython:: python
@@ -952,16 +944,13 @@ an object-dtype column with strings, even with ``parse_dates``.
 df = pd.read_csv(StringIO(content), parse_dates=["a"])
 df["a"]

-To parse the mixed-timezone values as a datetime column, pass a partially-applied
-:func:`to_datetime` with ``utc=True`` as the ``date_parser``.
+To parse the mixed-timezone values as a datetime column, read in as ``object`` dtype and
+then call :func:`to_datetime` with ``utc=True``.

 .. ipython:: python

-df = pd.read_csv(
-StringIO(content),
-parse_dates=["a"],
-date_parser=lambda col: pd.to_datetime(col, utc=True),
-)
+df = pd.read_csv(StringIO(content))
+df["a"] = pd.to_datetime(df["a"], utc=True)
 df["a"]

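The two approaches recommended in the updated user guide can be sketched as follows. This is a minimal illustration, not part of the commit: the sample data and the column names ``start``, ``end`` and ``value`` are hypothetical, and the first call assumes pandas 2.0 or later, where ``date_format`` is available.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: two date columns that use different formats.
content = "start,end,value\n31/12/2022,2023-01-15,1\n01/02/2023,2023-03-01,2\n"

# Case 1: a single known format, passed straight to the reader (pandas >= 2.0).
df = pd.read_csv(
    StringIO(content),
    usecols=["start", "value"],
    parse_dates=["start"],
    date_format="%d/%m/%Y",
)

# Case 2: different formats (or extra options such as utc) per column.
# Read the columns as plain strings, then convert each one explicitly.
raw = pd.read_csv(StringIO(content), dtype={"start": object, "end": object})
raw["start"] = pd.to_datetime(raw["start"], format="%d/%m/%Y")
raw["end"] = pd.to_datetime(raw["end"], format="%Y-%m-%d", utc=True)
```

Case 2 is also the suggested replacement for code that previously passed ``date_parser=lambda col: pd.to_datetime(col, utc=True)``.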

doc/source/whatsnew/v0.24.0.rst

+12 -4
@@ -686,11 +686,19 @@ Parsing mixed-timezones with :func:`read_csv`
 As can be seen, the ``dtype`` is object; each value in the column is a string.
 To convert the strings to an array of datetimes, the ``date_parser`` argument

-.. ipython:: python
+.. code-block:: ipython

-df = pd.read_csv(io.StringIO(content), parse_dates=['a'],
-date_parser=lambda col: pd.to_datetime(col, utc=True))
-df.a
+In [3]: df = pd.read_csv(
+...: io.StringIO(content),
+...: parse_dates=['a'],
+...: date_parser=lambda col: pd.to_datetime(col, utc=True),
+...: )
+
+In [4]: df.a
+Out[4]:
+0 1999-12-31 19:00:00+00:00
+1 1999-12-31 18:00:00+00:00
+Name: a, dtype: datetime64[ns, UTC]

 See :ref:`whatsnew_0240.api.timezone_offset_parsing` for more.

doc/source/whatsnew/v2.0.0.rst

+2
@@ -316,6 +316,7 @@ Other enhancements
 - Added :meth:`DatetimeIndex.as_unit` and :meth:`TimedeltaIndex.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`50616`)
 - Added :meth:`Series.dt.unit` and :meth:`Series.dt.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`51223`)
 - Added new argument ``dtype`` to :func:`read_sql` to be consistent with :func:`read_sql_query` (:issue:`50797`)
+- :func:`read_csv`, :func:`read_table`, :func:`read_fwf` and :func:`read_excel` now accept ``date_format`` (:issue:`50601`)
 - :func:`to_datetime` now accepts ``"ISO8601"`` as an argument to ``format``, which will match any ISO8601 string (but possibly not identically-formatted) (:issue:`50411`)
 - :func:`to_datetime` now accepts ``"mixed"`` as an argument to ``format``, which will infer the format for each element individually (:issue:`50972`)
 - Added new argument ``engine`` to :func:`read_json` to support parsing JSON with pyarrow by specifying ``engine="pyarrow"`` (:issue:`48893`)
@@ -832,6 +833,7 @@ Deprecations
 - :meth:`Index.is_categorical` has been deprecated. Use :func:`pandas.api.types.is_categorical_dtype` instead (:issue:`50042`)
 - :meth:`Index.is_object` has been deprecated. Use :func:`pandas.api.types.is_object_dtype` instead (:issue:`50042`)
 - :meth:`Index.is_interval` has been deprecated. Use :func:`pandas.api.types.is_interval_dtype` instead (:issue:`50042`)
+- Deprecated argument ``date_parser`` in :func:`read_csv`, :func:`read_table`, :func:`read_fwf`, and :func:`read_excel` in favour of ``date_format`` (:issue:`50601`)
 - Deprecated ``all`` and ``any`` reductions with ``datetime64`` and :class:`DatetimeTZDtype` dtypes, use e.g. ``(obj != pd.Timestamp(0), tz=obj.tz).all()`` instead (:issue:`34479`)
 - Deprecated unused arguments ``*args`` and ``**kwargs`` in :class:`Resampler` (:issue:`50977`)
 - Deprecated calling ``float`` or ``int`` on a single element :class:`Series` to return a ``float`` or ``int`` respectively. Extract the element before calling ``float`` or ``int`` instead (:issue:`51101`)

pandas/io/excel/_base.py

+23 -5
@@ -250,6 +250,16 @@
 and pass that; and 3) call `date_parser` once for each row using one or
 more strings (corresponding to the columns defined by `parse_dates`) as
 arguments.
+
+.. deprecated:: 2.0.0
+Use ``date_format`` instead, or read in as ``object`` and then apply
+:func:`to_datetime` as-needed.
+date_format : str, default ``None``
+If used in conjunction with ``parse_dates``, will parse dates according to this
+format. For anything more complex (e.g. different formats for different columns),
+please read in as ``object`` and then apply :func:`to_datetime` as-needed.
+
+.. versionadded:: 2.0.0
 thousands : str, default None
 Thousands separator for parsing string columns to numeric. Note that
 this parameter is only necessary for columns stored as TEXT in Excel,
@@ -386,7 +396,8 @@ def read_excel(
     na_filter: bool = ...,
     verbose: bool = ...,
     parse_dates: list | dict | bool = ...,
-    date_parser: Callable | None = ...,
+    date_parser: Callable | lib.NoDefault = ...,
+    date_format: str | None = ...,
     thousands: str | None = ...,
     decimal: str = ...,
     comment: str | None = ...,
@@ -425,7 +436,8 @@ def read_excel(
     na_filter: bool = ...,
     verbose: bool = ...,
     parse_dates: list | dict | bool = ...,
-    date_parser: Callable | None = ...,
+    date_parser: Callable | lib.NoDefault = ...,
+    date_format: str | None = ...,
     thousands: str | None = ...,
     decimal: str = ...,
     comment: str | None = ...,
@@ -464,7 +476,8 @@ def read_excel(
     na_filter: bool = True,
     verbose: bool = False,
     parse_dates: list | dict | bool = False,
-    date_parser: Callable | None = None,
+    date_parser: Callable | lib.NoDefault = lib.no_default,
+    date_format: str | None = None,
     thousands: str | None = None,
     decimal: str = ".",
     comment: str | None = None,
@@ -508,6 +521,7 @@ def read_excel(
             verbose=verbose,
             parse_dates=parse_dates,
             date_parser=date_parser,
+            date_format=date_format,
             thousands=thousands,
             decimal=decimal,
             comment=comment,
@@ -711,7 +725,8 @@ def parse(
         na_values=None,
         verbose: bool = False,
         parse_dates: list | dict | bool = False,
-        date_parser: Callable | None = None,
+        date_parser: Callable | lib.NoDefault = lib.no_default,
+        date_format: str | None = None,
         thousands: str | None = None,
         decimal: str = ".",
         comment: str | None = None,
@@ -870,6 +885,7 @@ def parse(
                     skip_blank_lines=False,  # GH 39808
                     parse_dates=parse_dates,
                     date_parser=date_parser,
+                    date_format=date_format,
                     thousands=thousands,
                     decimal=decimal,
                     comment=comment,
@@ -1537,7 +1553,8 @@ def parse(
         nrows: int | None = None,
         na_values=None,
         parse_dates: list | dict | bool = False,
-        date_parser: Callable | None = None,
+        date_parser: Callable | lib.NoDefault = lib.no_default,
+        date_format: str | None = None,
         thousands: str | None = None,
         comment: str | None = None,
         skipfooter: int = 0,
@@ -1570,6 +1587,7 @@ def parse(
             na_values=na_values,
             parse_dates=parse_dates,
             date_parser=date_parser,
+            date_format=date_format,
             thousands=thousands,
             comment=comment,
             skipfooter=skipfooter,

pandas/io/parsers/base_parser.py

+21 -4
@@ -114,7 +114,8 @@ def __init__(self, kwds) -> None:

         self.parse_dates = _validate_parse_dates_arg(kwds.pop("parse_dates", False))
         self._parse_date_cols: Iterable = []
-        self.date_parser = kwds.pop("date_parser", None)
+        self.date_parser = kwds.pop("date_parser", lib.no_default)
+        self.date_format = kwds.pop("date_format", None)
         self.dayfirst = kwds.pop("dayfirst", False)
         self.keep_date_col = kwds.pop("keep_date_col", False)

@@ -133,6 +134,7 @@ def __init__(self, kwds) -> None:

         self._date_conv = _make_date_converter(
             date_parser=self.date_parser,
+            date_format=self.date_format,
             dayfirst=self.dayfirst,
             cache_dates=self.cache_dates,
         )
@@ -1089,16 +1091,30 @@ def _get_empty_meta(


 def _make_date_converter(
-    date_parser=None,
+    date_parser=lib.no_default,
     dayfirst: bool = False,
     cache_dates: bool = True,
+    date_format: str | None = None,
 ):
+    if date_parser is not lib.no_default:
+        warnings.warn(
+            "The argument 'date_parser' is deprecated and will "
+            "be removed in a future version. "
+            "Please use 'date_format' instead, or read your data in as 'object' dtype "
+            "and then call 'to_datetime'.",
+            FutureWarning,
+            stacklevel=find_stack_level(),
+        )
+    if date_parser is not lib.no_default and date_format is not None:
+        raise TypeError("Cannot use both 'date_parser' and 'date_format'")
+
     def converter(*date_cols):
-        if date_parser is None:
+        if date_parser is lib.no_default:
             strs = parsing.concat_date_cols(date_cols)

             return tools.to_datetime(
                 ensure_object(strs),
+                format=date_format,
                 utc=False,
                 dayfirst=dayfirst,
                 errors="ignore",
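
The switch from ``None`` to ``lib.no_default`` in the hunk above lets the function tell "argument not passed" apart from "argument passed explicitly". A minimal standalone sketch of that sentinel pattern follows. It is an illustration under stated assumptions, not pandas code: ``_NoDefault``, ``no_default`` and ``make_date_converter`` are hypothetical stand-ins, the warning text is shortened, and the returned labels exist only to make the chosen branch visible.

```python
import warnings
from enum import Enum


class _NoDefault(Enum):
    # Hypothetical stand-in for pandas' internal ``lib.no_default`` sentinel.
    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default


def make_date_converter(date_parser=no_default, date_format=None):
    """Simplified, hypothetical mirror of the argument checks in the hunk above."""
    if date_parser is not no_default:
        # Any explicitly passed value (even None) counts as use of the old argument.
        warnings.warn(
            "The argument 'date_parser' is deprecated; use 'date_format' instead.",
            FutureWarning,
            stacklevel=2,
        )
    if date_parser is not no_default and date_format is not None:
        raise TypeError("Cannot use both 'date_parser' and 'date_format'")
    # Report which path a converter built from these arguments would take.
    return "default" if date_parser is no_default else "custom"
```

With this sketch, calling ``make_date_converter()`` emits no warning, while ``make_date_converter(date_parser=None)`` emits a ``FutureWarning`` because the sentinel, not ``None``, marks the argument as unset.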
@@ -1152,7 +1168,8 @@ def converter(*date_cols):
     "parse_dates": False,
     "keep_date_col": False,
     "dayfirst": False,
-    "date_parser": None,
+    "date_parser": lib.no_default,
+    "date_format": None,
     "usecols": None,
     # 'iterator': False,
     "chunksize": None,
