CLN: Enforce read_csv(keep_date_col, parse_dates) deprecations #58622

Merged
merged 18 commits into from
May 10, 2024
Changes from 9 commits
10 changes: 0 additions & 10 deletions asv_bench/benchmarks/io/csv.py
@@ -445,16 +445,6 @@ def setup(self, engine):
data = data.format(*two_cols)
self.StringIO_input = StringIO(data)

def time_multiple_date(self, engine):
read_csv(
self.data(self.StringIO_input),
engine=engine,
sep=",",
header=None,
names=list(string.digits[:9]),
parse_dates=[[1, 2], [1, 3]],
)

def time_baseline(self, engine):
read_csv(
self.data(self.StringIO_input),
81 changes: 1 addition & 80 deletions doc/source/user_guide/io.rst
@@ -270,15 +270,9 @@ parse_dates : boolean or list of ints or names or list of lists or dict, default
* If ``True`` -> try parsing the index.
* If ``[1, 2, 3]`` -> try parsing columns 1, 2, 3 each as a separate date
column.
* If ``[[1, 3]]`` -> combine columns 1 and 3 and parse as a single date
column.
* If ``{'foo': [1, 3]}`` -> parse columns 1, 3 as date and call result 'foo'.

.. note::
A fast-path exists for iso8601-formatted dates.
keep_date_col : boolean, default ``False``
If ``True`` and parse_dates specifies combining multiple columns then keep the
original columns.
date_parser : function, default ``None``
Function to use for converting a sequence of string columns to an array of
datetime instances. The default uses ``dateutil.parser.parser`` to do the
@@ -823,71 +817,8 @@ The simplest case is to just pass in ``parse_dates=True``:

It is often the case that we may want to store date and time data separately,
or store various date fields separately. the ``parse_dates`` keyword can be
used to specify a combination of columns to parse the dates and/or times from.

You can specify a list of column lists to ``parse_dates``, the resulting date
columns will be prepended to the output (so as to not affect the existing column
order) and the new column names will be the concatenation of the component
column names:

.. ipython:: python
:okwarning:

data = (
"KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
"KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
"KORD,19990127, 21:00:00, 20:56:00, -0.5900\n"
"KORD,19990127, 21:00:00, 21:18:00, -0.9900\n"
"KORD,19990127, 22:00:00, 21:56:00, -0.5900\n"
"KORD,19990127, 23:00:00, 22:56:00, -0.5900"
)

with open("tmp.csv", "w") as fh:
fh.write(data)

df = pd.read_csv("tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]])
df

By default the parser removes the component date columns, but you can choose
to retain them via the ``keep_date_col`` keyword:

.. ipython:: python
:okwarning:

df = pd.read_csv(
"tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]], keep_date_col=True
)
df
used to specify columns to parse the dates and/or times.

Note that if you wish to combine multiple columns into a single date column, a
nested list must be used. In other words, ``parse_dates=[1, 2]`` indicates that
the second and third columns should each be parsed as separate date columns
while ``parse_dates=[[1, 2]]`` means the two columns should be parsed into a
single column.

You can also use a dict to specify custom name columns:

.. ipython:: python
:okwarning:

date_spec = {"nominal": [1, 2], "actual": [1, 3]}
df = pd.read_csv("tmp.csv", header=None, parse_dates=date_spec)
df

It is important to remember that if multiple text columns are to be parsed into
a single date column, then a new column is prepended to the data. The ``index_col``
specification is based off of this new set of columns rather than the original
data columns:


.. ipython:: python
:okwarning:

date_spec = {"nominal": [1, 2], "actual": [1, 3]}
df = pd.read_csv(
"tmp.csv", header=None, parse_dates=date_spec, index_col=0
) # index is the nominal column
df

.. note::
If a column or index contains an unparsable date, the entire column or
@@ -901,10 +832,6 @@ data columns:
for your data to store datetimes in this format, load times will be
significantly faster, ~20x has been observed.

.. deprecated:: 2.2.0
Combining date columns inside read_csv is deprecated. Use ``pd.to_datetime``
on the relevant result columns instead.


Date parsing functions
++++++++++++++++++++++
@@ -920,12 +847,6 @@ Performance-wise, you should try these methods of parsing dates in order:
then use ``to_datetime``.


.. ipython:: python
:suppress:

os.remove("tmp.csv")


.. _io.csv.mixed_timezones:

Parsing a CSV with mixed timezones
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -254,8 +254,10 @@ Removal of prior version deprecations/changes
- Enforced deprecation of :meth:`offsets.Tick.delta`, use ``pd.Timedelta(obj)`` instead (:issue:`55498`)
- Enforced deprecation of ``axis=None`` acting the same as ``axis=0`` in the DataFrame reductions ``sum``, ``prod``, ``std``, ``var``, and ``sem``, passing ``axis=None`` will now reduce over both axes; this is particularly the case when doing e.g. ``numpy.sum(df)`` (:issue:`21597`)
- Enforced deprecation of ``core.internals`` members ``Block``, ``ExtensionBlock``, and ``DatetimeTZBlock`` (:issue:`58467`)
- Enforced deprecation of ``keep_date_col`` keyword in :func:`read_csv` (:issue:`55569`)
- Enforced deprecation of ``quantile`` keyword in :meth:`.Rolling.quantile` and :meth:`.Expanding.quantile`, renamed to ``q`` instead. (:issue:`52550`)
- Enforced deprecation of argument ``infer_datetime_format`` in :func:`read_csv`, as a strict version of it is now the default (:issue:`48621`)
- Enforced deprecation of combining parsed datetime columns in :func:`read_csv` in ``parse_dates`` (:issue:`55569`)
- Enforced deprecation of non-standard (``np.ndarray``, :class:`ExtensionArray`, :class:`Index`, or :class:`Series`) argument to :func:`api.extensions.take` (:issue:`52981`)
- Enforced deprecation of parsing system timezone strings to ``tzlocal``, which depended on system timezone, pass the 'tz' keyword instead (:issue:`50791`)
- Enforced deprecation of passing a dictionary to :meth:`SeriesGroupBy.agg` (:issue:`52268`)
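
The deprecation note removed from ``io.rst`` pointed users to ``pd.to_datetime`` on the relevant result columns. For readers migrating code that relied on nested ``parse_dates`` lists or ``keep_date_col``, a minimal sketch of that replacement might look like the following. It reuses a shortened version of the sample data from the removed docs. The column names ``nominal`` and ``actual`` come from the removed dict example. Reading the components as strings with ``skipinitialspace=True`` and passing an explicit ``format`` are assumptions made for this illustration, not something the PR prescribes.

```python
from io import StringIO

import pandas as pd

data = (
    "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
    "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
    "KORD,19990127, 21:00:00, 20:56:00, -0.5900\n"
)

# Read the date and time fields as plain strings so nothing is parsed early.
# skipinitialspace strips the blank that follows each comma in this sample.
df = pd.read_csv(
    StringIO(data),
    header=None,
    dtype={1: str, 2: str, 3: str},
    skipinitialspace=True,
)

# Combine the component columns explicitly, then parse with a known format.
fmt = "%Y%m%d %H:%M:%S"
df["nominal"] = pd.to_datetime(df[1] + " " + df[2], format=fmt)
df["actual"] = pd.to_datetime(df[1] + " " + df[3], format=fmt)

# Drop the components to mirror the old default (keep_date_col=False);
# skip this step to mimic keep_date_col=True.
df = df.drop(columns=[1, 2, 3])
print(df)
```

Unlike the removed behaviour, the combined columns are appended rather than prepended, so an ``index_col``-style selection would need an explicit ``df.set_index("nominal")`` afterwards.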