Commit d67bd35

MarcoGorelli committed
📝 update docs
1 parent dadb44b commit d67bd35

4 files changed, +49 -44 lines changed

doc/source/user_guide/basics.rst

+2
@@ -2313,6 +2313,7 @@ useful if you are reading in data which is mostly of the desired dtype (e.g. num
 non-conforming elements intermixed that you want to represent as missing:
 
 .. ipython:: python
+:okwarning:
 
 import datetime
 
@@ -2329,6 +2330,7 @@ The ``errors`` parameter has a third option of ``errors='ignore'``, which will s
 encounters any errors with the conversion to a desired data type:
 
 .. ipython:: python
+:okwarning:
 
 import datetime
 

doc/source/user_guide/io.rst

+6 -25
@@ -1009,41 +1009,22 @@ To parse the mixed-timezone values as a datetime column, pass a partially-applie
 Inferring datetime format
 +++++++++++++++++++++++++
 
-If you have ``parse_dates`` enabled for some or all of your columns, and your
-datetime strings are all formatted the same way, you may get a large speed
-up by setting ``infer_datetime_format=True``. If set, pandas will attempt
-to guess the format of your datetime strings, and then use a faster means
-of parsing the strings. 5-10x parsing speeds have been observed. pandas
-will fallback to the usual parsing if either the format cannot be guessed
-or the format that was guessed cannot properly parse the entire column
-of strings. So in general, ``infer_datetime_format`` should not have any
-negative consequences if enabled.
-
-Here are some examples of datetime strings that can be guessed (All
-representing December 30th, 2011 at 00:00:00):
-
-* "20111230"
-* "2011/12/30"
-* "20111230 00:00:00"
-* "12/30/2011 00:00:00"
-* "30/Dec/2011 00:00:00"
-* "30/December/2011 00:00:00"
-
-Note that ``infer_datetime_format`` is sensitive to ``dayfirst``. With
-``dayfirst=True``, it will guess "01/12/2011" to be December 1st. With
-``dayfirst=False`` (default) it will guess "01/12/2011" to be January 12th.
+If you try to parse a column of date strings, pandas will attempt to guess the format
+from the first non-NaN element, and will then parse the rest of the column with that
+format.
 
 .. ipython:: python
 
-# Try to infer the format for the index column
 df = pd.read_csv(
 "foo.csv",
 index_col=0,
 parse_dates=True,
-infer_datetime_format=True,
 )
 df
 
+In the case that you have mixed datetime formats within the same column, you'll need to
+first read it in the file, and then apply :func:`to_datetime` to each element.
+
 .. ipython:: python
 :suppress:
 
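
The two cases described in the updated io.rst text can be sketched as follows. This is a minimal illustration, not part of the commit: it uses in-memory CSV text in place of ``foo.csv``, and the column names and sample values are invented. Warnings emitted during parsing may differ between pandas versions.

```python
import io

import pandas as pd

# Case 1: every date string shares one format, so parse_dates can be used directly.
uniform_csv = "date,value\n2011-12-30,1\n2011-12-31,2\n"
df = pd.read_csv(io.StringIO(uniform_csv), index_col=0, parse_dates=True)
print(df.index)

# Case 2 (hypothetical data): formats are mixed within the column.
# Read the column as plain strings first, then parse each element on its own.
mixed_csv = "date,value\n30 December 2011,1\n2011-12-31,2\n"
raw = pd.read_csv(io.StringIO(mixed_csv))
raw["date"] = raw["date"].apply(pd.to_datetime)
print(raw["date"])
```

Parsing element by element is slower than parsing a whole column with one format, so it is probably best reserved for columns that really are mixed.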

doc/source/user_guide/timeseries.rst

+8 -19
@@ -13,17 +13,6 @@ a tremendous amount of new functionality for manipulating time series data.
 
 For example, pandas supports:
 
-Parsing time series information from various sources and formats
-
-.. ipython:: python
-
-import datetime
-
-dti = pd.to_datetime(
-["1/1/2018", np.datetime64("2018-01-01"), datetime.datetime(2018, 1, 1)]
-)
-dti
-
 Generate sequences of fixed-frequency dates and time spans
 
 .. ipython:: python
@@ -132,6 +121,8 @@ time.
 
 .. ipython:: python
 
+import datetime
+
 pd.Timestamp(datetime.datetime(2012, 5, 1))
 pd.Timestamp("2012-05-01")
 pd.Timestamp(2012, 5, 1)
@@ -196,26 +187,24 @@ is converted to a ``DatetimeIndex``:
 
 .. ipython:: python
 
-pd.to_datetime(pd.Series(["Jul 31, 2009", "2010-01-10", None]))
+pd.to_datetime(pd.Series(["Jul 31, 2009", "Jan 10, 2010", None]))
 
-pd.to_datetime(["2005/11/23", "2010.12.31"])
+pd.to_datetime(["2005/11/23", "2010/12/31"])
 
 If you use dates which start with the day first (i.e. European style),
 you can pass the ``dayfirst`` flag:
 
 .. ipython:: python
-:okwarning:
+:okwarning:
 
 pd.to_datetime(["04-01-2012 10:00"], dayfirst=True)
-
-pd.to_datetime(["14-01-2012", "01-14-2012"], dayfirst=True)
+pd.to_datetime(["04-14-2012 10:00"], dayfirst=True)
 
 .. warning::
 
 You see in the above example that ``dayfirst`` isn't strict. If a date
 can't be parsed with the day being first it will be parsed as if
-``dayfirst`` were False, and in the case of parsing delimited date strings
-(e.g. ``31-12-2012``) then a warning will also be raised.
+``dayfirst`` were False and a warning will also be raised.
 
 If you pass a single string to ``to_datetime``, it returns a single ``Timestamp``.
 ``Timestamp`` can also accept string input, but it doesn't accept string parsing
@@ -768,7 +757,7 @@ partially matching dates:
 rng2 = pd.date_range("2011-01-01", "2012-01-01", freq="W")
 ts2 = pd.Series(np.random.randn(len(rng2)), index=rng2)
 
-ts2.truncate(before="2011-11", after="2011-12")
+ts2.truncate(before="2011-11-01", after="2011-12-01")
 ts2["2011-11":"2011-12"]
 
 Even complicated fancy indexing that breaks the ``DatetimeIndex`` frequency
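
The last hunk above changes ``truncate`` to take fully specified dates. The sketch below shows how that call differs from the partial-string slice on the next line. It is not part of the commit, and it assumes the random values are replaced by a simple counter so the result is reproducible. ``truncate`` treats ``after`` as a single point in time, while the slice keeps every entry in the named months.

```python
import pandas as pd

# Same weekly index as in the docs; values are a counter instead of random noise.
rng2 = pd.date_range("2011-01-01", "2012-01-01", freq="W")
ts2 = pd.Series(range(len(rng2)), index=rng2)

# Keeps entries between 2011-11-01 00:00 and 2011-12-01 00:00 inclusive.
truncated = ts2.truncate(before="2011-11-01", after="2011-12-01")

# Partial-string slicing keeps all of November and all of December.
sliced = ts2["2011-11":"2011-12"]

print(len(truncated), len(sliced))
```

With this index the slice also returns the December entries, which ``truncate`` leaves out.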

doc/source/whatsnew/v2.0.0.rst

+33
@@ -114,6 +114,39 @@ Optional libraries below the lowest tested version may still work, but are not c
 
 See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
 
+Datetimes are now parsed with a consistent format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`to_datetime` now parses dates with a consistent format, which is guessed from the first non-NA value
+(unless ``format`` is specified). Previously, it would've guessed the format for each element individually.
+
+*Old behavior*:
+
+.. code-block:: ipython
+
+In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
+In [2]: pd.to_datetime(ser)
+Out[2]:
+0   2000-01-13
+1   2000-12-01
+dtype: datetime64[ns]
+
+*New behavior*:
+
+.. ipython:: python
+:okwarning:
+
+ser = pd.Series(['13-01-2000', '12-01-2000'])
+pd.to_datetime(ser)
+
+Note that this affects :func:`read_csv` as well.
+
+If you still need to parse dates with inconsistent formats, you'll need to apply :func:`to_datetime`
+to each element individually, e.g. ::
+
+ser = pd.Series(['13-01-2000', '12 January 2000'])
+ser.apply(pd.to_datetime)
+
 .. _whatsnew_200.api_breaking.other:
 
 Other API changes
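
The difference between per-element and consistent format guessing, described in the whatsnew entry above, can be modelled with the standard library alone. This is a conceptual sketch and not pandas' implementation: the candidate format list, the function names, and the month-first ordering (standing in for ``dayfirst=False``) are all assumptions made for illustration, and the input is assumed to contain no missing values.

```python
from datetime import datetime

# Hypothetical candidate formats, month-first tried before day-first.
# pandas' real format guessing is considerably more elaborate.
CANDIDATE_FORMATS = ["%m-%d-%Y", "%d-%m-%Y"]


def guess_format(value):
    """Return the first candidate format that parses ``value``, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def parse_per_element(values):
    """Model of the old behavior: guess a format separately for every element."""
    return [datetime.strptime(v, guess_format(v)) for v in values]


def parse_consistently(values):
    """Model of the new behavior: guess once from the first element, reuse it."""
    fmt = guess_format(values[0])
    return [datetime.strptime(v, fmt) for v in values]


dates = ["13-01-2000", "12-01-2000"]
old_style = parse_per_element(dates)
new_style = parse_consistently(dates)
print(old_style[1].date(), new_style[1].date())
```

In this model the second string comes out as 1 December 2000 under per-element guessing, matching the old-behavior output above. Under consistent guessing it comes out as 12 January 2000, because the day-first format inferred from ``'13-01-2000'`` is reused.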
