Commit 4f5099b

robintw authored and jreback committed
Improved docs for infer_datetime_format
Fixes pandas-dev#12152

Author: Robin Wilson <[email protected]>

Closes pandas-dev#12606 from robintw/infer_datetime_format-docs and squashes the following commits:

4dbb8ec [Robin Wilson] Added example to docs
50311ce [Robin Wilson] Updated docs
19fd9d4 [Robin Wilson] Fix linter error
d3ce9a1 [Robin Wilson] Improved docs infer_datetime_format
1 parent ed3d145 commit 4f5099b

2 files changed: +23 -6 lines changed
pandas/io/parsers.py

+4 -2
@@ -132,8 +132,10 @@ class ParserWarning(Warning):
 
     Note: A fast-path exists for iso8601-formatted dates.
 infer_datetime_format : boolean, default False
-    If True and parse_dates is enabled for a column, attempt to infer
-    the datetime format to speed up the processing
+    If True and parse_dates is enabled, pandas will attempt to infer the format
+    of the datetime strings in the columns, and if it can be inferred, switch
+    to a faster method of parsing them. In some cases this can increase the
+    parsing speed by ~5-10x.
 keep_date_col : boolean, default False
     If True and parse_dates specifies combining multiple columns then
     keep the original columns.

pandas/tseries/tools.py

+19 -4
@@ -231,8 +231,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
     unit : unit of the arg (D,s,ms,us,ns) denote the unit in epoch
         (e.g. a unix timestamp), which is an integer/float number.
     infer_datetime_format : boolean, default False
-        If no `format` is given, try to infer the format based on the first
-        datetime string. Provides a large speed-up in many cases.
+        If True and no `format` is given, attempt to infer the format of the
+        datetime strings, and if it can be inferred, switch to a faster
+        method of parsing them. In some cases this can increase the parsing
+        speed by ~5-10x.
 
     Returns
     -------
@@ -264,15 +266,28 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
 
     Or from strings
 
-    >>> df = df.astype(str)
-    >>> pd.to_datetime(df.day + df.month + df.year, format="%d%m%Y")
+    >>> dfs = df.astype(str)
+    >>> pd.to_datetime(dfs.day + dfs.month + dfs.year, format="%d%m%Y")
     0    2000-01-01
     1    2000-01-02
     ...
     98   2000-04-08
     99   2000-04-09
     Length: 100, dtype: datetime64[ns]
 
+    Infer the format from the first entry
+
+    >>> pd.to_datetime(dfs.month + '/' + dfs.day + '/' + dfs.year,
+                       infer_datetime_format=True)
+    0    2000-01-01
+    1    2000-01-02
+    ...
+    98   2000-04-08
+    99   2000-04-09
+
+    This gives the same results as omitting the `infer_datetime_format=True`,
+    but is much faster.
+
     Date that does not meet timestamp limitations:
 
     >>> pd.to_datetime('13000101', format='%Y%m%d')
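
The idea behind the new docstring text (infer a format from the first entry, then switch to a faster fixed-format parse) can be illustrated with a rough standard-library sketch. This is not the pandas implementation: `CANDIDATE_FORMATS`, `guess_format`, `parse_one_slow`, and `parse_dates` are hypothetical names invented for illustration, and the candidate list is an assumption, far narrower than what pandas can infer.

```python
from datetime import datetime, timedelta

# Hypothetical candidate list (an assumption for this sketch);
# pandas' real format inference is more general than this.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def guess_format(sample):
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_one_slow(text):
    """Slow path: try every candidate format for a single string."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("no candidate format matches %r" % text)


def parse_dates(strings, infer_format=False):
    """Parse strings, optionally inferring one format from the first entry."""
    strings = list(strings)
    if infer_format and strings:
        fmt = guess_format(strings[0])
        if fmt is not None:
            try:
                # Fast path: apply the single inferred format to every element.
                return [datetime.strptime(s, fmt) for s in strings]
            except ValueError:
                pass  # Mixed formats: fall through to the slow path.
    return [parse_one_slow(s) for s in strings]


# 100 consecutive days from 2000-01-01, written as month/day/year,
# mirroring the shape of the docstring example.
start = datetime(2000, 1, 1)
days = [(start + timedelta(days=i)).strftime("%m/%d/%Y") for i in range(100)]

fast = parse_dates(days, infer_format=True)
slow = parse_dates(days)
```

Both calls return the same values. The inferred path only skips the per-element format search, which is the kind of saving the docstring describes. If a later entry does not match the inferred format, this sketch falls back to the slow path.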
