Improved docs for infer_datetime_format #12606

Closed
wants to merge 4 commits into from
6 changes: 4 additions & 2 deletions pandas/io/parsers.py
@@ -132,8 +132,10 @@ class ParserWarning(Warning):

Note: A fast-path exists for iso8601-formatted dates.
infer_datetime_format : boolean, default False
-If True and parse_dates is enabled for a column, attempt to infer
-the datetime format to speed up the processing
+If True and parse_dates is enabled, pandas will attempt to infer the format
+of the datetime strings in the columns, and if it can be inferred, switch
+to a faster method of parsing them. In some cases this can increase the
+parsing speed by ~5-10x.
Contributor

put in a link to pd.to_datetime here

Contributor

hah, I forgot my own comment. If you want to add this in another PR great.

keep_date_col : boolean, default False
If True and parse_dates specifies combining multiple columns then
keep the original columns.
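
To illustrate the behaviour the revised parsers.py docstring describes (infer the format once, then switch to a faster fixed-format parse), here is a minimal standard-library sketch. It is not pandas' implementation: `CANDIDATE_FORMATS`, `guess_format`, `parse_slow` and `read_dates` are hypothetical names made up for this example, and the real inference in pandas handles far more formats.

```python
import csv
import io
from datetime import datetime

# Hypothetical candidate formats (an assumption for this sketch only).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y", "%Y%m%d")


def guess_format(sample):
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_slow(text):
    """Fallback: search all candidate formats for this one string."""
    fmt = guess_format(text)
    if fmt is None:
        raise ValueError(f"unrecognised date string: {text!r}")
    return datetime.strptime(text, fmt)


def read_dates(csv_text, column):
    """Parse one date column, inferring the format from the first row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    fmt = guess_format(rows[0][column])
    parsed = []
    for row in rows:
        value = row[column]
        if fmt is not None:
            try:
                # Fast path: reuse the format inferred from the first row.
                parsed.append(datetime.strptime(value, fmt))
                continue
            except ValueError:
                pass
        # Slow path: this row does not match the inferred format.
        parsed.append(parse_slow(value))
    return parsed


data = "when,value\n01/02/2000,1\n01/03/2000,2\n2000-01-04,3\n"
dates = read_dates(data, "when")
```

The third row deliberately uses a different format, so it goes through the fallback while the first two rows use the inferred format.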
19 changes: 17 additions & 2 deletions pandas/tseries/tools.py
@@ -231,8 +231,10 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
unit : unit of the arg (D,s,ms,us,ns) denote the unit in epoch
(e.g. a unix timestamp), which is an integer/float number.
infer_datetime_format : boolean, default False
-If no `format` is given, try to infer the format based on the first
-datetime string. Provides a large speed-up in many cases.
+If True and no `format` is given, attempt to infer the format of the
+datetime strings, and if it can be inferred, switch to a faster
+method of parsing them. In some cases this can increase the parsing
+speed by ~5-10x.

Returns
-------
@@ -273,6 +275,19 @@ def to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False,
99 2000-04-09
Length: 100, dtype: datetime64[ns]

+Infer the format from the first entry
+
+>>> pd.to_datetime(df.month + '/' + df.day + '/' + df.year,
+                   infer_datetime_format=True)
Contributor

this is not valid code (as df.month are integers), do something like:

s = df.month.astype(str) + '/' + df.day.astype(str) + '/' + df.year.astype(str)

then infer on that

Contributor Author

I assumed that the df = df.astype(str) from the example above would still have applied.

Should each example be entirely separate?

Contributor

ahh I see that. ok no, that is fine then (maybe rename df -> dfs or something on the .astype(str) line)

+0 2000-01-01
+1 2000-01-02
+...
+98 2000-04-08
+99 2000-04-09
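
The review thread above turns on whether the month, day and year values are strings when they are concatenated. A plain-Python sketch of that point, using lists of integers as stand-ins for the `df.month`, `df.day` and `df.year` columns (the lists and variable names are assumptions for illustration, not pandas objects):

```python
from datetime import date, datetime, timedelta

# Integer "columns" standing in for df.month, df.day and df.year.
days = [date(2000, 1, 1) + timedelta(days=i) for i in range(100)]
month = [d.month for d in days]
day = [d.day for d in days]
year = [d.year for d in days]

# Concatenating an int with a str fails, which is the reviewer's concern.
try:
    month[0] + "/"
    int_plus_str_ok = True
except TypeError:
    int_plus_str_ok = False

# Converting each component to str first makes the concatenation valid,
# which is the role df.astype(str) plays in the docstring example.
strings = [str(m) + "/" + str(d) + "/" + str(y)
           for m, d, y in zip(month, day, year)]

# All strings share one layout, so a single format parses every entry.
parsed = [datetime.strptime(s, "%m/%d/%Y").date() for s in strings]
```

The 100 parsed values run from 2000-01-01 to 2000-04-09, matching the output shown in the diff.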

+This gives the same results as omitting the `infer_datetime_format=True`,
Contributor

this is confusing, pls reword

+but is much faster.
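
One rough way to check both halves of that claim (identical results, less time) is to compare a parser that searches for the format on every string against one that infers it once from the first entry. The sketch below uses only the standard library; `FORMATS`, `parse_all_slow` and `parse_all_inferred` are hypothetical names, and the timings say nothing about pandas' actual speed-up.

```python
import timeit
from datetime import datetime, timedelta

# Hypothetical candidate formats (an assumption for this sketch only).
FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def infer_format(sample):
    """Return the first candidate format that parses `sample`."""
    for fmt in FORMATS:
        try:
            datetime.strptime(sample, fmt)
            return fmt
        except ValueError:
            pass
    raise ValueError(sample)


def parse_all_slow(strings):
    """Search for a matching format again for every string."""
    return [datetime.strptime(s, infer_format(s)) for s in strings]


def parse_all_inferred(strings):
    """Infer the format from the first entry, then reuse it."""
    fmt = infer_format(strings[0])
    return [datetime.strptime(s, fmt) for s in strings]


start = datetime(2000, 1, 1)
strings = [(start + timedelta(days=i)).strftime("%m/%d/%Y") for i in range(100)]

slow = parse_all_slow(strings)
fast = parse_all_inferred(strings)
same = slow == fast

# Timings vary by machine; they are printed, not asserted.
slow_time = timeit.timeit(lambda: parse_all_slow(strings), number=20)
fast_time = timeit.timeit(lambda: parse_all_inferred(strings), number=20)
print(f"same results: {same}; per-string search {slow_time:.4f}s, "
      f"inferred once {fast_time:.4f}s")
```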

Date that does not meet timestamp limitations:

>>> pd.to_datetime('13000101', format='%Y%m%d')
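
For background on the "timestamp limitations" in the last example: a `datetime64[ns]` value stores nanoseconds since 1970-01-01 in a signed 64-bit integer, which restricts the representable range to roughly 1677-09-21 through 2262-04-11. The standard-library sketch below approximates those bounds; `fits_in_ns_timestamp` is a hypothetical helper, and the bounds are truncated to microseconds because `datetime` has no nanosecond resolution.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1  # largest signed 64-bit integer

# Truncate towards the epoch to whole microseconds, since
# datetime cannot represent nanoseconds.
upper = EPOCH + timedelta(microseconds=MAX_NS // 1000)
lower = EPOCH - timedelta(microseconds=MAX_NS // 1000)


def fits_in_ns_timestamp(value):
    """True if `value` lies inside the approximate nanosecond range."""
    return lower <= value <= upper


# The date from the docstring example parses fine as a plain datetime,
# but falls well before the lower bound.
year_1300 = datetime.strptime("13000101", "%Y%m%d")
print(lower, upper, fits_in_ns_timestamp(year_1300))
```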