DOC/BUG: Fix documentation for infer_datetime_format #6073 #6078

Merged
merged 1 commit into from
Jan 25, 2014
33 changes: 16 additions & 17 deletions doc/source/io.rst
@@ -387,11 +387,6 @@ The simplest case is to just pass in ``parse_dates=True``:
 # These are python datetime objects
 df.index
 
-.. ipython:: python
-:suppress:
-
-os.remove('foo.csv')
-
 It is often the case that we may want to store date and time data separately,
 or store various date fields separately. the ``parse_dates`` keyword can be
 used to specify a combination of columns to parse the dates and/or times from.
@@ -503,36 +498,40 @@ a single date rather than the entire array.
 
 Inferring Datetime Format
 ~~~~~~~~~~~~~~~~~~~~~~~~~
-If you have `parse_dates` enabled for some or all of your columns, and your
+If you have ``parse_dates`` enabled for some or all of your columns, and your
 datetime strings are all formatted the same way, you may get a large speed
-up by setting `infer_datetime_format=True`. If set, pandas will attempt
+up by setting ``infer_datetime_format=True``. If set, pandas will attempt
 to guess the format of your datetime strings, and then use a faster means
 of parsing the strings. 5-10x parsing speeds have been observed. Pandas
 will fallback to the usual parsing if either the format cannot be guessed
 or the format that was guessed cannot properly parse the entire column
-of strings. So in general, `infer_datetime_format` should not have any
+of strings. So in general, ``infer_datetime_format`` should not have any
 negative consequences if enabled.
 
 Here are some examples of datetime strings that can be guessed (All
 representing December 30th, 2011 at 00:00:00)
 
-"20111230"
-"2011/12/30"
-"20111230 00:00:00"
-"12/30/2011 00:00:00"
-"30/Dec/2011 00:00:00"
-"30/December/2011 00:00:00"
+- "20111230"
+- "2011/12/30"
+- "20111230 00:00:00"
+- "12/30/2011 00:00:00"
+- "30/Dec/2011 00:00:00"
+- "30/December/2011 00:00:00"
 
-`infer_datetime_format` is sensitive to `dayfirst`. With `dayfirst=True`, it
-will guess "01/12/2011" to be December 1st. With `dayfirst=False` (default)
-it will guess "01/12/2011" to be January 12th.
+``infer_datetime_format`` is sensitive to ``dayfirst``. With
+``dayfirst=True``, it will guess "01/12/2011" to be December 1st. With
+``dayfirst=False`` (default) it will guess "01/12/2011" to be January 12th.
 
 .. ipython:: python
 
 # Try to infer the format for the index column
 df = pd.read_csv('foo.csv', index_col=0, parse_dates=True,
 infer_datetime_format=True)
 
+.. ipython:: python
+:suppress:
+
+os.remove('foo.csv')
 
 International Date Formats
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
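
The guess-then-fall-back behavior described in the ``io.rst`` text (guess one format, use it for the whole column, fall back if the guess fails anywhere, and honor ``dayfirst`` for strings like "01/12/2011") can be sketched with the standard library. This is a hypothetical illustration, not pandas' implementation: ``guess_format``, ``parse_column`` and the candidate format list are made up for this sketch, and its "fast" path only avoids re-guessing per element; it does not try to reproduce the speedup pandas reports.

```python
from datetime import datetime

# Hypothetical candidate formats chosen to cover the example strings listed
# in the io.rst change. This is NOT pandas' internal list.
CANDIDATE_FORMATS = [
    "%Y%m%d",
    "%Y/%m/%d",
    "%Y%m%d %H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
    "%m/%d/%Y",
    "%d/%b/%Y %H:%M:%S",
    "%d/%B/%Y %H:%M:%S",
]


def guess_format(sample, dayfirst=False):
    """Return the first candidate format that parses `sample`, or None."""
    for fmt in CANDIDATE_FORMATS:
        if dayfirst:
            # Read ambiguous numeric dates such as "01/12/2011" as day/month.
            fmt = fmt.replace("%m/%d/%Y", "%d/%m/%Y")
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue
        return fmt
    return None


def parse_column(values, dayfirst=False):
    """Parse a list of strings; return (datetimes, format used or None)."""
    if not values:
        return [], None
    fmt = guess_format(values[0], dayfirst=dayfirst)
    if fmt is not None:
        try:
            # "Fast" path: reuse the single guessed format for every element.
            return [datetime.strptime(v, fmt) for v in values], fmt
        except ValueError:
            pass  # The guess does not fit the whole column; fall back.
    # Fallback path: guess a format separately for each element.
    parsed = []
    for v in values:
        f = guess_format(v, dayfirst=dayfirst)
        if f is None:
            raise ValueError(f"could not parse {v!r}")
        parsed.append(datetime.strptime(v, f))
    return parsed, None
```

For example, ``parse_column(["01/12/2011", "02/12/2011"], dayfirst=True)`` returns December dates together with the format ``"%d/%m/%Y"``, while a column mixing "20111230" and "2011/12/31" makes the first guess fail on the second element, so the sketch falls back and reports ``None`` as the format.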
14 changes: 7 additions & 7 deletions doc/source/v0.13.1.txt
@@ -148,19 +148,19 @@ Enhancements
 result
 result.loc[:,:,'ItemA']
 
-- Added optional `infer_datetime_format` to `read_csv`, `Series.from_csv` and
-`DataFrame.read_csv` (:issue:`5490`)
+- Added optional ``infer_datetime_format`` to ``read_csv``, ``Series.from_csv``
+and ``DataFrame.read_csv`` (:issue:`5490`)
 
-If `parse_dates` is enabled and this flag is set, pandas will attempt to
+If ``parse_dates`` is enabled and this flag is set, pandas will attempt to
 infer the format of the datetime strings in the columns, and if it can
 be inferred, switch to a faster method of parsing them. In some cases
 this can increase the parsing speed by ~5-10x.
 
-.. ipython:: python
+.. code-block:: python
 
-# Try to infer the format for the index column
-df = pd.read_csv('foo.csv', index_col=0, parse_dates=True,
-infer_datetime_format=True)
+# Try to infer the format for the index column
+df = pd.read_csv('foo.csv', index_col=0, parse_dates=True,
+infer_datetime_format=True)
 
 Experimental
 ~~~~~~~~~~~~
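
The ``dayfirst`` behavior documented in the ``io.rst`` change can be checked with the same ``read_csv`` call shape as the release-note example, reading from a small in-memory CSV instead of ``foo.csv``. This sketch assumes a recent pandas and deliberately leaves out ``infer_datetime_format``, because pandas 2.0 deprecated that keyword once strict format inference became the default. The CSV contents are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical data with ambiguous dates (DD/MM vs MM/DD).
data = "date,value\n01/12/2011,1\n02/12/2011,2\n"

# Default dayfirst=False: "01/12/2011" is read as January 12th.
month_first = pd.read_csv(StringIO(data), index_col=0, parse_dates=True)

# dayfirst=True: the same string is read as December 1st.
day_first = pd.read_csv(StringIO(data), index_col=0, parse_dates=True,
                        dayfirst=True)

print(month_first.index[0])
print(day_first.index[0])
```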