Commit e2a612a

author
y-p
committed
Merge pull request #5993 from y-p/PR_doc_iso8601_fast_path
DOC: document read_csv fastpath for iso8601 datetime strings
2 parents 8c71949 + aaf5fb9 commit e2a612a

3 files changed: +20 -6 lines changed

doc/source/io.rst

+16 -6
@@ -456,12 +456,22 @@ data columns:
    index_col=0) #index is the nominal column
    df
 
-**Note**: When passing a dict as the `parse_dates` argument, the order of
-the columns prepended is not guaranteed, because `dict` objects do not impose
-an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
-instead of a regular `dict` if this matters to you. Because of this, when using a
-dict for 'parse_dates' in conjunction with the `index_col` argument, it's best to
-specify `index_col` as a column label rather then as an index on the resulting frame.
+.. note::
+   read_csv has a fast path for parsing datetime strings in ISO 8601 format,
+   e.g. "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange
+   for your data to store datetimes in this format, load times will be
+   significantly faster; speedups of ~20x have been observed.
+
+
+.. note::
+
+   When passing a dict as the `parse_dates` argument, the order of
+   the columns prepended is not guaranteed, because `dict` objects do not impose
+   an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
+   instead of a regular `dict` if this matters to you. Because of this, when using a
+   dict for `parse_dates` in conjunction with the `index_col` argument, it's best to
+   specify `index_col` as a column label rather than as an index on the resulting frame.
+
 
 Date Parsing Functions
 ~~~~~~~~~~~~~~~~~~~~~~

doc/source/release.rst

+3
@@ -1596,6 +1596,9 @@ Improvements to existing features
 - Add methods ``neg`` and ``inv`` to Series
 - Implement ``kind`` option in ``ExcelFile`` to indicate whether it's an XLS
   or XLSX file (:issue:`2613`)
+- Documented a fast path in pd.read_csv for parsing ISO 8601 datetime strings,
+  yielding as much as a 20x speedup. (:issue:`5993`)
+
 
 Bug Fixes
 ~~~~~~~~~

pandas/io/parsers.py

+1
@@ -87,6 +87,7 @@
     If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
     If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
     {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
+    A fast path exists for ISO 8601-formatted dates.
 keep_date_col : boolean, default False
     If True and parse_dates specifies combining multiple columns then
     keep the original columns.
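
For readers who want to see what the documented note refers to, the following is a minimal sketch of loading a CSV whose datetime column is already stored as ISO 8601 strings. The column names `ts` and `value` and the sample rows are hypothetical and not part of this commit. The sketch only shows that `parse_dates` yields a datetime column for such input; it does not measure the speedup mentioned in the note.

```python
import io

import pandas as pd

# Hypothetical sample data: the column names `ts` and `value` are
# assumptions made for illustration. The timestamps are ISO 8601 strings.
csv_text = (
    "ts,value\n"
    "2000-01-01T00:01:02,1\n"
    "2000-01-02T00:01:02,2\n"
    "2000-01-03T00:01:02,3\n"
)

# parse_dates asks read_csv to convert the named column to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["ts"])

# The `ts` column should now have a datetime dtype rather than object.
print(df.dtypes)
```

Whether a given input actually takes the fast path depends on the pandas version and the exact string format, so it is worth timing a representative file before restructuring data around it.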
