Merge pull request #5993 from y-p/PR_doc_iso8601_fast_path

y-p · y-p · commit e2a612a311ed · 2014-01-18T12:07:12.000-08:00
DOC: document read_csv fastpath for iso8601 datetime strings
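
The note added below suggests storing datetimes as ISO 8601 strings so that read_csv can take its fast-path. As a rough sketch of the writing side only, assuming Python 3 and the standard library (the helper name `write_iso8601_csv` and the column names `timestamp` and `value` are made up for illustration and are not part of this PR):

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def write_iso8601_csv(rows, out):
    """Write (timestamp, value) rows, serialising each timestamp as ISO 8601.

    Hypothetical helper for illustration; not part of pandas.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "value"])
    for ts, value in rows:
        # datetime.isoformat() produces strings such as
        # "2000-01-01T00:01:02+00:00" for timezone-aware values.
        writer.writerow([ts.isoformat(), value])


# Three timezone-aware timestamps, one minute apart.
start = datetime(2000, 1, 1, 0, 1, 2, tzinfo=timezone.utc)
rows = [(start + timedelta(minutes=i), i) for i in range(3)]

buf = io.StringIO()
write_iso8601_csv(rows, buf)
csv_text = buf.getvalue()
print(csv_text)
```

A file written this way could then be loaded with something like `pd.read_csv(path, parse_dates=['timestamp'])`. Whether the ~20x figure is reached will depend on the data and the pandas version.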
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -456,12 +456,22 @@ data columns:
                     index_col=0) #index is the nominal column
    df
 
-**Note**: When passing a dict as the `parse_dates` argument, the order of
-the columns prepended is not guaranteed, because `dict` objects do not impose
-an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
-instead of a regular `dict` if this matters to you. Because of this, when using a
-dict for 'parse_dates' in conjunction with the `index_col` argument, it's best to
-specify `index_col` as a column label rather then as an index on the resulting frame.
+.. note::
+   read_csv has a fast-path for parsing datetime strings in ISO 8601 format,
+   e.g. "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange
+   for your data to store datetimes in this format, load times will be
+   significantly faster; speedups of ~20x have been observed.
+
+
+.. note::
+
+   When passing a dict as the `parse_dates` argument, the order of
+   the columns prepended is not guaranteed, because `dict` objects do not impose
+   an ordering on their keys. On Python 2.7+ you may use `collections.OrderedDict`
+   instead of a regular `dict` if this matters to you. Because of this, when using a
+   dict for `parse_dates` in conjunction with the `index_col` argument, it's best to
+   specify `index_col` as a column label rather than as an index on the resulting frame.
+
 
 Date Parsing Functions
 ~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/source/release.rst b/doc/source/release.rst
@@ -1596,6 +1596,9 @@ Improvements to existing features
   - Add methods ``neg`` and ``inv`` to Series
   - Implement ``kind`` option in ``ExcelFile`` to indicate whether it's an XLS
     or XLSX file (:issue:`2613`)
+  - Documented a fast-path in ``pd.read_csv`` for parsing ISO 8601 datetime
+    strings, yielding as much as a 20x speedup (:issue:`5993`)
+
 
 Bug Fixes
 ~~~~~~~~~
diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
@@ -87,6 +87,7 @@
     If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
     If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
     {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
+    A fast-path exists for ISO 8601-formatted dates.
 keep_date_col : boolean, default False
     If True and parse_dates specifies combining multiple columns then
     keep the original columns.