DOC: documentation for parsing date/time from multiple columns #1250

Chang She · wesm · commit d50bbcc78a4e · 2012-05-21T14:24:45.000-04:00
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -120,6 +120,11 @@ The default for `read_csv` is to create a DataFrame with simple numbered rows:
 In the case of indexed data, you can pass the column number (or a list of
 column numbers, for a hierarchical index) you wish to use as the index.
 
+The parsers make every attempt to "do the right thing" and not be very
+fragile. Type inference plays a large part in this: if a column can be coerced
+to integer dtype without altering the contents, the parser will do so. Any
+non-numeric columns will come through as object dtype, as elsewhere in pandas.
+
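The coercion rule described above can be pictured with a small
standard-library sketch. This is not how pandas implements inference;
``infer_column`` is a hypothetical helper, and the int-then-float-then-object
order is an assumption made purely for illustration.

```python
def infer_column(values):
    """Hypothetical helper: guess a column kind from raw CSV strings.

    Tries the strictest conversion first and falls back to leaving the
    values untouched, loosely mirroring "coerce when it is safe to do so".
    """
    for kind, caster in (("int", int), ("float", float)):
        try:
            # Every value must convert, otherwise the whole column is rejected.
            return kind, [caster(v) for v in values]
        except ValueError:
            continue
    # Nothing numeric fit, so keep the original strings (object-like column).
    return "object", list(values)


int_kind, int_values = infer_column(["1", "2", "3"])
float_kind, float_values = infer_column(["0.8100", "-0.5900"])
obj_kind, obj_values = infer_column(["KORD", "19990127"])
```

A column mixing text and digits stays as strings here, which matches the
statement that non-numeric columns come through as object dtype.
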
 .. _io.parse_dates:
 
 To better facilitate working with datetime data, :func:`~pandas.io.parsers.read_csv` and :func:`~pandas.io.parsers.read_table`
@@ -142,35 +147,68 @@ The simplest case is to just pass in ``parse_dates=True``:
 
    os.remove('foo.csv')
 
-You can specify a custom ``date_parser`` function:
+It is often the case that we may want to store date and time data separately,
+or store various date fields separately. The ``parse_dates`` keyword can be
+used to specify a combination of columns to parse the dates and/or times from.
+
+You can specify a list of column lists to ``parse_dates``. The resulting date
+columns will be prepended to the output, and the new column names will be the
+concatenation of the component column names:
 
 .. ipython:: python
    :suppress:
-   # data = """
+
+   data =  ("KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
+            "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
+            "KORD,19990127, 21:00:00, 20:56:00, -0.5900\n"
+            "KORD,19990127, 21:00:00, 21:18:00, -0.9900\n"
+            "KORD,19990127, 22:00:00, 21:56:00, -0.5900\n"
+            "KORD,19990127, 23:00:00, 22:56:00, -0.5900")
+
    with open('tmp.csv', 'w') as fh:
        fh.write(data)
 
 .. ipython:: python
 
-   # read it in
+   print(open('tmp.csv').read())
+   df = read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]])
+   df
+
+By default the parser removes the component date columns, but you can choose
+to retain them via the ``keep_date_col`` keyword:
 
 .. ipython:: python
-   :suppress:
-   os.remove('tmp.csv')
 
-It is often the case that we may want to store date and time data separately,
-or store various date fields separately. the ``parse_dates`` keyword can be
-used to specify a combination of columns to parse the dates and/or times from.
+   df = read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]],
+                 keep_date_col=True)
+   df
 
-You can specify a list of column lists to ``parse_dates``, the resulting date
-columns will be prepended to the output and the new column names will be the
-component column names
+You can also use a dict to specify custom column names:
 
+.. ipython:: python
 
-The parsers make every attempt to "do the right thing" and not be very
-fragile. Type inference is a pretty big deal. So if a column can be coerced to
-integer dtype without altering the contents, it will do so. Any non-numeric
-columns will come through as object dtype as with the rest of pandas objects.
+   date_spec = {'nominal': [1, 2], 'actual': [1, 3]}
+   df = read_csv('tmp.csv', header=None, parse_dates=date_spec)
+   df
+
+Finally, the parser allows you to specify a custom ``date_parser`` function to
+take full advantage of the flexibility of the date parsing API:
+
+.. ipython:: python
+
+   import pandas.io.date_converters as conv
+   df = read_csv('tmp.csv', header=None, parse_dates=date_spec,
+                 date_parser=conv.parse_date_time)
+   df
+
+You can explore the date parsing functionality in ``date_converters.py`` and
+add your own. We would love to turn this module into a community-supported set
+of date/time parsers.
+
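For readers who want to add their own converter, the following is a minimal
sketch of what one might look like, using only the standard library. The name
``combine_date_time``, its ``fmt`` argument, and the assumption that it
receives one sequence of strings per component column are all hypothetical;
check ``date_converters.py`` for the calling convention that ``date_parser``
actually expects.

```python
from datetime import datetime


def combine_date_time(date_col, time_col, fmt="%Y%m%d %H:%M:%S"):
    """Hypothetical converter: pair date and time strings element-wise.

    Surrounding whitespace is stripped because the sample file has a space
    after each comma.
    """
    return [
        datetime.strptime("%s %s" % (d.strip(), t.strip()), fmt)
        for d, t in zip(date_col, time_col)
    ]


# Values taken from columns 1 and 2 of the sample data above.
dates = ["19990127", "19990127"]
times = [" 19:00:00", " 20:00:00"]
parsed = combine_date_time(dates, times)
```

Keeping the format string as a parameter lets the same function serve files
that store dates in a different layout.
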
+.. ipython:: python
+   :suppress:
+
+   os.remove('tmp.csv')
 
 .. _io.fwf: