Commit d50bbcc

Chang She authored and wesm committed
DOC: documentation for parsing date/time from multiple columns #1250
1 parent d94fa66 commit d50bbcc

1 file changed

+53
-15
lines changed

doc/source/io.rst

Lines changed: 53 additions & 15 deletions
@@ -120,6 +120,11 @@ The default for `read_csv` is to create a DataFrame with simple numbered rows:
120120
In the case of indexed data, you can pass the column number (or a list of
121121
column numbers, for a hierarchical index) you wish to use as the index.
122122

123+
The parsers make every attempt to "do the right thing" and not be very
124+
fragile. Type inference is a pretty big deal. So if a column can be coerced to
125+
integer dtype without altering the contents, it will do so. Any non-numeric
126+
columns will come through as object dtype as with the rest of pandas objects.
127+
123128
.. _io.parse_dates:
124129

125130
To better facilitate working with datetime data, :func:`~pandas.io.parsers.read_csv` and :func:`~pandas.io.parsers.read_table`
@@ -142,35 +147,68 @@ The simplest case is to just pass in ``parse_dates=True``:
142147
143148
os.remove('foo.csv')
144149
145-
You can specify a custom ``date_parser`` function:
150+
It is often the case that we may want to store date and time data separately,
151+
or store various date fields separately. The ``parse_dates`` keyword can be
152+
used to specify a combination of columns to parse the dates and/or times from.
153+
154+
You can specify a list of column lists to ``parse_dates``; the resulting date
155+
columns will be prepended to the output and the new column names will be the
156+
concatenation of the component column names:
146157

147158
.. ipython:: python
148159
:suppress:
149-
# data = """
160+
161+
data = ("KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
162+
"KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
163+
"KORD,19990127, 21:00:00, 20:56:00, -0.5900\n"
164+
"KORD,19990127, 21:00:00, 21:18:00, -0.9900\n"
165+
"KORD,19990127, 22:00:00, 21:56:00, -0.5900\n"
166+
"KORD,19990127, 23:00:00, 22:56:00, -0.5900")
167+
150168
with open('tmp.csv', 'w') as fh:
151169
fh.write(data)
152170
153171
.. ipython:: python
154172
155-
# read it in
173+
print(open('tmp.csv').read())
174+
df = read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]])
175+
df
176+
177+
By default the parser removes the component date columns, but you can choose
178+
to retain them via the ``keep_date_col`` keyword:
156179

157180
.. ipython:: python
158-
:suppress:
159-
os.remove('tmp.csv')
160181
161-
It is often the case that we may want to store date and time data separately,
162-
or store various date fields separately. the ``parse_dates`` keyword can be
163-
used to specify a combination of columns to parse the dates and/or times from.
182+
df = read_csv('tmp.csv', header=None, parse_dates=[[1, 2], [1, 3]],
183+
keep_date_col=True)
184+
df
164185
165-
You can specify a list of column lists to ``parse_dates``, the resulting date
166-
columns will be prepended to the output and the new column names will be the
167-
component column names
186+
You can also use a dict to specify custom column names:
168187

188+
.. ipython:: python
169189
170-
The parsers make every attempt to "do the right thing" and not be very
171-
fragile. Type inference is a pretty big deal. So if a column can be coerced to
172-
integer dtype without altering the contents, it will do so. Any non-numeric
173-
columns will come through as object dtype as with the rest of pandas objects.
190+
date_spec = {'nominal': [1, 2], 'actual': [1, 3]}
191+
df = read_csv('tmp.csv', header=None, parse_dates=date_spec)
192+
df
193+
194+
Finally, the parser allows you to specify a custom ``date_parser`` function to
195+
take full advantage of the flexibility of the date parsing API:
196+
197+
.. ipython:: python
198+
199+
import pandas.io.date_converters as conv
200+
df = read_csv('tmp.csv', header=None, parse_dates=date_spec,
201+
date_parser=conv.parse_date_time)
202+
df
203+
204+
You can explore the date parsing functionality in ``date_converters.py`` and
205+
add your own. We would love to turn this module into a community-supported set
206+
of date/time parsers.
207+
208+
.. ipython:: python
209+
:suppress:
210+
211+
os.remove('tmp.csv')
174212
175213
.. _io.fwf:
176214
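
The column-combining behaviour documented in this change can be illustrated without pandas. The following is a minimal standard-library sketch, not the parser's actual implementation: ``combine_date_columns`` is a hypothetical helper, the format string is an assumption about the sample data, and the dict argument simply mirrors the ``date_spec`` dict used in the diff. It joins the listed columns with a space, parses the result, and puts the new date fields ahead of the remaining columns.

```python
import csv
import io
from datetime import datetime

# Two rows taken from the sample data used in the documentation change.
DATA = (
    "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
    "KORD,19990127, 20:00:00, 19:56:00, 0.0100\n"
)


def combine_date_columns(text, date_spec, fmt="%Y%m%d %H:%M:%S"):
    """Hypothetical helper (not a pandas function).

    date_spec maps a new column name to the list of column positions
    whose text is joined and parsed as a single datetime.
    The default fmt is an assumption that matches the sample data only.
    """
    # Column positions consumed by any date specification.
    used = {i for cols in date_spec.values() for i in cols}
    rows = []
    # skipinitialspace drops the blank that follows each comma in the sample.
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        row = {}
        # Combined date columns go first, mirroring "prepended to the output".
        for name, cols in date_spec.items():
            joined = " ".join(fields[i] for i in cols)
            row[name] = datetime.strptime(joined, fmt)
        # Component columns are dropped; the rest keep their position as key.
        for i, value in enumerate(fields):
            if i not in used:
                row[i] = value
        rows.append(row)
    return rows


rows = combine_date_columns(DATA, {"nominal": [1, 2], "actual": [1, 3]})
for row in rows:
    print(row)
```

Dropping the component columns here corresponds to the default behaviour described in the diff; keeping them would correspond to what ``keep_date_col=True`` is documented to do.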
