Commit 330f34f

author
Chang She
committed
DOC: started on timeseries.rst for 0.8
1 parent c75c96e commit 330f34f

2 files changed

+139
-49
lines changed

doc/source/computation.rst

+4-4
Original file line number, Diff line number, Diff line change
@@ -171,10 +171,10 @@ accept the following arguments:
171171
- ``window``: size of moving window
172172
- ``min_periods``: threshold of non-null data points to require (otherwise
173173
result is NA)
174-
- ``freq``: optionally specify a :ref: `frequency string <timeseries.freq>` or :ref:`DateOffset <timeseries.offsets>`
175-
to pre-conform the data to. Note that prior to pandas v0.8.0, a keyword
176-
argument ``time_rule`` was used instead of ``freq`` that referred to
177-
the legacy time rule constants
174+
- ``freq``: optionally specify a :ref:`frequency string <timeseries.alias>`
175+
or :ref:`DateOffset <timeseries.offsets>` to pre-conform the data to.
176+
Note that prior to pandas v0.8.0, a keyword argument ``time_rule`` was used
177+
instead of ``freq`` that referred to the legacy time rule constants
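To make the roles of ``window`` and ``min_periods`` concrete, here is a minimal pure-Python sketch of a moving mean. It is not the pandas implementation: ``rolling_mean`` below is a hypothetical stand-in, ``None`` plays the role of NA, and the default of requiring a full window is an assumption made for this illustration.

```python
def rolling_mean(values, window, min_periods=None):
    """Trailing moving mean; None marks a missing observation (NA)."""
    if min_periods is None:
        min_periods = window  # assumption: require a full window by default
    result = []
    for i in range(len(values)):
        # Trailing window that ends at position i (shorter near the start)
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if v is not None]
        # Emit NA unless enough non-null points fall inside the window
        if len(valid) >= max(min_periods, 1):
            result.append(sum(valid) / len(valid))
        else:
            result.append(None)
    return result


data = [1.0, 2.0, None, 4.0, 5.0]
print(rolling_mean(data, window=3, min_periods=2))
# [None, 1.5, 1.5, 3.0, 4.5]
```

Lowering ``min_periods`` trades stricter NA handling for more non-NA output near gaps and at the start of the series.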
178178

179179
These functions can be applied to ndarrays or Series objects:
180180

doc/source/timeseries.rst

+135-45
Original file line number, Diff line number, Diff line change
@@ -4,28 +4,29 @@
44
.. ipython:: python
55
:suppress:
66
7+
from datetime import datetime
78
import numpy as np
89
np.random.seed(123456)
910
from pandas import *
1011
randn = np.random.randn
1112
np.set_printoptions(precision=4, suppress=True)
1213
from dateutil import relativedelta
13-
from pandas.core.datetools import *
14+
from pandas.tseries.api import *
1415
1516
********************************
1617
Time Series / Date functionality
1718
********************************
1819

1920
pandas has proven very successful as a tool for working with time series data,
20-
especially in the financial data analysis space. Over the coming year we will
21-
be looking to consolidate the various Python libraries for time series data,
22-
e.g. ``scikits.timeseries``, using the new NumPy ``datetime64`` dtype, to
23-
create a very nice integrated solution. Everything in pandas at the moment is
24-
based on using Python ``datetime`` objects.
21+
especially in the financial data analysis space. With the 0.8 release, we have
22+
further improved the time series API in pandas by leaps and bounds. Using the
23+
new NumPy ``datetime64`` dtype, we have consolidated a large number of features
24+
from other Python libraries like ``scikits.timeseries`` as well as created
25+
a tremendous amount of new functionality for manipulating time series data.
2526

2627
In working with time series data, we will frequently seek to:
2728

28-
- generate sequences of fixed-frequency dates
29+
- generate sequences of fixed-frequency dates and time spans
2930
- conform or convert time series to a particular frequency
3031
- compute "relative" dates based on various non-standard time increments
3132
(e.g. 5 business days before the last business day of the year), or "roll"
@@ -34,18 +35,85 @@ In working with time series data, we will frequently seek to:
3435
pandas provides a relatively compact and self-contained set of tools for
3536
performing the above tasks.
3637

37-
.. note::
38+
.. _timeseries.representation:
39+
40+
Time Stamps vs. Time Spans
41+
--------------------------
42+
43+
While most time series representations of data associate values with a time
44+
stamp, in many cases it is more natural to associate the values with a given
45+
time span. For example, it is easy to think of level variables at a
46+
particular point in time, but much more intuitive to think of change variables
47+
over spans of time. Starting with 0.8, pandas allows you to capture both
48+
representations and convert between them. Under the hood, pandas represents
49+
timestamps using instances of ``Timestamp`` and sequences of timestamps using
50+
instances of ``DatetimeIndex``. For regular time spans, pandas uses ``Period``
51+
objects for scalar values and ``PeriodIndex`` for sequences of spans.
52+
Better support for irregular intervals with arbitrary start and end points is
53+
forthcoming in future releases.
54+
55+
For example:
56+
57+
.. ipython:: python
58+
59+
# Time stamped data
60+
dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
61+
ts = Series(np.random.randn(3), dates)
62+
63+
type(ts.index)
64+
65+
ts
66+
67+
# Time span data
68+
periods = PeriodIndex([Period('2012-01'), Period('2012-02'),
69+
Period('2012-03')])
70+
ts = Series(np.random.randn(3), periods)
71+
72+
type(ts.index)
73+
74+
ts
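The distinction between a time stamp and a time span can also be sketched without pandas. The snippet below is only an illustration using the standard library: a monthly span is modeled as a ``(year, month)`` tuple, and the helper names ``to_month_span``, ``span_start`` and ``span_end`` are hypothetical, not part of the pandas API.

```python
import calendar
from datetime import datetime


def to_month_span(ts):
    """Map a time stamp to the monthly span that contains it."""
    return (ts.year, ts.month)


def span_start(span):
    """First day of the monthly span, as a time stamp."""
    year, month = span
    return datetime(year, month, 1)


def span_end(span):
    """Last day of the monthly span, as a time stamp."""
    year, month = span
    last_day = calendar.monthrange(year, month)[1]
    return datetime(year, month, last_day)


stamp = datetime(2012, 2, 14, 9, 30)
span = to_month_span(stamp)
print(span)              # (2012, 2)
print(span_start(span))  # 2012-02-01 00:00:00
print(span_end(span))    # 2012-02-29 00:00:00
```

Many different time stamps map to the same span, which is why converting a span back to a time stamp requires choosing a convention such as its start or its end.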
75+
76+
.. _timeseries.timestamprange:
77+
78+
Generating Ranges of Timestamps
79+
-------------------------------
80+
81+
To generate an index with time stamps, you can use either the DatetimeIndex or
82+
Index constructor and pass in a list of datetime objects:
3883

39-
This area of pandas has gotten less development attention recently, though
40-
this should change in the near future.
84+
.. ipython:: python
85+
86+
dates = [datetime(2012, 5, 1), datetime(2012, 5, 2), datetime(2012, 5, 3)]
87+
index = DatetimeIndex(dates)
88+
index # Note the frequency information
89+
90+
index = Index(dates)
91+
index # Automatically converted to DatetimeIndex
92+
93+
Practically, this becomes very cumbersome because we often need a very long
94+
index with a large number of timestamps. If we need timestamps on a regular
95+
frequency, we can use the pandas functions ``date_range`` and ``bdate_range``
96+
to create timestamp indexes.
97+
98+
.. ipython:: python
99+
100+
index = date_range('2000-1-1', periods=1000, freq='M')
101+
index
102+
103+
index = bdate_range('2012-1-1', periods=250)
104+
index
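As a rough picture of what these two calls produce, the following standard-library sketch generates calendar month ends and weekdays by hand. It is an illustration under simplifying assumptions (no holidays, dates only), and ``month_end_range`` and ``business_day_range`` are hypothetical helpers rather than pandas functions.

```python
import calendar
from datetime import date, timedelta


def month_end_range(start, periods):
    """The first `periods` calendar month-end dates on or after `start`."""
    dates = []
    year, month = start.year, start.month
    while len(dates) < periods:
        last_day = calendar.monthrange(year, month)[1]
        dates.append(date(year, month, last_day))
        # Advance to the next calendar month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates


def business_day_range(start, periods):
    """The first `periods` weekdays (Mon-Fri) on or after `start`."""
    dates = []
    current = start
    while len(dates) < periods:
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            dates.append(current)
        current += timedelta(days=1)
    return dates


print(month_end_range(date(2000, 1, 1), 3))
print(business_day_range(date(2012, 1, 1), 3))
```

Note that 2012-01-01 is a Sunday, so the business-day sequence starts on Monday, 2012-01-02.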
41105
42106
.. _timeseries.offsets:
43107

44108
DateOffset objects
45109
------------------
46110

47-
A ``DateOffset`` instance represents a frequency increment. Different offset
48-
logic via subclasses:
111+
In order to create the sequence of dates with a monthly frequency in the
112+
previous example, we used the ``freq`` keyword and gave it 'M' as the input.
113+
Under the hood, the string 'M' is being interpreted into an instance of pandas
114+
``DateOffset``. ``DateOffset`` represents a regular frequency increment.
115+
Specific offset logic like "business day" or "one hour" is represented in its
116+
various subclasses.
49117

50118
.. csv-table::
51119
:header: "Class name", "Description"
@@ -54,16 +122,24 @@ logic via subclasses:
54122
DateOffset, "Generic offset class, defaults to 1 calendar day"
55123
BDay, "business day (weekday)"
56124
Week, "one week, optionally anchored on a day of the week"
125+
WeekOfMonth, "the x-th day of the y-th week of each month"
57126
MonthEnd, "calendar month end"
127+
MonthBegin, "calendar month begin"
58128
BMonthEnd, "business month end"
129+
BMonthBegin, "business month begin"
59130
QuarterEnd, "calendar quarter end"
131+
QuarterBegin, "calendar quarter begin"
60132
BQuarterEnd, "business quarter end"
133+
BQuarterBegin, "business quarter begin"
61134
YearEnd, "calendar year end"
62135
YearBegin, "calendar year begin"
63136
BYearEnd, "business year end"
137+
BYearBegin, "business year begin"
64138
Hour, "one hour"
65139
Minute, "one minute"
66140
Second, "one second"
141+
Milli, "one millisecond"
142+
Micro, "one microsecond"
67143

68144
The basic ``DateOffset`` takes the same arguments as
69145
``dateutil.relativedelta``, which works like:
@@ -113,7 +189,7 @@ The ``rollforward`` and ``rollback`` methods do exactly what you would expect:
113189
offset.rollforward(d)
114190
offset.rollback(d)
115191
116-
It's definitely worth exploring the ``pandas.core.datetools`` module and the
192+
It's definitely worth exploring the ``pandas.tseries.offsets`` module and the
117193
various docstrings for the classes.
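For intuition, rolling a date forward or backward onto a business day can be sketched with the standard library alone. This is not the pandas code: the functions below are hypothetical stand-ins that treat every weekday as a business day and ignore holidays.

```python
from datetime import date, timedelta


def is_business_day(d):
    """Monday through Friday count as business days (holidays ignored)."""
    return d.weekday() < 5


def rollforward(d):
    """Move `d` forward to the next business day unless it already is one."""
    while not is_business_day(d):
        d += timedelta(days=1)
    return d


def rollback(d):
    """Move `d` back to the previous business day unless it already is one."""
    while not is_business_day(d):
        d -= timedelta(days=1)
    return d


saturday = date(2008, 8, 16)
print(rollforward(saturday))  # 2008-08-18 (Monday)
print(rollback(saturday))     # 2008-08-15 (Friday)
```

A date that already falls on a business day is returned unchanged by both functions.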
118194

119195
Parametric offsets
@@ -130,7 +206,14 @@ particular day of the week:
130206
d + Week(weekday=4)
131207
(d + Week(weekday=4)).weekday()
132208
133-
.. _timeseries.freq:
209+
Another example is parameterizing ``YearEnd`` with the specific ending month:
210+
211+
.. ipython:: python
212+
213+
d + YearEnd()
214+
d + YearEnd(month=6)
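The effect of these parameters can be approximated in plain Python. The sketch below assumes that adding an anchored offset moves to the next anchor date strictly after the starting point. ``next_weekday`` and ``next_year_end`` are hypothetical helpers, and the starting date is chosen only for illustration.

```python
import calendar
from datetime import datetime, timedelta


def next_weekday(d, weekday):
    """Next date strictly after `d` that falls on `weekday` (Mon=0 .. Sun=6)."""
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=days_ahead)


def next_year_end(d, month=12):
    """Next last-day-of-`month` strictly after `d`."""
    year = d.year
    while True:
        last_day = calendar.monthrange(year, month)[1]
        candidate = d.replace(year=year, month=month, day=last_day)
        if candidate > d:
            return candidate
        year += 1


start = datetime(2008, 8, 18)          # a Monday
print(next_weekday(start, 4))          # 2008-08-22 00:00:00 (Friday)
print(next_year_end(start))            # 2008-12-31 00:00:00
print(next_year_end(start, month=6))   # 2009-06-30 00:00:00
```

With ``month=6`` the "year" ends in June, so a date in August rolls to the following June rather than to December.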
215+
216+
.. _timeseries.alias:
134217

135218
Offset Aliases
136219
~~~~~~~~~~~~~~
@@ -202,9 +285,9 @@ For some frequencies you can specify an anchoring suffix:
202285
"(B)A(S)\-OCT", "annual frequency, anchored end of October"
203286
"(B)A(S)\-NOV", "annual frequency, anchored end of November"
204287

205-
These can be used as arguments to ``date_range``, ``period_range``, constructors
206-
for ``PeriodIndex`` and ``DatetimeIndex``, as well as various other time
207-
series-related functions in pandas.
288+
These can be used as arguments to ``date_range``, ``bdate_range``, constructors
289+
for ``DatetimeIndex``, as well as various other timeseries-related functions
290+
in pandas.
208291

209292
Note that prior to v0.8.0, time rules had a slightly different look. Pandas
210293
will continue to support the legacy time rules for the time being but it is
@@ -242,56 +325,63 @@ strongly recommended that you switch to using the new offset aliases.
242325
"ms", "L"
243326
"us", "U"
244327

245-
Note that the legacy quarterly and annual frequencies are business quarter and
246-
business year ends. Also note the legacy time rule for milliseconds ``ms``
247-
versus the new offset alias for month start ``MS``. This means that offset
248-
alias parsing is case sensitive.
328+
As you can see, legacy quarterly and annual frequencies are business quarter
329+
and business year ends. Please also note the legacy time rule for milliseconds
330+
``ms`` versus the new offset alias for month start ``MS``. This means that
331+
offset alias parsing is case sensitive.
249332

250333
.. _timeseries.daterange:
251334

252-
Generating date ranges (date_range)
253-
-----------------------------------
335+
More on date ranges
336+
-------------------
254337

255-
The ``date_range`` class utilizes these offsets (and any ones that we might add)
256-
to generate fixed-frequency date ranges:
338+
Convenience functions like ``date_range`` and ``bdate_range`` utilize the
339+
offsets described above to generate fixed-frequency date ranges. The default
340+
frequency for ``date_range`` is a **calendar day** while the default for
341+
``bdate_range`` is a **business day**:
257342

258343
.. ipython:: python
259344
260345
start = datetime(2009, 1, 1)
261346
end = datetime(2010, 1, 1)
262347
263-
rng = date_range(start, end, freq=BDay())
348+
rng = date_range(start, end)
349+
rng
350+
351+
rng = bdate_range(start, end)
264352
rng
353+
354+
``date_range`` and ``bdate_range`` make it easy to generate a range of dates
355+
using various combinations of their parameters like ``start``, ``end``,
356+
``periods``, and ``freq``:
357+
265358
date_range(start, end, freq=BMonthEnd())
266359

267-
**Business day frequency** is the default for ``date_range``. You can also
268-
strictly generate a ``date_range`` of a certain length by providing either a
269-
start or end date and a ``periods`` argument:
360+
date_range(start, end, freq=3 * Week())
270361

271-
.. ipython:: python
362+
bdate_range(end=end, periods=20)
272363

273-
date_range(start, periods=20)
274-
date_range(end=end, periods=20)
364+
bdate_range(start=start, periods=20)
275365

276366
The start and end dates are strictly inclusive. So it will not generate any
277367
dates outside of those dates if specified.
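The inclusive-endpoint behavior, and the idea of fixing a range by an end point plus a number of periods, can be sketched with a fixed ``timedelta`` step. This is a simplified illustration, not the pandas algorithm, and ``range_between`` and ``range_ending_at`` are hypothetical names.

```python
from datetime import date, timedelta


def range_between(start, end, step):
    """Dates from `start` in increments of `step`, never going past `end`."""
    dates = []
    current = start
    while current <= end:  # `end` itself is included when a step lands on it
        dates.append(current)
        current += step
    return dates


def range_ending_at(end, periods, step):
    """Exactly `periods` dates spaced by `step`, the last one being `end`."""
    return [end - step * i for i in reversed(range(periods))]


three_weeks = timedelta(weeks=3)
print(range_between(date(2009, 1, 1), date(2009, 2, 12), three_weeks))
print(range_ending_at(date(2009, 2, 12), 3, three_weeks))
```

Both calls yield 2009-01-01, 2009-01-22 and 2009-02-12. Moving the end date one day earlier drops the last element from the first call, since no date beyond ``end`` is ever generated.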
278368

279-
date_range is a valid Index
280-
~~~~~~~~~~~~~~~~~~~~~~~~~~~
281369

282-
One of the main uses for ``date_range`` is as an index for pandas objects. When
283-
working with a lot of time series data, there are several reasons to use
284-
``date_range`` objects when possible:
370+
DatetimeIndex
371+
~~~~~~~~~~~~~
372+
373+
One of the main uses for ``DatetimeIndex`` is as an index for pandas objects.
374+
The ``DatetimeIndex`` class contains many timeseries related optimizations:
285375

286376
- A large range of dates for various offsets are pre-computed and cached
287377
under the hood in order to make generating subsequent date ranges very fast
288378
(just have to grab a slice)
289-
- Fast shifting using the ``shift`` method on pandas objects
290-
- Unioning of overlapping date_range objects with the same frequency is very
291-
fast (important for fast data alignment)
379+
- Fast shifting using the ``shift`` and ``tshift`` methods on pandas objects
380+
- Unioning of overlapping DatetimeIndex objects with the same frequency is
381+
very fast (important for fast data alignment)
292382

293-
The ``date_range`` is a valid index and can even be intelligent when doing
294-
slicing, etc.
383+
``DatetimeIndex`` can be used like a regular index and offers all of its
384+
intelligent functionality like selection, slicing, etc.
295385

296386
.. ipython:: python
297387
@@ -301,8 +391,8 @@ slicing, etc.
301391
ts[:5].index
302392
ts[::2].index
303393
304-
More complicated fancy indexing will result in an ``Index`` that is no longer a
305-
``date_range``, however:
394+
However, complicated fancy indexing that breaks the DatetimeIndex's frequency
395+
regularity will result in an ``Index`` that is no longer a ``DatetimeIndex``:
306396

307397
.. ipython:: python
308398
@@ -335,7 +425,7 @@ and in Panel along the ``major_axis``.
335425
336426
The shift method accepts an ``offset`` argument which can accept a
337427
``DateOffset`` class or other ``timedelta``-like object or also a :ref:`time
338-
rule <timeseries.timerule>`:
428+
rule <timeseries.alias>`:
339429

340430
.. ipython:: python
341431
