From 08aa552b7f9cb98c4d3828a30b72167e73b65999 Mon Sep 17 00:00:00 2001
From: Matt Roeschke
Date: Sat, 15 Sep 2018 22:58:49 -0700
Subject: [PATCH 1/3] CLN/DOC: Refactor timeseries.rst intro and overview

---
 doc/source/timeseries.rst | 115 ++++++++++++++++++++++++++------------
 1 file changed, 79 insertions(+), 36 deletions(-)

diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
index 5dfac98d069e7..324a0ca0805eb 100644
--- a/doc/source/timeseries.rst
+++ b/doc/source/timeseries.rst
@@ -21,51 +21,58 @@
 Time Series / Date functionality
 ********************************
 
-pandas has proven very successful as a tool for working with time series data,
-especially in the financial data analysis space. Using the NumPy ``datetime64`` and ``timedelta64`` dtypes,
-we have consolidated a large number of features from other Python libraries like ``scikits.timeseries`` as well as created
+pandas contains extensive capabilities and features for working with time series data for all domains.
+Using the NumPy ``datetime64`` and ``timedelta64`` dtypes, pandas has consolidated a large number of
+features from other Python libraries like ``scikits.timeseries`` as well as created
 a tremendous amount of new functionality for manipulating time series data.
 
-In working with time series data, we will frequently seek to:
+For example, pandas supports:
 
-* generate sequences of fixed-frequency dates and time spans
-* conform or convert time series to a particular frequency
-* compute "relative" dates based on various non-standard time increments
-  (e.g. 5 business days before the last business day of the year), or "roll"
-  dates forward or backward
+Parsing time series information from various sources and formats
 
-pandas provides a relatively compact and self-contained set of tools for
-performing the above tasks.
+.. ipython:: python
+
+   dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'), datetime(2018, 1, 1)])
+   dti
 
-Create a range of dates:
+Generate sequences of fixed-frequency dates and time spans
 
 .. ipython:: python
 
-   # 72 hours starting with midnight Jan 1st, 2011
-   rng = pd.date_range('1/1/2011', periods=72, freq='H')
-   rng[:5]
+   dti = pd.date_range('2018-01-01', periods=3, freq='H')
+   dti
 
-Index pandas objects with dates:
+Manipulating and converting date times with timezone information
 
 .. ipython:: python
 
-   ts = pd.Series(np.random.randn(len(rng)), index=rng)
-   ts.head()
+   dti = dti.tz_localize('UTC')
+   dti
+   dti.tz_convert('US/Pacific')
 
-Change frequency and fill gaps:
+Resampling or converting a time series to a particular frequency
 
 .. ipython:: python
 
-   # to 45 minute frequency and forward fill
-   converted = ts.asfreq('45Min', method='pad')
-   converted.head()
+   idx = pd.date_range('2018-01-01', periods=72, freq='H')
+   ts = pd.Series(np.random.randn(len(idx)), index=idx)
+   ts.resample('D').mean()
 
-Resample the series to a daily frequency:
+Performing date and time arithmetic with absolute or relative time increments
 
 .. ipython:: python
 
-   # Daily means
-   ts.resample('D').mean()
+   ts = pd.Timestamp('2018-01-05')
+   ts.day_name()
+   # Add 1 day
+   saturday = ts + pd.Timedelta('1 day')
+   saturday.day_name()
+   # Add 1 business day (Friday --> Monday)
+   monday = ts + pd.tseries.offsets.BDay()
+   monday.day_name()
+
+pandas provides a relatively compact and self-contained set of tools for
+performing the above tasks and more.
 
 .. _timeseries.overview:
 
@@ -73,17 +80,53 @@ Resample the series to a daily frequency:
 Overview
 --------
 
-The following table shows the type of time-related classes pandas can handle and
-how to create them.
+pandas captures 4 general time related concepts:
 
-================= =============================== ===================================================================
-Class             Remarks                         How to create
-================= =============================== ===================================================================
-``Timestamp``     Represents a single timestamp   ``to_datetime``, ``Timestamp``
-``DatetimeIndex`` Index of ``Timestamp``          ``to_datetime``, ``date_range``, ``bdate_range``, ``DatetimeIndex``
-``Period``        Represents a single time span   ``Period``
-``PeriodIndex``   Index of ``Period``             ``period_range``, ``PeriodIndex``
-================= =============================== ===================================================================
+#. Date times: A specific date and time with timezone support. Similar to ``datetime.datetime`` from the standard library.
+#. Time deltas: An absolute time duration. Similar to ``datetime.timedelta`` from the standard library.
+#. Time spans: A span of time defined by a point in time and its associated frequency.
+#. Date offsets: A relative time duration that respects calendar arithmetic. Similar to ``dateutil.relativedelta.relativedelta`` from the ``dateutil`` package.
+
+===================== ================= =================== ============================================ ========================================
+Concept               Scalar Class      Array Class         pandas Data Type                             Primary Creation Method
+===================== ================= =================== ============================================ ========================================
+Date times            ``Timestamp``     ``DatetimeIndex``   ``datetime64[ns]`` or ``datetime64[ns, tz]`` ``to_datetime`` or ``date_range``
+Time deltas           ``Timedelta``     ``TimedeltaIndex``  ``timedelta64[ns]``                          ``to_timedelta`` or ``timedelta_range``
+Time spans            ``Period``        ``PeriodIndex``     ``period[freq]``                             ``Period`` or ``period_range``
+Date offsets          ``DateOffset``    ``None``            ``None``                                     ``DateOffset``
+===================== ================= =================== ============================================ ========================================
+
+For time series data, it's conventional to represent the time component in the index of a :class:`Series` or :class:`DataFrame`
+so manipulations can be performed with respect to the time element.
+
+.. ipython:: python
+
+   pd.Series(range(3), index=pd.date_range('2000', freq='D', periods=3))
+
+However, :class:`Series` and :class:`DataFrame` can directly also support the time component as data itself.
+
+.. ipython:: python
+
+   pd.Series(pd.date_range('2000', freq='D', periods=3))
+
+:class:`Series` and :class:`DataFrame` have extended data type support and functionality for ``datetime`` and ``timedelta``
+data when the time data is used as data itself. The other time related concepts will be stored as ``object`` data.
+
+.. ipython:: python
+
+   pd.Series(pd.period_range('1/1/2011', freq='M', periods=3))
+
+Lastly, pandas represents null date times, time deltas, and time spans as ``NaT`` which
+useful for representing missing or null date like values and behaves similar
+as ``np.nan`` does for float data.
+
+.. ipython:: python
+
+   pd.Timestamp(pd.NaT)
+   pd.Timedelta(pd.NaT)
+   pd.Period(pd.NaT)
+   # Equality acts as np.nan would
+   pd.NaT == pd.NaT
 
 .. _timeseries.representation:
 
@@ -1443,7 +1486,7 @@ time. The method for this is :meth:`~Series.shift`, which is available on all of
 the pandas objects.
 
 .. ipython:: python
-
+   ts = pd.Series(np.random.randn(len(rng)), index=rng)
    ts = ts[:5]
    ts.shift(1)
 

From 51818277a611997172dc35df0be1d0442d1c9100 Mon Sep 17 00:00:00 2001
From: Matt Roeschke
Date: Sun, 16 Sep 2018 08:57:43 -0700
Subject: [PATCH 2/3] Address review

---
 doc/source/timeseries.rst | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
index 324a0ca0805eb..69472a7cc9fe3 100644
--- a/doc/source/timeseries.rst
+++ b/doc/source/timeseries.rst
@@ -54,21 +54,22 @@ Resampling or converting a time series to a particular frequency
 
 .. ipython:: python
 
-   idx = pd.date_range('2018-01-01', periods=72, freq='H')
-   ts = pd.Series(np.random.randn(len(idx)), index=idx)
-   ts.resample('D').mean()
+   idx = pd.date_range('2018-01-01', periods=5, freq='H')
+   ts = pd.Series(range(len(idx)), index=idx)
+   ts
+   ts.resample('2H').mean()
 
 Performing date and time arithmetic with absolute or relative time increments
 
 .. ipython:: python
 
-   ts = pd.Timestamp('2018-01-05')
-   ts.day_name()
+   friday = pd.Timestamp('2018-01-05')
+   friday.day_name()
    # Add 1 day
-   saturday = ts + pd.Timedelta('1 day')
+   saturday = friday + pd.Timedelta('1 day')
    saturday.day_name()
    # Add 1 business day (Friday --> Monday)
-   monday = ts + pd.tseries.offsets.BDay()
+   monday = friday + pd.tseries.offsets.BDay()
    monday.day_name()
 
 pandas provides a relatively compact and self-contained set of tools for
@@ -110,11 +111,12 @@ However, :class:`Series` and :class:`DataFrame` can directly also support the ti
    pd.Series(pd.date_range('2000', freq='D', periods=3))
 
 :class:`Series` and :class:`DataFrame` have extended data type support and functionality for ``datetime`` and ``timedelta``
-data when the time data is used as data itself. The other time related concepts will be stored as ``object`` data.
+data when the time data is used as data itself. The ``Period`` and ``DateOffset`` data will be stored as ``object`` data.
 
 .. ipython:: python
 
    pd.Series(pd.period_range('1/1/2011', freq='M', periods=3))
+   pd.Series(pd.date_range('1/1/2011', freq='M', periods=3))
 
 Lastly, pandas represents null date times, time deltas, and time spans as ``NaT`` which
 useful for representing missing or null date like values and behaves similar
 as ``np.nan`` does for float data.
@@ -1486,7 +1488,7 @@ time. The method for this is :meth:`~Series.shift`, which is available on all of
 the pandas objects.
 
 .. ipython:: python
-   ts = pd.Series(np.random.randn(len(rng)), index=rng)
+   ts = pd.Series(range(len(rng)), index=rng)
    ts = ts[:5]
    ts.shift(1)
 

From 6048638839edefe62dd5f32e9cb13ac6e6304894 Mon Sep 17 00:00:00 2001
From: Matt Roeschke
Date: Sun, 16 Sep 2018 10:49:32 -0700
Subject: [PATCH 3/3] Forgot missing is

---
 doc/source/timeseries.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
index 69472a7cc9fe3..71bc064ffb0c2 100644
--- a/doc/source/timeseries.rst
+++ b/doc/source/timeseries.rst
@@ -119,7 +119,7 @@ data when the time data is used as data itself. The ``Period`` and ``DateOffset`
    pd.Series(pd.date_range('1/1/2011', freq='M', periods=3))
 
 Lastly, pandas represents null date times, time deltas, and time spans as ``NaT`` which
-useful for representing missing or null date like values and behaves similar
+is useful for representing missing or null date like values and behaves similar
 as ``np.nan`` does for float data.
 
 .. ipython:: python
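
Note on the date-arithmetic example touched by PATCH 1/3 and 2/3: the distinction it draws between an absolute increment (``pd.Timedelta('1 day')``) and a calendar-aware one (``pd.tseries.offsets.BDay()``) can be sketched with the standard library alone. The helper ``add_business_days`` and the ``DAY_NAMES`` list below are hypothetical names introduced only for illustration; the helper skips weekends, ignores holidays, and is not how pandas implements ``BDay``.

```python
from datetime import date, timedelta

# Locale-independent day names, indexed by date.weekday() (Monday == 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def add_business_days(start: date, n: int) -> date:
    """Hypothetical helper: step forward n weekdays, skipping Sat/Sun.

    Holidays are ignored; this is an illustration, not pandas' BDay logic.
    """
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


friday = date(2018, 1, 5)

# Absolute increment: exactly one calendar day later.
saturday = friday + timedelta(days=1)

# Calendar-aware increment: the next weekday, which skips the weekend.
monday = add_business_days(friday, 1)

for d in (friday, saturday, monday):
    print(d.isoformat(), DAY_NAMES[d.weekday()])
```

The same starting date gives different results depending on whether the step is a fixed duration or a rule about the calendar, which is the contrast the ``friday`` / ``saturday`` / ``monday`` variables in the patched example are meant to show.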