From 8df3bdf02dabedb0e517fe4a49375c2df2507db8 Mon Sep 17 00:00:00 2001
From: Amol
Date: Wed, 25 May 2016 09:28:59 +0530
Subject: [PATCH 1/7] DOC: Added additional example for groupby by indexer.

---
 doc/source/groupby.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 4cde1fed344a8..702bfb7b6abbb 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1014,6 +1014,13 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
    df
    df.groupby(df.sum(), axis=1).sum()

+Groupby by Indexer to 'resample' data.
+
+.. ipython:: python
+
+   df = pd.DataFrame(np.random.randn(10,2))
+   df
+   df.groupby(df.index / 5).std()

 Returning a Series to propagate names
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 55c88286d180c616ea3ba5dbb4a72cb5dd5c04b0 Mon Sep 17 00:00:00 2001
From: Amol
Date: Wed, 25 May 2016 21:07:53 +0530
Subject: [PATCH 2/7] DOC: Increased documentation

---
 doc/source/groupby.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index 702bfb7b6abbb..d66fff52b5e93 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1016,12 +1016,32 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on

 Groupby by Indexer to 'resample' data.

+Resampling produces new hypothetical samples(resamples) from already existing observed data or from a data generating mechanism which resemble the underlying population.
+
+In order to resample to work on indices that are non-datetimelike , the following procedure can be utilized.
+
+In the following examples, **df.index / 5** returns a binary array which is used to determine what get's selected for the groupby operation.
+
+.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are throwing away half the samples.
+
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10,2))
    df
    df.groupby(df.index / 5).std()

+.. note:: For upsampling, we can again utilize a similar technique. Upsampling inserts values between the original samples. However, we change the indexes to a spaced out interval so that the new samples can fill those vacant indexes. Observe how the indexes of **df_down** are spaced out.
+
+.. ipython:: python
+
+   df = pd.DataFrame(np.random.randn(10,2))
+   df
+   s = (df.index.to_series() / 5).astype(int)
+   df_down = df.groupby(df.index / 5).std().set_index(s.index[4::5])
+   df_down
+   df_up = df_down.reindex(range(10)).bfill()
+   df_up
+
 Returning a Series to propagate names
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 03d87997cd29ff85a1bcd4d0bf70a1d6b2a2ccad Mon Sep 17 00:00:00 2001
From: Amol
Date: Wed, 25 May 2016 23:55:58 +0530
Subject: [PATCH 3/7] DOC: Added more details

---
 doc/source/groupby.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index d66fff52b5e93..abaa4bb1d6a01 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1022,12 +1022,13 @@ In order to resample to work on indices that are non-datetimelike , the followin

 In the following examples, **df.index / 5** returns a binary array which is used to determine what get's selected for the groupby operation.

-.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are throwing away half the samples.
+.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are throwing away half the samples. Here we also aggregate samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation. Hence we can reduce the number of samples by creating bins by clubbing together samples.

 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10,2))
    df
+   df.index / 5
    df.groupby(df.index / 5).std()

 .. note:: For upsampling, we can again utilize a similar technique. Upsampling inserts values between the original samples. However, we change the indexes to a spaced out interval so that the new samples can fill those vacant indexes. Observe how the indexes of **df_down** are spaced out.

From 932017cd461db0bed87ed346b5cc2237bd5867af Mon Sep 17 00:00:00 2001
From: Amol
Date: Thu, 26 May 2016 09:30:30 +0530
Subject: [PATCH 4/7] DOC: Cleaned up documentation

---
 doc/source/groupby.rst | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index abaa4bb1d6a01..eee905f4e038a 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1031,18 +1031,6 @@ In the following examples, **df.index / 5** returns a binary array which is used
    df.index / 5
    df.groupby(df.index / 5).std()

-.. note:: For upsampling, we can again utilize a similar technique. Upsampling inserts values between the original samples. However, we change the indexes to a spaced out interval so that the new samples can fill those vacant indexes. Observe how the indexes of **df_down** are spaced out.
-
-.. ipython:: python
-
-   df = pd.DataFrame(np.random.randn(10,2))
-   df
-   s = (df.index.to_series() / 5).astype(int)
-   df_down = df.groupby(df.index / 5).std().set_index(s.index[4::5])
-   df_down
-   df_up = df_down.reindex(range(10)).bfill()
-   df_up
-
 Returning a Series to propagate names
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 35f1ae5d640816eda29e10371c6b9f138b9150cb Mon Sep 17 00:00:00 2001
From: Amol
Date: Fri, 27 May 2016 13:56:21 +0530
Subject: [PATCH 5/7] DOC: Cleaning up

---
 doc/source/groupby.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index eee905f4e038a..f867668ac9fdb 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1022,7 +1022,7 @@ In order to resample to work on indices that are non-datetimelike , the followin

 In the following examples, **df.index / 5** returns a binary array which is used to determine what get's selected for the groupby operation.

-.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are throwing away half the samples. Here we also aggregate samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation. Hence we can reduce the number of samples by creating bins by clubbing together samples.
+.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation. Hence we can reduce the number of samples by creating bins by clubbing together samples.

 .. ipython:: python

From 93aa63faa51713f46a33c549a6b1255c1f59af67 Mon Sep 17 00:00:00 2001
From: Amol
Date: Mon, 27 Jun 2016 20:09:24 +0530
Subject: [PATCH 6/7] DOC: Cleaned up documentation

---
 doc/source/groupby.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index f867668ac9fdb..d1760c08c0446 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1016,20 +1016,20 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on

 Groupby by Indexer to 'resample' data.

-Resampling produces new hypothetical samples(resamples) from already existing observed data or from a data generating mechanism which resemble the underlying population.
+Resampling produces new hypothetical samples(resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

 In order to resample to work on indices that are non-datetimelike , the following procedure can be utilized.

-In the following examples, **df.index / 5** returns a binary array which is used to determine what get's selected for the groupby operation.
+In the following examples, **df.index // 5** returns a binary array which is used to determine what get's selected for the groupby operation.

-.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation. Hence we can reduce the number of samples by creating bins by clubbing together samples.
+.. note:: The below example shows how we can downsample which is the throwing away of samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.

 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10,2))
    df
-   df.index / 5
-   df.groupby(df.index / 5).std()
+   df.index // 5
+   df.groupby(df.index // 5).std()

 Returning a Series to propagate names
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 39b7fac3cdb5ce6c1f81823012c759ee2acd01b9 Mon Sep 17 00:00:00 2001
From: Amol
Date: Tue, 28 Jun 2016 19:40:38 +0530
Subject: [PATCH 7/7] DOC: Improved readability

---
 doc/source/groupby.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
index d1760c08c0446..484efd12c5d78 100644
--- a/doc/source/groupby.rst
+++ b/doc/source/groupby.rst
@@ -1014,7 +1014,8 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
    df
    df.groupby(df.sum(), axis=1).sum()

-Groupby by Indexer to 'resample' data.
+Groupby by Indexer to 'resample' data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Resampling produces new hypothetical samples(resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

 In order to resample to work on indices that are non-datetimelike , the following procedure can be utilized.
@@ -1022,7 +1023,7 @@ In order to resample to work on indices that are non-datetimelike , the followin

 In the following examples, **df.index // 5** returns a binary array which is used to determine what get's selected for the groupby operation.

-.. note:: The below example shows how we can downsample which is the throwing away of samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
+.. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.

 .. ipython:: python