DOC: Added additional example for groupby by indexer. #13276

pfrcks · 2016-05-25T04:01:05Z

closes DOC: groupby by indexer to 'resample' data #13271
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

jreback · 2016-05-25T12:40:27Z

doc/source/groupby.rst

@@ -1014,6 +1014,13 @@ Regroup columns of a DataFrame according to their sum, and sum the aggregated on
   df
   df.groupby(df.sum(), axis=1).sum()

+Groupby by Indexer to 'resample' data.


This needs a fair bit more explanation of the why and how this does what it does. Maybe show the intent of a resample, then show how one can go about the same idea using non-datetimelike indices.
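One way to read this request, as a minimal sketch rather than the text the PR ended up with: first show what resample does on a datetime index, then reproduce the same binning on a plain integer index with groupby. The column name `value`, the dates, and the bin width of five rows are assumptions chosen for illustration; the bin width mirrors the `df.index / 5` expression in the PR.

```python
import numpy as np
import pandas as pd

# Intent of a resample: ten daily observations aggregated into 5-day bins.
ts = pd.DataFrame(
    {"value": np.arange(10.0)},  # hypothetical column name and data
    index=pd.date_range("2016-01-01", periods=10, freq="D"),
)
by_time = ts.resample("5D").mean()

# Same values on a non-datetimelike (integer) index.
df = pd.DataFrame({"value": np.arange(10.0)})

# Floor division of the index labels maps rows 0-4 to bin 0 and rows 5-9
# to bin 1; groupby then aggregates each bin, much like resample above.
by_position = df.groupby(df.index // 5).mean()

print(by_time)
print(by_position)
```

Both results contain one row per bin with the mean of the five values that fell into it; only the index labels differ (timestamps versus bin numbers).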

need a markdown line here like the other examples

Can you underline this with ~~~~ to make this a header (see a few lines above this for an example)

I think you forgot this one

pfrcks · 2016-05-25T15:39:35Z

Added more documentation and examples to clarify resampling as a whole.

jreback · 2016-05-25T15:44:10Z

doc/source/groupby.rst

+
+In order to resample to work on indices that are non-datetimelike , the following procedure can be utilized.
+
+In the following examples, **df.index / 5** returns a binary array which is used to determine what get's selected for the groupby operation.


You can show this by adding another ipython block that evaluates df.index / 5.
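A sketch of what such a block could display, assuming floor division (`//`) is used so the result is the same on Python 2 and Python 3. With ten rows the labels happen to take only the values 0 and 1, which may be why the draft calls the result a "binary array"; the second frame, with a hypothetical fifteen rows, shows that the labels are bin numbers in general.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2))

# One integer label per row: rows 0-4 get 0, rows 5-9 get 1.
labels = df.index // 5
print(labels.tolist())

# Hypothetical larger frame: fifteen rows produce three bins, so the
# labels are bin numbers rather than a two-valued mask.
bigger = pd.DataFrame(np.random.randn(15, 2))
bigger_labels = bigger.index // 5
print(bigger_labels.tolist())
```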

jreback · 2016-05-27T00:16:52Z

doc/source/groupby.rst

+
+In the following examples, **df.index / 5** returns a binary array which is used to determine what get's selected for the groupby operation.
+
+.. note:: The above example shows how we can downsample. Downsampling refers to the throwing away of samples. Here by using **df.index / 5**, we are throwing away half the samples. Here we also aggregate samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation. Hence we can reduce the number of samples by creating bins by clubbing together samples.


We don't throw away samples at all; we aggregate them.

jorisvandenbossche · 2016-06-24T13:02:39Z

doc/source/groupby.rst

+
+   df = pd.DataFrame(np.random.randn(10,2))
+   df
+   df.index / 5


This will not work as desired in Python 3, where / is true division. Can you make this df.index // 5 to do a floor division explicitly?
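The difference can be checked with a short sketch run under Python 3. The variable names `true_div` and `floor_div` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2))

# Python 3: "/" is true division, so every row gets its own float label.
true_div = df.index / 5
# "//" is floor division, so rows share integer bin labels.
floor_div = df.index // 5

# Ten distinct float labels mean ten single-row groups: nothing is binned.
print(df.groupby(true_div).ngroups)
# Two integer labels mean two groups of five rows each.
print(df.groupby(floor_div).ngroups)
```

With single-row groups, an aggregation such as std() has nothing to summarize, which is why the explicit floor division matters for this example.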

jorisvandenbossche · 2016-06-24T13:12:40Z

@pfrcks Added a few more comments.

pfrcks · 2016-06-27T14:40:14Z

@jorisvandenbossche kindly go through the changes and comment.

codecov-io · 2016-06-27T16:14:43Z

Current coverage is 84.33%

Merging #13276 into master will increase coverage by 0.15%

@@             master     #13276   diff @@
==========================================
  Files           138        138          
  Lines         50581      51107   +526   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42582      43103   +521   
- Misses         7999       8004     +5   
  Partials          0          0

Powered by Codecov. Last updated by e0a2e3b...39b7fac

jorisvandenbossche · 2016-06-27T18:40:14Z

doc/source/groupby.rst

+
+In the following examples, **df.index // 5** returns a binary array which is used to determine what get's selected for the groupby operation.
+
+.. note:: The below example shows how we can downsample which is  the throwing away of samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.


I still don't like the "throwing away of samples". You don't throw them away; IMO you process them in some way (e.g. taking the mean of each group of samples).
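The distinction can be made concrete with a small sketch. The data values and the column name `value` are made up for illustration, and the `iloc[::5]` slice is included only as a contrast; it is not part of the PR.

```python
import pandas as pd

# Hypothetical data: ten samples on a plain integer index.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]}
)

# Aggregating: every sample lands in a bin and contributes to the result.
grouped = df.groupby(df.index // 5)
means = grouped.mean()   # one mean per bin
sizes = grouped.size()   # how many samples each bin summarizes

# Discarding, for contrast: keep every fifth row and ignore the rest.
kept = df.iloc[::5]

print(means)
print(sizes)
print(kept)
```

The bin sizes add up to the original ten rows, so the grouped result has fewer rows without dropping any input, whereas the slice keeps two rows and never looks at the other eight.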

pfrcks · 2016-06-28T14:13:03Z

@jorisvandenbossche Sorry, I overlooked that. Have made the necessary changes. Please look and comment.

jorisvandenbossche · 2016-06-28T22:18:10Z

@pfrcks Thanks!

DOC: Added additional example for groupby by indexer.

8df3bdf

jreback added the Docs label May 25, 2016

jreback reviewed May 25, 2016

DOC: Increased documentation

55c8828

jreback reviewed May 25, 2016

pfrcks added 2 commits May 25, 2016 23:55

DOC: Added more details

03d8799

DOC: Cleaned up documentation

932017c

jreback reviewed May 27, 2016

DOC: Cleaning up

35f1ae5

jorisvandenbossche reviewed Jun 24, 2016

DOC: Cleaned up documentation

93aa63f

jorisvandenbossche reviewed Jun 27, 2016

DOC: Improved readability

39b7fac

jorisvandenbossche merged commit 9e73c71 into pandas-dev:master Jun 28, 2016

jorisvandenbossche added this to the 0.18.2 milestone Jun 28, 2016



