These will split the DataFrame on its index (rows). To split by columns, first do
151
+
The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
149
152
a transpose:
150
153
151
154
.. ipython::
@@ -184,8 +187,8 @@ only verifies that you've passed a valid mapping.
184
187
.. note::
185
188
186
189
Many kinds of complicated data manipulations can be expressed in terms of
187
-
GroupBy operations (though can't be guaranteed to be the most
188
-
efficient). You can get quite creative with the label mapping functions.
190
+
GroupBy operations (though it can't be guaranteed to be the most efficient implementation).
191
+
You can get quite creative with the label mapping functions.
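As a minimal sketch of a label mapping function (the Series and its labels below are made up for illustration), a callable passed to ``groupby`` is applied to each index label, and its return value becomes the group key:

```python
import pandas as pd

# Hypothetical data: the index labels are what the mapping function receives.
s = pd.Series([1, 2, 3, 4], index=["apple", "avocado", "banana", "blueberry"])

# Group by the first letter of each index label and sum within each group.
by_initial = s.groupby(lambda label: label[0]).sum()
print(by_initial)
```

Any function that maps a label to a hashable key should work the same way; grouping by first letter is only one possibility.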
189
192
190
193
.. _groupby.sorting:
191
194
@@ -245,8 +248,8 @@ The default setting of ``dropna`` argument is ``True`` which means ``NA`` are no
245
248
GroupBy object attributes
246
249
~~~~~~~~~~~~~~~~~~~~~~~~~
247
250
248
-
The ``groups`` attribute is a dict whose keys are the computed unique groups
249
-
and corresponding values being the axis labels belonging to each group. In the
251
+
The ``groups`` attribute is a dictionary whose keys are the computed unique groups
252
+
and corresponding values are the axis labels belonging to each group. In the
250
253
above example we have:
251
254
252
255
.. ipython:: python
@@ -358,9 +361,10 @@ More on the ``sum`` function and aggregation later.
358
361
359
362
Grouping DataFrame with Index levels and columns
360
363
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
361
-
A DataFrame may be grouped by a combination of columns and index levels by
362
-
specifying the column names as strings and the index levels as ``pd.Grouper``
363
-
objects.
364
+
A DataFrame may be grouped by a combination of columns and index levels. You
365
+
can specify both column and index names, or use a :class:`Grouper`.
366
+
367
+
Let's first create a DataFrame with a MultiIndex:
364
368
365
369
.. ipython:: python
366
370
@@ -375,8 +379,7 @@ objects.
375
379
376
380
df
377
381
378
-
The following example groups ``df`` by the ``second`` index level and
379
-
the ``A`` column.
382
+
Then we group ``df`` by the ``second`` index level and the ``A`` column.
380
383
381
384
.. ipython:: python
382
385
@@ -398,8 +401,8 @@ DataFrame column selection in GroupBy
398
401
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
399
402
400
403
Once you have created the GroupBy object from a DataFrame, you might want to do
401
-
something different for each of the columns. Thus, using ``[]`` similar to
402
-
getting a column from a DataFrame, you can do:
404
+
something different for each of the columns. Thus, by using ``[]`` on the GroupBy
405
+
object, much as you would to get a column from a DataFrame, you can do:
403
406
404
407
.. ipython:: python
405
408
@@ -418,13 +421,13 @@ getting a column from a DataFrame, you can do:
418
421
grouped_C = grouped["C"]
419
422
grouped_D = grouped["D"]
420
423
421
-
This is mainly syntactic sugar for the alternative and much more verbose:
424
+
This is mainly syntactic sugar for the alternative, which is much more verbose:
422
425
423
426
.. ipython:: python
424
427
425
428
df["C"].groupby(df["A"])
426
429
427
-
Additionally this method avoids recomputing the internal grouping information
430
+
Additionally, this method avoids recomputing the internal grouping information
428
431
derived from the passed key.
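The equivalence of the two spellings can be checked with a small sketch. The frame below is an assumption made for illustration and is not the ``df`` used elsewhere in this document:

```python
import pandas as pd

# Hypothetical frame with a key column "A" and two value columns.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "C": [1.0, 2.0, 3.0, 4.0],
        "D": [10.0, 20.0, 30.0, 40.0],
    }
)

# Build the grouping once, then select a column from the GroupBy object.
grouped = df.groupby("A")
via_selection = grouped["C"].sum()

# The verbose alternative: group the column by the key Series directly.
via_series = df["C"].groupby(df["A"]).sum()

same = via_selection.equals(via_series)
print(same)
```

Reusing ``grouped`` for ``grouped["D"]`` as well means the grouping on ``A`` is set up only once.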
429
432
430
433
.. _groupby.iterating-label:
@@ -1218,7 +1221,7 @@ The dimension of the returned result can also change:
1218
1221
1219
1222
grouped.apply(f)
1220
1223
1221
-
``apply`` on a Series can operate on a returned value from the applied function,
1224
+
``apply`` on a Series can operate on a returned value from the applied function
1222
1225
that is itself a Series, and possibly upcast the result to a DataFrame:
1223
1226
1224
1227
.. ipython:: python
@@ -1303,18 +1306,10 @@ column ``B`` because it is not numeric. We refer to these non-numeric columns as
1303
1306
df.groupby("A").std(numeric_only=True)
1304
1307
1305
1308
Note that ``df.groupby('A').colname.std()`` is more efficient than
1306
-
``df.groupby('A').std().colname``, so if the result of an aggregation function
1307
-
is only interesting over one column (here ``colname``), it may be filtered
1309
+
``df.groupby('A').std().colname``. So if the result of an aggregation function
1310
+
is only needed over one column (here ``colname``), it may be filtered
1308
1311
*before* applying the aggregation function.
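A short sketch of the two orderings, using a made-up frame (the column names ``colname`` and ``other`` are placeholders). It uses bracket selection rather than attribute access, which should behave the same here:

```python
import pandas as pd

# Hypothetical frame: "colname" is the only column we care about.
df = pd.DataFrame(
    {
        "A": ["x", "y", "x", "y"],
        "colname": [1.0, 2.0, 3.0, 6.0],
        "other": [5.0, 6.0, 7.0, 8.0],
    }
)

# Select first: only "colname" is aggregated.
select_first = df.groupby("A")["colname"].std()

# Aggregate first: every column is aggregated, then all but one is discarded.
aggregate_first = df.groupby("A").std()["colname"]

print(select_first)
```

Both expressions give the same values; the first simply skips the work of aggregating columns that are thrown away afterwards.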
1309
1312
1310
-
.. note::
1311
-
Any object column, also if it contains numerical values such as ``Decimal``
1312
-
objects, is considered as a "nuisance" column. They are excluded from
1313
-
aggregate functions automatically in groupby.
1314
-
1315
-
If you do wish to include decimal or object columns in an aggregation with
1316
-
other non-nuisance data types, you must do so explicitly.
1317
-
1318
1313
.. ipython:: python
1319
1314
1320
1315
from decimal import Decimal
@@ -1573,9 +1568,9 @@ order they are first observed.
1573
1568
Plotting
1574
1569
~~~~~~~~
1575
1570
1576
-
Groupby also works with some plotting methods. For example, suppose we
1577
-
suspect that some features in a DataFrame may differ by group, in this case,
1578
-
the values in column 1 where the group is "B" are 3 higher on average.
1571
+
Groupby also works with some plotting methods. In this case, suppose we
1572
+
suspect that the values in column 1 are 3 higher on average in group "B".
1573
+
1579
1574
1580
1575
.. ipython:: python
1581
1576
@@ -1657,7 +1652,7 @@ arbitrary function, for example:
1657
1652
1658
1653
df.groupby(["Store", "Product"]).pipe(mean)
1659
1654
1660
-
where ``mean`` takes a GroupBy object and finds the mean of the Revenue and Quantity
1655
+
Here ``mean`` takes a GroupBy object and finds the mean of the Revenue and Quantity
1661
1656
columns respectively for each Store-Product combination. The ``mean`` function can
1662
1657
be any function that takes in a GroupBy object; the ``.pipe`` will pass the GroupBy
1663
1658
object as a parameter into the function you specify.
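As a minimal sketch of that behaviour, with illustrative data that is not the frame from the surrounding example, ``.pipe(mean)`` should give the same result as calling ``mean`` on the GroupBy object directly:

```python
import pandas as pd

# Hypothetical sales data for illustration only.
df = pd.DataFrame(
    {
        "Store": ["S1", "S1", "S2", "S2"],
        "Product": ["P1", "P1", "P1", "P2"],
        "Revenue": [10.0, 20.0, 30.0, 40.0],
        "Quantity": [1, 3, 5, 7],
    }
)


def mean(groupby):
    # Receives the GroupBy object and returns the per-group means.
    return groupby.mean()


piped = df.groupby(["Store", "Product"]).pipe(mean)
direct = mean(df.groupby(["Store", "Product"]))
print(piped)
```

The piped form mainly helps readability when several such functions are chained one after another.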
@@ -1709,11 +1704,16 @@ Groupby by indexer to 'resample' data
1709
1704
1710
1705
Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.
1711
1706
1712
-
In order to resample to work on indices that are non-datetimelike, the following procedure can be utilized.
1707
+
In order for resample to work on indices that are non-datetimelike, the following procedure can be utilized.
1713
1708
1714
1709
In the following examples, **df.index // 5** returns an integer array of bin labels which is used to determine what gets selected for the groupby operation.
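A rough sketch of what **df.index // 5** produces, assuming a ten-row frame with a default integer index (the frame is made up for illustration): each row label is floor-divided by 5, so consecutive blocks of five rows share one bin label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a default RangeIndex 0..9.
df = pd.DataFrame({"value": np.arange(10, dtype="float64")})

# Floor division maps labels 0-4 to bin 0 and labels 5-9 to bin 1.
bins = df.index // 5
print(list(bins))

# Grouping by the bin labels and aggregating yields one row per bin.
result = df.groupby(bins).std()
print(result)
```

Changing the divisor changes the bin width, so ``df.index // 2`` would consolidate pairs of rows instead.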
1715
1710
1716
-
.. note:: The below example shows how we can downsample by consolidation of samples into fewer samples. Here by using **df.index // 5**, we are aggregating the samples in bins. By applying **std()** function, we aggregate the information contained in many samples into a small subset of values which is their standard deviation thereby reducing the number of samples.
1711
+
.. note::
1712
+
1713
+
The example below shows how we can downsample by consolidation of samples into fewer ones.
1714
+
Here, by using **df.index // 5**, we aggregate the samples into bins. By applying the **std()**
1715
+
function, we aggregate the information contained in many samples into a small subset of values
1716
+
which is their standard deviation, thereby reducing the number of samples.
1717
1717
1718
1718
.. ipython:: python
1719
1719
@@ -1727,7 +1727,7 @@ Returning a Series to propagate names
1727
1727
1728
1728
Group DataFrame columns, compute a set of metrics and return a named Series.
1729
1729
The Series name is used as the name for the column index. This is especially
1730
-
useful in conjunction with reshaping operations such as stacking in which the
1730
+
useful in conjunction with reshaping operations such as stacking, in which the
1731
1731
column index name will be used as the name of the inserted column: