@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.
1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
- 3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+ 3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
+ 4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

.. _basics.pipe:

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
   df.apply(np.cumsum)
   df.apply(np.exp)

+ ``.apply()`` will also dispatch on a string method name.
+
+ .. ipython:: python
+
+    df.apply('mean')
+    df.apply('mean', axis=1)
+
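
A minimal sketch of the string dispatch described above. The frame `df` here is a small hypothetical example, not the one used elsewhere in the documentation; the only pandas calls assumed are `DataFrame.apply` and `DataFrame.mean`.

```python
import pandas as pd

# Hypothetical frame used only for this illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# Passing the method name as a string dispatches to DataFrame.mean.
col_means = df.apply("mean")          # one value per column
row_means = df.apply("mean", axis=1)  # one value per row

print(col_means)
print(row_means)
```

Under these assumptions the string form should give the same result as calling `df.mean()` or `df.mean(axis=1)` directly.
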
Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.

@@ -827,16 +835,223 @@ set to True, the passed function will instead receive an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.

- .. seealso::
+ .. _basics.aggregate:
+
+ Aggregation API
+ ~~~~~~~~~~~~~~~
+
+ .. versionadded:: 0.20.0
+
+ The aggregation API lets you express one or more aggregation operations in a single, concise call.
+ This API is similar across pandas objects; see the :ref:`groupby API <groupby.aggregate>`, the
+ :ref:`window functions API <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
+ The entry point for aggregation is the method :meth:`~DataFrame.aggregate`, or the alias :meth:`~DataFrame.agg`.
+
+ We will use a starting frame similar to the one above:
+
+ .. ipython:: python
+
+    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                        index=pd.date_range('1/1/2000', periods=10))
+    tsdf.iloc[3:7] = np.nan
+    tsdf
+
+ Using a single function is equivalent to :meth:`~DataFrame.apply`; you can also pass named methods as strings.
+ These will return a ``Series`` of the aggregated output:
+
+ .. ipython:: python
+
+    tsdf.agg(np.sum)
+
+    tsdf.agg('sum')
+
+    # these are equivalent to a ``.sum()`` because we are aggregating on a single function
+    tsdf.sum()
+
+ A single aggregation on a ``Series`` returns a scalar value:
+
+ .. ipython:: python
+
+    tsdf.A.agg('sum')
+
+
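
The difference in return type between a frame and a single column can be sketched as follows. The frame `frame` and its values are hypothetical and chosen so the sums are easy to check by hand; `NaN` entries are skipped by `sum` by default.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
frame = pd.DataFrame(
    {"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]},
    index=pd.date_range("2000-01-01", periods=3),
)

# A single aggregation on a DataFrame gives a Series indexed by column name.
col_sums = frame.agg("sum")

# The same aggregation on one column gives a scalar.
a_sum = frame["A"].agg("sum")

print(col_sums)
print(a_sum)
```
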
+ Aggregating with multiple functions
+ +++++++++++++++++++++++++++++++++++
+
+ You can pass multiple aggregation arguments as a list.
+ The result of each passed function will be a row in the resulting ``DataFrame``.
+ The rows are named after the aggregation functions.
+
+ .. ipython:: python
+
+    tsdf.agg(['sum'])
+
+ Multiple functions yield multiple rows:
+
+ .. ipython:: python
+
+    tsdf.agg(['sum', 'mean'])
+
+ On a ``Series``, multiple functions return a ``Series``, indexed by the function names:
+
+ .. ipython:: python
+
+    tsdf.A.agg(['sum', 'mean'])
+
+ Passing a ``lambda`` function will yield a ``<lambda>`` named row:
+
+ .. ipython:: python
+
+    tsdf.A.agg(['sum', lambda x: x.mean()])
+
+ Passing a named function will yield that name for the row:
+
+ .. ipython:: python
+
+    def mymean(x):
+        return x.mean()
+
+    tsdf.A.agg(['sum', mymean])
+
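
As a rough illustration of how the row labels are derived, the sketch below mixes string names with a named function. The helper `value_range` and the frame `frame` are hypothetical and introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})


def value_range(x):
    """Hypothetical aggregator: spread between the largest and smallest value."""
    return x.max() - x.min()


# On a DataFrame: one row per function, one column per original column.
table = frame.agg(["sum", "mean", value_range])

# On a Series: a Series indexed by the function names.
series_result = frame["A"].agg(["sum", value_range])

print(table)
print(series_result)
```

The custom function contributes a row labelled with its own name, just as the string aggregations do.
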
+ Aggregating with a dict
+ +++++++++++++++++++++++
+
+ Passing a dictionary that maps column names to a single function or a list of functions to ``DataFrame.agg``
+ allows you to customize which functions are applied to which columns.
+
+ .. ipython:: python
+
+    tsdf.agg({'A': 'mean', 'B': 'sum'})
+
+ Passing a list-like will generate a ``DataFrame`` output. You will get a matrix-like output
+ of all of the aggregators. The output will consist of all unique functions. Those that are
+ not noted for a particular column will be ``NaN``:
+
+ .. ipython:: python
+
+    tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
+
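
The two dict forms can be compared side by side in a short sketch. The frame `frame` is hypothetical; the point is only to show where the ``NaN`` entries appear when a function is not requested for a column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values.
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# One function per column: the result is a Series indexed by column name.
per_column = frame.agg({"A": "mean", "B": "sum"})

# A list for at least one column: the result is a DataFrame whose rows are
# the union of all requested functions.
mixed = frame.agg({"A": ["mean", "min"], "B": "sum"})

print(per_column)
print(mixed)
```

In `mixed`, the cell for `sum` under column `A` is ``NaN`` because `sum` was only requested for `B`.
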
+ .. _basics.aggregation.mixed_dtypes:
+
+ Mixed Dtypes
+ ++++++++++++

- The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
- functionality for grouping by some criterion, applying, and combining the
- results into a Series, DataFrame, etc.
+ When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid
+ aggregations. This is similar to how groupby ``.agg`` works.

- .. _Elementwise:
+ .. ipython:: python
+
+    mdf = pd.DataFrame({'A': [1, 2, 3],
+                        'B': [1., 2., 3.],
+                        'C': ['foo', 'bar', 'baz'],
+                        'D': pd.date_range('20130101', periods=3)})
+    mdf.dtypes
+
+ .. ipython:: python
+
+    mdf.agg(['min', 'sum'])
+
+ .. _basics.aggregation.custom_describe:
+
+ Custom describe
+ +++++++++++++++
+
+ With ``.agg()`` it is possible to easily create a custom describe function, similar
+ to the built-in :ref:`describe function <basics.describe>`.
+
+ .. ipython:: python
+
+    from functools import partial
+
+    q_25 = partial(pd.Series.quantile, q=0.25)
+    q_25.__name__ = '25%'
+    q_75 = partial(pd.Series.quantile, q=0.75)
+    q_75.__name__ = '75%'
+
+    tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
+
+ .. _basics.transform:
+
+ Transform API
+ ~~~~~~~~~~~~~

- Applying elementwise Python functions
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ .. versionadded:: 0.20.0
+
+ The :meth:`~DataFrame.transform` method returns an object that is indexed the same (same size)
+ as the original. This API allows you to provide *multiple* operations at the same
+ time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
+
+ Use a similar frame to the above sections.
+
+ .. ipython:: python
+
+    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                        index=pd.date_range('1/1/2000', periods=10))
+    tsdf.iloc[3:7] = np.nan
+    tsdf
+
+ Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
+ function name, or a user-defined function.
+
+ .. ipython:: python
+
+    tsdf.transform(np.abs)
+    tsdf.transform('abs')
+    tsdf.transform(lambda x: x.abs())
+
+ Since this is a single function, this is equivalent to a ufunc application:
+
+ .. ipython:: python
+
+    np.abs(tsdf)
+
+ Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
+
+ .. ipython:: python
+
+    tsdf.A.transform(np.abs)
+
+
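
A small sketch of the three ways of specifying a single transform, checking that each one keeps the original index. The frame `frame` is hypothetical and includes negative and missing values so the effect of `abs` is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with negative and missing values.
frame = pd.DataFrame(
    {"A": [-1.0, np.nan, 3.0], "B": [4.0, -5.0, np.nan]},
    index=pd.date_range("2000-01-01", periods=3),
)

by_ufunc = frame.transform(np.abs)              # NumPy function
by_name = frame.transform("abs")                # string method name
by_udf = frame.transform(lambda col: col.abs()) # user-defined function

# A single function on a Series returns a Series of the same length.
a_abs = frame["A"].transform("abs")

print(by_ufunc)
print(a_abs)
```

All three frame-level calls should agree with `np.abs(frame)`, and none of them changes the shape or index of the input.
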
+ Transform with multiple functions
+ +++++++++++++++++++++++++++++++++
+
+ Passing multiple functions will yield a column multi-indexed DataFrame.
+ The first level will be the original frame column names; the second level
+ will be the names of the transforming functions.
+
+ .. ipython:: python
+
+    tsdf.transform([np.abs, lambda x: x + 1])
+
+ Passing multiple functions to a Series will yield a DataFrame. The
+ resulting column names will be the transforming functions.
+
+ .. ipython:: python
+
+    tsdf.A.transform([np.abs, lambda x: x + 1])
+
+
+ Transforming with a dict
+ ++++++++++++++++++++++++
+
+
+ Passing a dict of functions will allow selective transforming per column.
+
+ .. ipython:: python
+
+    tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})
+
+ Passing a dict of lists will generate a multi-indexed DataFrame with these
+ selective transforms.
+
+ .. ipython:: python
+
+    tsdf.transform({'A': np.abs, 'B': [lambda x: x + 1, 'sqrt']})
+
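
To make the resulting column layout concrete, here is a hedged sketch contrasting a list of transforms with a dict of transforms. The helper `plus_one` and the frame `frame` are hypothetical; a named function is used instead of a ``lambda`` so that the second column level has a readable label.

```python
import pandas as pd

# Hypothetical frame without missing values, to keep the arithmetic obvious.
frame = pd.DataFrame({"A": [-1.0, 2.0, -3.0], "B": [4.0, -5.0, 6.0]})


def plus_one(x):
    """Hypothetical transform: shift every value up by one."""
    return x + 1


# A list applies every function to every column; the columns become a
# two-level index of (original column, function name).
multi = frame.transform(["abs", plus_one])

# A dict applies each function only to the column it is keyed by.
selective = frame.transform({"A": "abs", "B": plus_one})

print(multi)
print(selective)
```

The list form doubles the number of columns here (two functions for each of two columns), while the dict form keeps one output column per key.
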
+ .. _basics.elementwise:
+
+ Applying Elementwise Functions
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since not all functions can be vectorized (accept NumPy arrays and return
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame