@@ -702,7 +702,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
- 3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+ 3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
+ 4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

.. _basics.pipe:

@@ -778,6 +779,13 @@ statistics methods, take an optional ``axis`` argument:
   df.apply(np.cumsum)
   df.apply(np.exp)

+ ``.apply()`` will also dispatch on a string method name.
+
+ .. ipython:: python
+
+    df.apply('mean')
+    df.apply('mean', axis=1)
+
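The string dispatch described above can be sketched as follows. The frame name ``frame`` and its values are hypothetical, chosen only for illustration; the sketch checks that passing ``'mean'`` gives the same values as calling ``.mean()`` directly along either axis.

```python
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})

# A string is looked up as a method name on the object.
col_means = frame.apply('mean')          # one value per column
row_means = frame.apply('mean', axis=1)  # one value per row

# Compare against the direct method calls.
print(col_means.equals(frame.mean()))
print(row_means.equals(frame.mean(axis=1)))
```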
Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.

@@ -827,16 +835,234 @@ set to True, the passed function will instead receive an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.

- .. seealso::
+ .. _basics.aggregate:
+
+ Aggregation API
+ ~~~~~~~~~~~~~~~
+
+ .. versionadded:: 0.20.0
+
+ The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
+ This API is similar across pandas objects; see :ref:`groupby aggregates <groupby.aggregate>`,
+ :ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
+
+ We will use a starting frame similar to the one above.
+
+ .. ipython:: python
+
+    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                        index=pd.date_range('1/1/2000', periods=10))
+    tsdf.iloc[3:7] = np.nan
+    tsdf
+
+ Using a single function is equivalent to ``.apply``; you can also pass named methods as strings.
+ Either form returns a Series of the aggregated output.
+
+ .. ipython:: python
+
+    tsdf.agg(np.sum)
+
+    tsdf.agg('sum')
+
+    # these are equivalent to a ``.sum()`` because we are aggregating on a single function
+    tsdf.sum()
+
+ On a Series, this will result in a scalar value.
+
+ .. ipython:: python
+
+    tsdf.A.agg('sum')
+
+
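A minimal sketch of single-function aggregation, assuming a small hypothetical frame (``frame`` and its values are not from the original). It compares the string form of ``.agg`` with the direct ``.sum()`` call and shows that a Series reduces to a scalar.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value, for illustration only.
frame = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, 6.0]})

by_name = frame.agg('sum')      # Series with one entry per column
direct = frame.sum()            # the equivalent direct call
scalar = frame['A'].agg('sum')  # a Series aggregates to a scalar

print(by_name.equals(direct))
print(scalar)
```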
+ Aggregating multiple functions at once
+ ++++++++++++++++++++++++++++++++++++++
+
+ You can pass multiple aggregation arguments as a list. The result of each passed function will be a row in the resulting DataFrame.
+ The rows are named after the aggregation functions.
+
+ .. ipython:: python
+
+    tsdf.agg(['sum'])
+
+ Multiple functions yield multiple rows.

- The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
- functionality for grouping by some criterion, applying, and combining the
- results into a Series, DataFrame, etc.
+ .. ipython:: python
+
+    tsdf.agg(['sum', 'mean'])
+
+ On a Series, multiple functions return a Series, indexed by the function names.
+
+ .. ipython:: python
+
+    tsdf.A.agg(['sum', 'mean'])
+
+
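The row-per-function layout can be checked with a short sketch. The frame below is hypothetical and only meant to make the resulting labels easy to inspect.

```python
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# DataFrame: one row per aggregation function, one column per original column.
table = frame.agg(['sum', 'mean'])

# Series: a Series indexed by the function names.
series_result = frame['A'].agg(['sum', 'mean'])

print(table)
print(series_result)
```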
+ Aggregating with a dict of functions
+ ++++++++++++++++++++++++++++++++++++
+
+ Passing a dictionary that maps column names to a function or a list of functions to ``DataFrame.agg``
+ allows you to customize which functions are applied to which columns.
+
+ .. ipython:: python
+
+    tsdf.agg({'A': 'mean', 'B': 'sum'})
+
+ Passing a list-like as a value will generate a DataFrame output. You will get a matrix-like output
+ of all of the aggregators; entries for functions not applied to a column will be missing values.
+
+ .. ipython:: python
+
+    tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})

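As a sketch of the dict form on a DataFrame, the following uses a hypothetical frame to show both outcomes: a Series when every value is a single function, and a DataFrame with missing entries when any value is a list.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [10.0, 20.0, 30.0]})

# One function per column: the result is a Series indexed by column name.
per_column = frame.agg({'A': 'mean', 'B': 'sum'})

# A list for any column: the result is a DataFrame, with NaN where a
# function was not requested for that column.
mixed = frame.agg({'A': ['mean', 'min'], 'B': 'sum'})

print(per_column)
print(mixed)
```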
- .. _Elementwise:
+ For a Series, you can pass a dict. You will get back a MultiIndex Series; the outer level will
+ be the keys, the inner the names of the functions.
+
+ .. ipython:: python
+
+    tsdf.A.agg({'foo': ['sum', 'mean']})
+
+ Alternatively, using a dict with multiple keys, each mapped to a single function, you can rename the elements of the aggregation.
+
+ .. ipython:: python
+
+    tsdf.A.agg({'foo': 'sum', 'bar': 'mean'})
+
+ Multiple keys mapped to lists of functions will yield a MultiIndex Series. The outer level will be the keys, the inner
+ the names of the functions.
+
+ .. ipython:: python
+
+    tsdf.A.agg({'foo': ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum()+1]})
+
+ .. _basics.aggregation.mixed_dtypes:
+
+ Mixed Dtypes
+ ++++++++++++
+
+ When presented with mixed dtypes that cannot aggregate, ``.agg`` will only take the valid
+ aggregations. This is similar to how groupby ``.agg`` works.
+
+ .. ipython:: python

- Applying elementwise Python functions
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+    mdf = pd.DataFrame({'A': [1, 2, 3],
+                        'B': [1., 2., 3.],
+                        'C': ['foo', 'bar', 'baz'],
+                        'D': pd.date_range('20130101', periods=3)})
+    mdf.dtypes
+
+ .. ipython:: python
+
+    mdf.agg(['min', 'sum'])
+
+ .. _basics.aggregation.custom_describe:
+
+ Custom describe
+ +++++++++++++++
+
+ With ``.agg()`` it is possible to easily create a custom describe function, similar
+ to the built-in :ref:`describe function <basics.describe>`.
+
+ .. ipython:: python
+
+    from functools import partial
+
+    q_25 = partial(pd.Series.quantile, q=0.25)
+    q_25.__name__ = '25%'
+    q_75 = partial(pd.Series.quantile, q=0.75)
+    q_75.__name__ = '75%'
+
+    tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
+
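A variant of the same idea can be sketched with plain named functions instead of ``partial`` objects. This is an assumption-laden illustration, not the approach used above: the helper names ``q_25`` and ``q_75`` are reused for readability, the frame is hypothetical, and the row labels come from each function's ``__name__``.

```python
import pandas as pd

# Hypothetical helpers: each takes a Series and returns one quantile.
def q_25(x):
    return x.quantile(0.25)

def q_75(x):
    return x.quantile(0.75)

# The row label in the result is taken from the function name.
q_25.__name__ = '25%'
q_75.__name__ = '75%'

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0],
                      'B': [10.0, 20.0, 30.0, 40.0, 50.0]})

summary = frame.agg(['count', 'mean', 'std', 'min',
                     q_25, 'median', q_75, 'max'])
print(summary)
```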
+ .. _basics.transform:
+
+ Transform API
+ ~~~~~~~~~~~~~
+
+ .. versionadded:: 0.20.0
+
+ The ``transform`` method returns an object that is indexed the same (same size)
+ as the original. This API allows you to provide *multiple* operations at the same
+ time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
+
+ We use a frame similar to the one in the sections above.
+
+ .. ipython:: python
+
+    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                        index=pd.date_range('1/1/2000', periods=10))
+    tsdf.iloc[3:7] = np.nan
+    tsdf
+
+ Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
+ function name, or a user-defined function.
+
+ .. ipython:: python
+
+    tsdf.transform(np.abs)
+    tsdf.transform('abs')
+    tsdf.transform(lambda x: x.abs())
+
+ Since this is a single function, this is equivalent to a ufunc application.
+
+ .. ipython:: python
+
+    np.abs(tsdf)
+
+ Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
+
+ .. ipython:: python
+
+    tsdf.A.transform(np.abs)
+
+
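The three input forms can be compared in a short sketch. The frame is hypothetical; the check is that every form gives the same values as the ufunc and keeps the original index and shape.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value, for illustration only.
frame = pd.DataFrame({'A': [-1.0, 2.0, np.nan], 'B': [3.0, -4.0, 5.0]})

via_ufunc = frame.transform(np.abs)              # NumPy function
via_name = frame.transform('abs')                # string method name
via_lambda = frame.transform(lambda x: x.abs())  # user-defined function

# A Series input returns a Series of the same length.
series_result = frame['A'].transform(np.abs)

expected = np.abs(frame)
print(via_ufunc.equals(expected), via_name.equals(expected),
      via_lambda.equals(expected))
```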
1018
+ Transform with multiple functions
1019
+ +++++++++++++++++++++++++++++++++
1020
+
1021
+ Passing multiple functions will yield a column multi-indexed DataFrame.
1022
+ The first level will be the original frame column names; the second level
1023
+ will be the names of the transforming functions.
1024
+
1025
+ .. ipython :: python
1026
+
1027
+ tsdf.transform([np.abs, lambda x : x+ 1 ])
1028
+
1029
+ Passing multiple functions to a Series will yield a DataFrame. The
1030
+ resulting column names will be the transforming functions.
1031
+
1032
+ .. ipython :: python
1033
+
1034
+ tsdf.A.transform([np.abs, lambda x : x+ 1 ])
1035
+
1036
+
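To make the column labels easier to see, the following sketch swaps in a string name and a named helper for the ufunc and lambda used above. ``plus_one`` and the frame are hypothetical.

```python
import pandas as pd

# Hypothetical helper with a readable name for the column label.
def plus_one(x):
    return x + 1

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [-1.0, 2.0], 'B': [3.0, -4.0]})

# DataFrame input: columns become (original column, function name) pairs.
wide = frame.transform(['abs', plus_one])

# Series input: a DataFrame with one column per function.
series_wide = frame['A'].transform(['abs', plus_one])

print(wide.columns.tolist())
print(series_wide.columns.tolist())
```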
1037
+ Transforming with a dict of functions
1038
+ +++++++++++++++++++++++++++++++++++++
1039
+
1040
+
1041
+ Passing a dict of functions will will allow selective transforming per column.
1042
+
1043
+ .. ipython :: python
1044
+
1045
+ tsdf.transform({' A' : np.abs, ' B' : lambda x : x+ 1 })
1046
+
1047
+ Passing a dict of lists will generate a multi-indexed DataFrame with these
1048
+ selective transforms.
1049
+
1050
+ .. ipython :: python
1051
+
1052
+ tsdf.transform({' A' : np.abs, ' B' : [lambda x : x+ 1 , ' sqrt' ]})
1053
+
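A minimal sketch of per-column transforms on a DataFrame, assuming a hypothetical frame and a hypothetical named helper ``plus_one`` in place of the lambda. It shows the flat result for a dict of single functions and the two-level columns when any value is a list.

```python
import numpy as np
import pandas as pd

# Hypothetical helper with a readable name for the column label.
def plus_one(x):
    return x + 1

# Hypothetical data, for illustration only.
frame = pd.DataFrame({'A': [-1.0, 4.0], 'B': [9.0, 16.0]})

# One function per column: the result keeps the flat column names.
selective = frame.transform({'A': np.abs, 'B': plus_one})

# A list for any column: the result has two-level columns.
nested = frame.transform({'A': np.abs, 'B': [plus_one, 'sqrt']})

print(selective)
print(nested)
```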
1054
+ On a Series, passing a dict allows renaming as in ``.agg() ``
1055
+
1056
+ .. ipython :: python
1057
+
1058
+ tsdf.A.transform({' foo' : np.abs})
1059
+ tsdf.A.transform({' foo' : np.abs, ' bar' : [lambda x : x+ 1 , ' sqrt' ]})
1060
+
1061
+
1062
+ .. _basics.elementwise :
1063
+
1064
+ Applying Elementwise Functions
1065
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
840
1066
841
1067
Since not all functions can be vectorized (accept NumPy arrays and return
842
1068
another array or value), the methods :meth: `~DataFrame.applymap ` on DataFrame