@@ -700,7 +700,8 @@ on an entire ``DataFrame`` or ``Series``, row- or column-wise, or elementwise.

1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
- 3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+ 3. `Aggregation API`_: :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`
+ 4. `Applying Elementwise Functions`_: :meth:`~DataFrame.applymap`

.. _basics.pipe:

@@ -776,6 +777,13 @@ statistics methods, take an optional ``axis`` argument:
   df.apply(np.cumsum)
   df.apply(np.exp)

+ ``.apply()`` will also dispatch on a string method name.
+
+ .. ipython:: python
+
+    df.apply('mean')
+    df.apply('mean', axis=1)
+
Depending on the return type of the function passed to :meth:`~DataFrame.apply`,
the result will either be of lower dimension or the same dimension.
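
As a minimal sketch of both points, assuming a small hypothetical frame ``df`` (its values are made up for illustration and are not the frame used in the surrounding text), string dispatch and the effect of the return type on dimensionality could be checked like this:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; the values are chosen only for illustration.
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]})

# A string is resolved to the DataFrame method of the same name.
col_means = df.apply("mean")
row_means = df.apply("mean", axis=1)

# A function that returns one scalar per column lowers the dimension:
# DataFrame in, Series out.
spread = df.apply(lambda col: col.max() - col.min())

# A function that returns a same-length result keeps the dimension:
# DataFrame in, DataFrame out.
exp_df = df.apply(np.exp)
```

Here ``col_means`` and ``row_means`` should match ``df.mean()`` and ``df.mean(axis=1)`` respectively.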
@@ -825,16 +833,204 @@ set to True, the passed function will instead receive an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.

- .. seealso::
+ .. _basics.aggregate:
+
+ Aggregation API
+ ~~~~~~~~~~~~~~~
+
+ .. versionadded:: 0.20.0
+
+ The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
+ This API is similar across pandas objects; see :ref:`groupby aggregates <groupby.aggregate>`,
+ :ref:`window functions <stats.aggregate>`, and the :ref:`resample API <timeseries.aggregate>`.
+
+ We will use a starting frame similar to the one above.
+
+ .. ipython:: python
+
+    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                        index=pd.date_range('1/1/2000', periods=10))
+    tsdf.iloc[3:7] = np.nan
+    tsdf
+
+ Using a single function is equivalent to ``.apply``; you can also pass named methods as strings.
+ Either form returns a Series of the aggregated output.
+
+ .. ipython:: python
+
+    tsdf.agg(np.sum)
+
+    tsdf.agg('sum')
+
+ On a Series, this results in a scalar value.
+
+ .. ipython:: python
+
+    tsdf.A.agg('sum')
+
+
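
The single-function behaviour can be illustrated with a minimal sketch. The frame ``frame`` below is a hypothetical stand-in with fixed values (the random ``tsdf`` cannot be reproduced exactly), so the results are easy to verify by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values and a few missing entries.
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# One function on a DataFrame: one value per column, returned as a Series.
by_agg = frame.agg("sum")
by_apply = frame.apply("sum")

# One function on a Series: a single scalar.
total_a = frame["A"].agg("sum")
```

Missing values are skipped by ``sum``, so column ``A`` totals 4.0 and column ``B`` totals 9.0.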
+ Aggregating multiple functions at once
+ ++++++++++++++++++++++++++++++++++++++
+
+ You can pass multiple aggregation functions as a list. The result of each passed function will be a row in the resulting DataFrame.
+ The rows are named after the aggregation functions.
+
+ .. ipython:: python
+
+    tsdf.agg(['sum'])
+
+ Multiple functions yield multiple rows.
+
+ .. ipython:: python
+
+    tsdf.agg(['sum', 'mean'])
+
+ On a Series, multiple functions return a Series, indexed by the function names.
+
+ .. ipython:: python
+
+    tsdf.A.agg(['sum', 'mean'])
+
+
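
A minimal sketch of the list form, again assuming a hypothetical fixed-value frame rather than the random ``tsdf``, shows how the function names become labels in the result:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values and a few missing entries.
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# DataFrame + list of functions: one row per function, one column per column.
table = frame.agg(["sum", "mean"])

# Series + list of functions: a Series indexed by the function names.
column_stats = frame["A"].agg(["sum", "mean"])
```

Under these assumptions ``table`` has the rows ``sum`` and ``mean`` and the columns ``A`` and ``B``, while ``column_stats`` is a two-element Series.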
+ Aggregating with a dict of functions
+ ++++++++++++++++++++++++++++++++++++
+
+ Passing a dictionary that maps column names to a function or a list of functions to ``DataFrame.agg``
+ allows you to customize which functions are applied to which columns.
+
+ .. ipython:: python
+
+    tsdf.agg({'A': 'mean', 'B': 'sum'})
+
+ Passing a list-like will generate a DataFrame output. You will get a matrix-like output
+ of all of the aggregators; entries for function and column pairs that were not requested will be missing values.
+
+ .. ipython:: python
+
+    tsdf.agg({'A': ['mean', 'min'], 'B': 'sum'})
+
+ For a Series, you can pass a dict; the keys will set the name of the column.
+
+ .. ipython:: python
+
+    tsdf.A.agg({'foo': ['sum', 'mean']})
+
+ Multiple keys will yield multiple columns.
+
+ .. ipython:: python
+
+    tsdf.A.agg({'foo': ['sum', 'mean'], 'bar': ['min', 'max', lambda x: x.sum() + 1]})
+
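
The DataFrame dict forms can be sketched as follows, assuming a hypothetical fixed-value frame. The sketch covers only the DataFrame case and makes no claim about the Series renaming form:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values and a few missing entries.
frame = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

# One function per column: the result is a Series keyed by column name.
per_column = frame.agg({"A": "mean", "B": "sum"})

# A list for at least one column: the result is a DataFrame whose rows are
# the union of all requested function names. Pairs that were not requested
# (for example 'sum' of column A) are filled with NaN.
table = frame.agg({"A": ["mean", "min"], "B": "sum"})
```

The NaN cells in ``table`` indicate that a function was never requested for that column, not that the computation failed.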
+ .. _basics.custom_describe:

-    The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
-    functionality for grouping by some criterion, applying, and combining the
-    results into a Series, DataFrame, etc.
+ Custom describe
+ +++++++++++++++

- .. _Elementwise:
+ With ``.agg()`` it is possible to easily create a custom describe function, similar
+ to the built-in :ref:`describe function <basics.describe>`.

- Applying elementwise Python functions
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ .. ipython:: python
+
+    from functools import partial
+
+    q_25 = partial(pd.Series.quantile, q=0.25)
+    q_25.__name__ = '25%'
+    q_75 = partial(pd.Series.quantile, q=0.75)
+    q_75.__name__ = '75%'
+
+    tsdf.agg(['count', 'mean', 'std', 'min', q_25, 'median', q_75, 'max'])
+
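
A variant of the same idea, sketched here with plain named functions instead of ``partial`` objects, may be easier to read. The helper names ``q25`` and ``q75`` and the frame values are hypothetical; the row labels are taken from each function's ``__name__``:

```python
import pandas as pd

# Hypothetical frame with values chosen so the quartiles are easy to check.
frame = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0],
                      "B": [10.0, 20.0, 30.0, 40.0, 50.0]})


def q25(series):
    """Lower quartile of a column (hypothetical helper)."""
    return series.quantile(0.25)


def q75(series):
    """Upper quartile of a column (hypothetical helper)."""
    return series.quantile(0.75)


# Strings name built-in reductions; callables are labelled by their __name__.
summary = frame.agg(["count", "mean", q25, "median", q75, "max"])
```

The trade-off is that a function name cannot contain characters such as ``%``, which is why the ``partial`` approach sets ``__name__`` explicitly.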
+ .. _basics.transform:
+
+ Transform API
+ ~~~~~~~~~~~~~
+
+ .. versionadded:: 0.20.0
+
+ The ``transform`` method returns an object that is indexed the same (same size)
+ as the original. This API allows you to provide *multiple* operations at the same
+ time rather than one-by-one. Its API is quite similar to the ``.agg`` API.
+
+ We create a frame similar to the one used in the above sections.
+
+ .. ipython:: python
+
+    tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
+                        index=pd.date_range('1/1/2000', periods=10))
+    tsdf.iloc[3:7] = np.nan
+    tsdf
+
+ Transform the entire frame. ``.transform()`` accepts a NumPy function, a string
+ function name, or a user-defined function.
+
+ .. ipython:: python
+
+    tsdf.transform(np.abs)
+    tsdf.transform('abs')
+    tsdf.transform(lambda x: x.abs())
+
+ Calling ``.transform()`` with a single function is equivalent to applying that function across the
+ columns.
+
+ .. ipython:: python
+
+    tsdf.apply(np.abs, axis=1)
+
+ Passing a single function to ``.transform()`` with a Series will yield a single Series in return.
+
+ .. ipython:: python
+
+    tsdf.A.transform(np.abs)
+
+
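
The three ways of naming a function can be compared in a minimal sketch, assuming a hypothetical fixed-value frame in place of the random ``tsdf``:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values and a few missing entries.
frame = pd.DataFrame({"A": [-1.0, np.nan, 3.0], "B": [-4.0, 5.0, np.nan]})

# A NumPy function, a string method name, and a user-defined function.
by_ufunc = frame.transform(np.abs)
by_name = frame.transform("abs")
by_lambda = frame.transform(lambda col: col.abs())

# On a Series, a single function returns a Series with the same index.
series_result = frame["A"].transform(np.abs)
```

All three DataFrame results should be equal, and each keeps the index and shape of ``frame``.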
+ Transform with multiple functions
+ +++++++++++++++++++++++++++++++++
+
+ Passing multiple functions will yield a DataFrame with MultiIndexed columns.
+ The first level will be the original frame column names; the second level
+ will be the names of the transforming functions.
+
+ .. ipython:: python
+
+    tsdf.transform([np.abs, lambda x: x + 1])
+
+ Passing multiple functions to a Series will yield a DataFrame. The
+ resulting column names will be the names of the transforming functions.
+
+ .. ipython:: python
+
+    tsdf.A.transform([np.abs, lambda x: x + 1])
+
+
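
A minimal sketch of the resulting column layout, assuming a hypothetical fixed-value frame and a hypothetical named helper ``plus_one`` (a named function gives a more readable column label than a lambda):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values.
frame = pd.DataFrame({"A": [-1.0, 2.0, -3.0], "B": [4.0, -5.0, 6.0]})


def plus_one(values):
    """Add one to every value (hypothetical helper)."""
    return values + 1


# DataFrame + list: two-level columns (original column, function name).
wide = frame.transform([np.abs, plus_one])

# Series + list: a DataFrame with one column per function.
narrow = frame["A"].transform([np.abs, plus_one])
```

Under these assumptions ``wide`` has four columns (two functions for each of two columns) and ``narrow`` has two.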
+ Transforming with a dict of functions
+ +++++++++++++++++++++++++++++++++++++
+
+
+ Passing a dict of functions will allow selective transforming per column.
+
+ .. ipython:: python
+
+    tsdf.transform({'A': np.abs, 'B': lambda x: x + 1})
+
+ Passing a dict of lists will generate a DataFrame with MultiIndexed columns containing these
+ selective transforms.
+
+ .. ipython:: python
+
+    tsdf.transform({'A': np.abs, 'B': [lambda x: x + 1, 'sqrt']})
+
+ On a Series, passing a dict allows renaming, as in ``.agg()``.
+
+ .. ipython:: python
+
+    tsdf.A.transform({'foo': np.abs})
+    tsdf.A.transform({'foo': np.abs, 'bar': [lambda x: x + 1, 'sqrt']})
+
+
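
The DataFrame dict forms can be sketched as follows, assuming a hypothetical fixed-value frame and a hypothetical helper ``plus_one``. The sketch passes ``np.sqrt`` as a callable rather than the string ``'sqrt'`` and covers only the DataFrame case:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column B is non-negative so the square root is defined.
frame = pd.DataFrame({"A": [-1.0, 2.0, -3.0], "B": [4.0, 9.0, 16.0]})


def plus_one(values):
    """Add one to every value (hypothetical helper)."""
    return values + 1


# One function per column: the result keeps the original flat columns.
per_column = frame.transform({"A": np.abs, "B": plus_one})

# A list for at least one column: the result has two-level columns of
# (original column, function name).
nested = frame.transform({"A": np.abs, "B": [plus_one, np.sqrt]})
```

Columns that do not appear in the dict are left out of the result.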
+ .. _basics.elementwise:
+
+ Applying Elementwise Functions
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since not all functions can be vectorized (accept NumPy arrays and return
another array or value), the methods :meth:`~DataFrame.applymap` on DataFrame