{{ header }}

- **************************************
- Introduction to User-Defined Functions
- **************************************
+ *****************************
+ User-Defined Functions (UDFs)
+ *****************************

In pandas, User-Defined Functions (UDFs) provide a way to extend the library’s
functionality by allowing users to apply custom computations to their data. While
pandas comes with a set of built-in functions for data manipulation, UDFs offer
flexibility when built-in methods are not sufficient. These functions can be
applied at different levels: element-wise, row-wise, column-wise, or group-wise,
- and change the data differently, depending on the method used.
+ and behave differently depending on the method used.
+
+ Here’s a simple example to illustrate a UDF applied to a Series:
+
+ .. ipython:: python
+
+     s = pd.Series([1, 2, 3])
+
+     # Simple UDF that adds 1 to a value
+     def add_one(x):
+         return x + 1
+
+     # Apply the function element-wise using .map
+     s.map(add_one)
+
+ You can also apply UDFs to an entire DataFrame. For example:
+
+ .. ipython:: python
+
+     df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
+
+     # UDF that takes a row and returns the sum of columns A and B
+     def sum_row(row):
+         return row["A"] + row["B"]
+
+     # Apply the function row-wise (axis=1 passes each row to the function)
+     df.apply(sum_row, axis=1)
+
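The introduction also lists column-wise and group-wise application, which the two examples do not show. The following is a minimal sketch of both, reusing the small DataFrame from the row-wise example; the ``value_range`` helper and the ``key`` column are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})


# Hypothetical UDF: spread between the largest and smallest value it receives
def value_range(values):
    return values.max() - values.min()


# Column-wise: with axis=0 (the default), apply passes each column as a Series
col_ranges = df.apply(value_range, axis=0)

# Group-wise: add a hypothetical grouping column, then aggregate column B per group
df["key"] = ["x", "x", "y"]
group_ranges = df.groupby("key")["B"].agg(value_range)

print(col_ranges)
print(group_ranges)
```

Because ``value_range`` only relies on ``.max()`` and ``.min()``, the same function works whether pandas hands it a column, a row, or a group.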

Why Not To Use User-Defined Functions
- -----------------------------------------
+ -------------------------------------

While UDFs provide flexibility, they come with significant drawbacks, primarily
related to performance and behavior. When using UDFs, pandas must perform inference
@@ -60,27 +87,25 @@ Methods that support User-Defined Functions

User-Defined Functions can be applied across various pandas methods:

- * :meth:`~DataFrame.apply` - A flexible method that allows applying a function to Series,
-   DataFrames, or groups of data.
- * :meth:`~DataFrame.agg` (Aggregate) - Used for summarizing data, supporting multiple
+ * :meth:`~DataFrame.apply` - A flexible method that allows applying a function to Series and
+   DataFrames.
+ * :meth:`~DataFrame.agg` (Aggregate) - Used for summarizing data, supporting custom
  aggregation functions.
- * :meth:`~DataFrame.transform` - Applies a function to groups while preserving the shape of
+ * :meth:`~DataFrame.transform` - Applies a function to Series and DataFrames while preserving the shape of
  the original data.
- * :meth:`~DataFrame.filter` - Filters groups based on a list of Boolean conditions.
- * :meth:`~DataFrame.map` - Applies an element-wise function to a Series, useful for
+ * :meth:`~DataFrame.filter` - Selects rows or columns of Series and DataFrames by label. Only the
+   groupby version accepts a UDF, which returns a Boolean deciding whether to keep each group.
+ * :meth:`~DataFrame.map` - Applies an element-wise function to a Series or DataFrame, useful for
  transforming individual values.
- * :meth:`~DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
-   Series in a clean, readable manner.
+ * :meth:`~DataFrame.pipe` - Allows chaining custom functions to process Series or
+   DataFrames in a clean, readable manner.

All of these pandas methods can be used with both Series and DataFrame objects, providing versatile
ways to apply UDFs across different pandas data structures.

.. note::
-     Some of these methods are can also be applied to Groupby Objects. Refer to :ref:`groupby`.
-
-     Additionally, operations such as :ref:`resample()<timeseries>`, :ref:`rolling()<window>`,
-     :ref:`expanding()<window>`, and :ref:`ewm()<window>` also support UDFs for performing custom
-     computations over temporal or statistical windows.
+     Some of these methods can also be applied to groupby, resample, and various window objects.
+     See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`,
+     and :ref:`ewm()<window>` for details.
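As a minimal sketch of the note, the same UDF can be passed to a groupby aggregation and to a rolling window. ``peak_to_peak`` and the sample data are hypothetical; ``raw=True`` asks ``Rolling.apply`` to pass each window as a NumPy array rather than a Series, which is usually faster for simple NumPy reductions.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "value": [1.0, 3.0, 5.0, 9.0]})


def peak_to_peak(values):
    # Hypothetical UDF: works on a Series or a NumPy array alike
    return values.max() - values.min()


# groupby: the UDF is called once per group with that group's values
by_group = df.groupby("key")["value"].agg(peak_to_peak)

# rolling: the UDF is called once per window of 2 rows;
# raw=True passes each window as a NumPy array instead of a Series
rolling_spread = df["value"].rolling(window=2).apply(peak_to_peak, raw=True)

print(by_group)
print(rolling_spread)
```

The first rolling value is ``NaN`` because a full window of two rows is not yet available.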


Choosing the Right Method
@@ -126,8 +151,8 @@ Documentation can be found at :meth:`~DataFrame.apply`.
If you need to aggregate data, :meth:`DataFrame.agg` is a better choice than apply because it is
specifically designed for aggregation operations.

- When to use: Use :meth:`DataFrame.agg` for performing aggregations like sum, mean, or custom aggregation
- functions across groups.
+ When to use: Use :meth:`DataFrame.agg` for performing custom aggregations, where the function reduces
+ each input (for example, each column) to a single scalar value.

Documentation can be found at :meth:`~DataFrame.agg`.
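A minimal sketch of a custom aggregation with :meth:`DataFrame.agg`, assuming a hypothetical ``peak_to_peak`` UDF that reduces each column to one scalar; passing ``axis=1`` reduces each row instead.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3], "B": [10, 40, 20]})


def peak_to_peak(values):
    # Hypothetical UDF: reduces one Series (a column or a row) to a single scalar
    return values.max() - values.min()


# Default axis=0: one scalar per column, returned as a Series indexed by column name
per_column = df.agg(peak_to_peak)

# axis=1: one scalar per row, returned as a Series indexed like df
per_row = df.agg(peak_to_peak, axis=1)

print(per_column)
print(per_row)
```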
@@ -141,25 +166,26 @@ When to use: When you need to perform element-wise transformations that retain t

Documentation can be found at :meth:`~DataFrame.transform`.

- Attempting to use common aggregation functions such as ``mean`` or ``sum`` will result in
- values being broadcasted to the original dimensions:
+ .. code-block:: python

- .. ipython:: python
+     from sklearn.linear_model import LinearRegression

-     # Sample DataFrame
    df = pd.DataFrame({
-         'Category': ['A', 'A', 'B', 'B', 'B'],
-         'Values': [10, 20, 30, 40, 50]
-     })
-
-     # Using transform with mean
-     df['Mean_Transformed'] = df.groupby('Category')['Values'].transform('mean')
+         'group': ['A', 'A', 'A', 'B', 'B', 'B'],
+         'x': [1, 2, 3, 1, 2, 3],
+         'y': [2, 4, 6, 1, 2, 1.5]
+     }).set_index("x")

-     # Using transform with sum
-     df['Sum_Transformed'] = df.groupby('Category')['Values'].transform('sum')
+     # Function to fit a model to each group
+     def fit_model(group):
+         x = group.index.to_frame()
+         y = group
+         model = LinearRegression()
+         model.fit(x, y)
+         pred = model.predict(x)
+         return pred

-     # Result broadcasted to DataFrame
-     print(df)
+     result = df.groupby('group').transform(fit_model)
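The ``fit_model`` example depends on scikit-learn. Below is a minimal sketch of the same idea using only NumPy and pandas, assuming a least-squares line from ``numpy.polyfit`` is an acceptable stand-in for ``LinearRegression``; ``fitted_line`` is a hypothetical name. Selecting the ``y`` column before calling ``transform`` means the UDF always receives a Series, and returning one value per input row keeps the result aligned with ``df``.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "x": [1, 2, 3, 1, 2, 3],
    "y": [2, 4, 6, 1, 2, 1.5],
}).set_index("x")


def fitted_line(y):
    # y is one group's "y" column as a Series indexed by x
    x = y.index.to_numpy(dtype=float)
    # Least-squares fit of y = slope * x + intercept (degree-1 polynomial)
    slope, intercept = np.polyfit(x, y.to_numpy(dtype=float), deg=1)
    # One fitted value per input row, so transform can keep the original shape
    return slope * x + intercept


fitted = df.groupby("group")["y"].transform(fitted_line)
print(fitted)
```

Group ``A`` lies exactly on a line, so its fitted values match the input; group ``B`` is smoothed towards its trend.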
163
189
164
190
:meth: `DataFrame.filter `
165
191
~~~~~~~~~~~~~~~~~~~~~~~~
0 commit comments