@@ -87,88 +87,64 @@ Methods that support User-Defined Functions

User-Defined Functions can be applied across various pandas methods:

- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | Method                     | Function Input         | Function Output          | Description                                                               |
- +============================+========================+==========================+===========================================================================+
- | :meth:`map`                | Scalar                 | Scalar                   | Apply a function to each element                                          |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | :meth:`apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                           |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                              |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | :meth:`agg`                | Series/DataFrame       | Scalar or Series         | Aggregate and summarizes values, e.g., sum or custom reducer              |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | :meth:`transform`          | Series/DataFrame       | Same shape as input      | Apply a function while preserving shape; raises error if shape changes    |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | :meth:`filter`             | -                      | -                        | Return rows that satisfy a boolean condition                              |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
- | :meth:`pipe`               | Series/DataFrame       | Series/DataFrame         | Chain functions together to apply to Series or Dataframe                  |
- +----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | Method                     | Function Input         | Function Output          | Description                                                                                                                                  |
+ +============================+========================+==========================+==============================================================================================================================================+
+ | :meth:`map`                | Scalar                 | Scalar                   | Apply a function to each element                                                                                                             |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`apply` (axis=0)     | Column (Series)        | Column (Series)          | Apply a function to each column                                                                                                              |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`apply` (axis=1)     | Row (Series)           | Row (Series)             | Apply a function to each row                                                                                                                 |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`agg`                | Series/DataFrame       | Scalar or Series         | Aggregate and summarize values, e.g., sum or custom reducer                                                                                  |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`transform` (axis=0) | Column (Series)        | Column (Series)          | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data                                |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`transform` (axis=1) | Row (Series)           | Row (Series)             | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data                                |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`filter`             | Series or DataFrame    | Boolean                  | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
+ | :meth:`pipe`               | Series/DataFrame       | Series/DataFrame         | Chain functions together to apply to Series or DataFrame                                                                                     |
+ +----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
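
The group-wise behavior of ``filter`` described in the table can be sketched as follows. The frame, the ``team`` and ``score`` column names, and the ``has_high_total`` helper are made up for illustration; only ``groupby(...).filter`` itself is pandas API.

```python
import pandas as pd

# Hypothetical data: two teams with different total scores.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b"],
        "score": [1, 2, 10, 20],
    }
)


def has_high_total(group):
    # Called once per group; should return a single boolean.
    return group["score"].sum() > 5


# Groups for which the UDF returns False are removed from the result.
kept = df.groupby("team").filter(has_high_total)
```

Here only the rows of team ``b`` remain, and they keep their original index labels.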

- .. note::
-     Some of these methods are can also be applied to groupby, resample, and various window objects.
-     See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`,
-     and :ref:`ewm()<window>` for details.
-
-
- Choosing the Right Method
- -------------------------
When applying UDFs in pandas, it is essential to select the appropriate method based
on your specific task. Each method has its strengths and is designed for different use
cases. Understanding the purpose and behavior of each method will help you make informed
decisions, ensuring more efficient and maintainable code.

- Below is a table overview of all methods that accept UDFs:
-
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
- | Method           | Purpose                              | Supports UDFs             | Keeps Shape        | Recommended Use Case                     |
- +==================+======================================+===========================+====================+==========================================+
- | :meth:`apply`    | General-purpose function             | Yes                       | Yes (when axis=1)  | Custom row-wise or column-wise operations|
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
- | :meth:`agg`      | Aggregation                          | Yes                       | No                 | Custom aggregation logic                 |
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
- | :meth:`transform`| Transform without reducing dimensions| Yes                       | Yes                | Broadcast element-wise transformations   |
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
- | :meth:`map`      | Element-wise mapping                 | Yes                       | Yes                | Simple element-wise transformations      |
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
- | :meth:`pipe`     | Functional chaining                  | Yes                       | Yes                | Building clean operation pipelines       |
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
- | :meth:`filter`   | Row/Column selection                 | Not directly              | Yes                | Subsetting based on conditions           |
- +------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
+ .. note::
+     Some of these methods can also be applied to groupby, resample, and various window objects.
+     See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`,
+     and :ref:`ewm()<window>` for details.
+

:meth:`DataFrame.apply`
~~~~~~~~~~~~~~~~~~~~~~~

- The :meth:`DataFrame.apply` allows you to apply UDFs along either rows or columns. While flexible,
+ The :meth:`apply` method allows you to apply UDFs along either rows or columns. While flexible,
it is slower than vectorized operations and should be used only when you need operations
that cannot be achieved with built-in pandas functions.

- When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method or UDF method is available,
+ When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
but consider optimizing performance with vectorized operations wherever possible.
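
A minimal sketch of both axes, next to a vectorized equivalent, might look like this. The frame and the ``row_label`` helper are assumptions made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def row_label(row):
    # With axis=1 the UDF receives one row at a time as a Series.
    return f"{row['a']}-{row['b']}"


# Row-wise UDF: one Python-level call per row.
labels = df.apply(row_label, axis=1)

# Column-wise UDF (axis=0, the default): the UDF receives each column as a Series.
ranges = df.apply(lambda col: col.max() - col.min())

# The same per-column range using built-in vectorized methods instead of a UDF.
ranges_vectorized = df.max() - df.min()
```

Both ``ranges`` and ``ranges_vectorized`` hold the same values; the second form avoids calling a Python function per column and is usually the better choice when it exists.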

- Documentation can be found at :meth:`~DataFrame.apply`.
-
:meth:`DataFrame.agg`
~~~~~~~~~~~~~~~~~~~~~

- If you need to aggregate data, :meth:`DataFrame.agg` is a better choice than apply because it is
+ If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
specifically designed for aggregation operations.

- When to use: Use :meth:`DataFrame.agg` for performing custom aggregations, where the operation returns
+ When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
a scalar value on each input.
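
As a small sketch of a custom reducer (the frame and the ``value_range`` helper are hypothetical), each column is reduced to one scalar:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 60]})


def value_range(col):
    # Reduces one column (a Series) to a single scalar.
    return col.max() - col.min()


# One scalar per column, returned as a Series indexed by column name.
result = df.agg(value_range)

# A built-in reducer referenced by name, for comparison.
totals = df.agg("sum")
```

When a built-in reducer such as ``"sum"`` covers the task, passing its name is generally preferable to writing a UDF.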

- Documentation can be found at :meth:`~DataFrame.agg`.
-
:meth:`DataFrame.transform`
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The transform method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
+ The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
It is generally faster than apply because it can take advantage of pandas' internal optimizations.

When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.

- Documentation can be found at :meth:`~DataFrame.transform`.
-
.. code-block:: python

    from sklearn.linear_model import LinearRegression
@@ -193,11 +169,11 @@ Documentation can be found at :meth:`~DataFrame.transform`.
:meth:`DataFrame.filter`
~~~~~~~~~~~~~~~~~~~~~~~~

- The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
+ The :meth:`filter` method is used to select subsets of the DataFrame’s
columns or rows. It is useful when you want to extract specific columns or rows that
match particular conditions.

- When to use: Use :meth:`DataFrame.filter` when you want to use a UDF to create a subset of a DataFrame or Series
+ When to use: Use :meth:`filter` when you want to use a UDF to create a subset of a DataFrame or Series

.. note::
    :meth:`DataFrame.filter` does not accept UDFs, but can accept
@@ -223,27 +199,20 @@ When to use: Use :meth:`DataFrame.filter` when you want to use a UDF to create a
Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
for example, by using list comprehensions.

- Documentation can be found at :meth:`~DataFrame.filter`.
-
:meth:`DataFrame.map`
~~~~~~~~~~~~~~~~~~~~~

- :meth:`DataFrame.map` is used specifically to apply element-wise UDFs and is better
- for this purpose compared to :meth:`DataFrame.apply` because of its better performance.
+ The :meth:`map` method is used specifically to apply element-wise UDFs.

- When to use: Use map for applying element-wise UDFs to DataFrames or Series.
-
- Documentation can be found at :meth:`~DataFrame.map`.
+ When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.

:meth:`DataFrame.pipe`
~~~~~~~~~~~~~~~~~~~~~~

- The pipe method is useful for chaining operations together into a clean and readable pipeline.
+ The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
It is a helpful tool for organizing complex data processing workflows.

- When to use: Use pipe when you need to create a pipeline of operations and want to keep the code readable and maintainable.
-
- Documentation can be found at :meth:`~DataFrame.pipe`.
+ When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.


Performance
@@ -255,7 +224,7 @@ consider using built-in ``NumPy`` or ``pandas`` functions instead of UDFs
for common operations.

.. note::
-     If performance is critical, explore **vectorizated operations** before resorting
+     If performance is critical, explore **vectorized operations** before resorting
    to UDFs.

Vectorized Operations
@@ -283,9 +252,9 @@ Measuring how long each operation takes:

Vectorized operations in pandas are significantly faster than using :meth:`DataFrame.apply`
with UDFs because they leverage highly optimized C functions
- via NumPy to process entire arrays at once. This approach avoids the overhead of looping
+ via ``NumPy`` to process entire arrays at once. This approach avoids the overhead of looping
through rows in Python and making separate function calls for each row, which is slow and
- inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
+ inefficient. Additionally, ``NumPy`` arrays benefit from memory efficiency and CPU-level
optimizations, making vectorized operations the preferred choice whenever possible.


@@ -306,10 +275,10 @@ especially for computationally heavy tasks.
Using :meth:`DataFrame.pipe` for Composable Logic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Another useful pattern for improving readability and composability—especially when mixing
- vectorized logic with UDFs—is to use the :meth:`DataFrame.pipe` method.
+ Another useful pattern for improving readability and composability, especially when mixing
+ vectorized logic with UDFs, is to use the :meth:`DataFrame.pipe` method.

- The ``.pipe`` method doesn't improve performance directly, but it enables cleaner
+ :meth:`DataFrame.pipe` doesn't improve performance directly, but it enables cleaner
method chaining by passing the entire object into a function. This is especially helpful
when chaining custom transformations:

@@ -327,8 +296,8 @@ when chaining custom transformations:
    )

This is functionally equivalent to calling ``add_ratio_column(df)``, but keeps your code
- clean and composable. The function you pass to ``.pipe`` can use vectorized operations,
- row-wise UDFs, or any other logic—``.pipe`` is agnostic.
+ clean and composable. The function you pass to :meth:`DataFrame.pipe` can use vectorized operations,
+ row-wise UDFs, or any other logic; :meth:`DataFrame.pipe` is agnostic.

.. note::
    While :meth:`DataFrame.pipe` does not improve performance on its own,