Commit af7964b

updated udf user guide based on reviews
1 parent efd5201 commit af7964b

File tree

1 file changed

+43
-74
lines changed

doc/source/user_guide/user_defined_functions.rst

Lines changed: 43 additions & 74 deletions
Original file line numberDiff line numberDiff line change
@@ -87,88 +87,64 @@ Methods that support User-Defined Functions
8787

8888
User-Defined Functions can be applied across various pandas methods:
8989

90-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
91-
| Method | Function Input | Function Output | Description |
92-
+============================+========================+==========================+===========================================================================+
93-
| :meth:`map` | Scalar | Scalar | Apply a function to each element |
94-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
95-
| :meth:`apply` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
96-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
97-
| :meth:`apply` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
98-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
99-
| :meth:`agg` | Series/DataFrame | Scalar or Series | Aggregate and summarizes values, e.g., sum or custom reducer |
100-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
101-
| :meth:`transform` | Series/DataFrame | Same shape as input | Apply a function while preserving shape; raises error if shape changes |
102-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
103-
| :meth:`filter` | - | - | Return rows that satisfy a boolean condition |
104-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
105-
| :meth:`pipe` | Series/DataFrame | Series/DataFrame | Chain functions together to apply to Series or Dataframe |
106-
+----------------------------+------------------------+--------------------------+---------------------------------------------------------------------------+
90+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
91+
| Method | Function Input | Function Output | Description |
92+
+============================+========================+==========================+==============================================================================================================================================+
93+
| :meth:`map` | Scalar | Scalar | Apply a function to each element |
94+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
95+
| :meth:`apply` (axis=0) | Column (Series) | Column (Series) | Apply a function to each column |
96+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
97+
| :meth:`apply` (axis=1) | Row (Series) | Row (Series) | Apply a function to each row |
98+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
99+
| :meth:`agg` | Series/DataFrame | Scalar or Series | Aggregate and summarize values, e.g., sum or custom reducer |
100+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
101+
| :meth:`transform` (axis=0) | Column (Series) | Column (Series) | Same as :meth:`apply` with (axis=0), but it raises an exception if the function changes the shape of the data |
102+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
103+
| :meth:`transform` (axis=1) | Row (Series) | Row (Series) | Same as :meth:`apply` with (axis=1), but it raises an exception if the function changes the shape of the data |
104+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
105+
| :meth:`filter` | Series or DataFrame | Boolean | Only accepts UDFs in group by. Function is called for each group, and the group is removed from the result if the function returns ``False`` |
106+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
107+
| :meth:`pipe` | Series/DataFrame | Series/DataFrame | Chain functions together to apply to Series or DataFrame |
108+
+----------------------------+------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
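The ``filter`` row differs from the others because the UDF is only accepted on a groupby object. The following is a minimal sketch of that behavior; the column names ``team`` and ``score`` and the helper ``has_high_total`` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: one row per score, grouped by team.
df = pd.DataFrame(
    {
        "team": ["a", "a", "b", "b", "c"],
        "score": [1, 2, 10, 20, 5],
    }
)


def has_high_total(group):
    # Called once per group; must return a single boolean.
    return group["score"].sum() > 10


# Groups for which the UDF returns False are dropped from the result.
kept = df.groupby("team").filter(has_high_total)
```

Only team ``b`` has a total above 10, so ``kept`` contains just its two rows, with their original index labels.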
107109

108-
.. note::
109-
Some of these methods are can also be applied to groupby, resample, and various window objects.
110-
See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`,
111-
and :ref:`ewm()<window>` for details.
112-
113-
114-
Choosing the Right Method
115-
-------------------------
116110
When applying UDFs in pandas, it is essential to select the appropriate method based
117111
on your specific task. Each method has its strengths and is designed for different use
118112
cases. Understanding the purpose and behavior of each method will help you make informed
119113
decisions, ensuring more efficient and maintainable code.
120114

121-
Below is a table overview of all methods that accept UDFs:
122-
123-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
124-
| Method | Purpose | Supports UDFs | Keeps Shape | Recommended Use Case |
125-
+==================+======================================+===========================+====================+==========================================+
126-
| :meth:`apply` | General-purpose function | Yes | Yes (when axis=1) | Custom row-wise or column-wise operations|
127-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
128-
| :meth:`agg` | Aggregation | Yes | No | Custom aggregation logic |
129-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
130-
| :meth:`transform`| Transform without reducing dimensions| Yes | Yes | Broadcast element-wise transformations |
131-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
132-
| :meth:`map` | Element-wise mapping | Yes | Yes | Simple element-wise transformations |
133-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
134-
| :meth:`pipe` | Functional chaining | Yes | Yes | Building clean operation pipelines |
135-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
136-
| :meth:`filter` | Row/Column selection | Not directly | Yes | Subsetting based on conditions |
137-
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
115+
.. note::
116+
Some of these methods can also be applied to groupby, resample, and various window objects.
117+
See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`,
118+
and :ref:`ewm()<window>` for details.
119+
138120

139121
:meth:`DataFrame.apply`
140122
~~~~~~~~~~~~~~~~~~~~~~~
141123

142-
The :meth:`DataFrame.apply` allows you to apply UDFs along either rows or columns. While flexible,
124+
The :meth:`apply` method allows you to apply UDFs along either rows or columns. While flexible,
143125
it is slower than vectorized operations and should be used only when you need operations
144126
that cannot be achieved with built-in pandas functions.
145127

146-
When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method or UDF method is available,
128+
When to use: :meth:`apply` is suitable when no alternative vectorized method or UDF method is available,
147129
but consider optimizing performance with vectorized operations wherever possible.
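As a rough illustration of the two axes, the sketch below applies the same UDF column-wise and row-wise, then computes the row-wise result with built-in methods instead. The frame and the helper ``value_range`` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def value_range(values):
    # `values` is a Series: one column when axis=0, one row when axis=1.
    return values.max() - values.min()


col_ranges = df.apply(value_range, axis=0)  # one result per column
row_ranges = df.apply(value_range, axis=1)  # one result per row

# The same row-wise result without a UDF, using vectorized methods.
vectorized = df.max(axis=1) - df.min(axis=1)
```

Here ``row_ranges`` and ``vectorized`` hold the same values, which is the case where the vectorized form is usually preferable.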
148130

149-
Documentation can be found at :meth:`~DataFrame.apply`.
150-
151131
:meth:`DataFrame.agg`
152132
~~~~~~~~~~~~~~~~~~~~~
153133

154-
If you need to aggregate data, :meth:`DataFrame.agg` is a better choice than apply because it is
134+
If you need to aggregate data, :meth:`agg` is a better choice than apply because it is
155135
specifically designed for aggregation operations.
156136

157-
When to use: Use :meth:`DataFrame.agg` for performing custom aggregations, where the operation returns
137+
When to use: Use :meth:`agg` for performing custom aggregations, where the operation returns
158138
a scalar value on each input.
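A minimal sketch of a custom aggregation, assuming a hypothetical reducer named ``spread`` that collapses each column to a single value:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})


def spread(column):
    # Reduce a whole column (a Series) to one scalar.
    return column.max() - column.min()


# One scalar per column, returned as a Series indexed by column name.
result = df.agg(spread)
```

The result has one entry per column, so the shape of the input is not preserved.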
159139

160-
Documentation can be found at :meth:`~DataFrame.agg`.
161-
162140
:meth:`DataFrame.transform`
163141
~~~~~~~~~~~~~~~~~~~~~~~~~~~
164142

165-
The transform method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
143+
The :meth:`transform` method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
166144
It is generally faster than apply because it can take advantage of pandas' internal optimizations.
167145

168146
When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
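The shape requirement can be sketched as follows. The helper ``center`` is hypothetical; the second call passes a reducing function on purpose to show the error that ``transform`` is expected to raise.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})


def center(column):
    # Returns a Series with the same index as the input column.
    return column - column.mean()


centered = df.transform(center)

# A reducing function changes the shape, so transform rejects it.
try:
    df.transform(lambda column: column.sum())
    shape_error = None
except ValueError as err:
    shape_error = err
```

With :meth:`apply`, the same reducing function would succeed and return one value per column, which is the main practical difference between the two methods.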
169147

170-
Documentation can be found at :meth:`~DataFrame.transform`.
171-
172148
.. code-block:: python
173149
174150
from sklearn.linear_model import LinearRegression
@@ -193,11 +169,11 @@ Documentation can be found at :meth:`~DataFrame.transform`.
193169
:meth:`DataFrame.filter`
194170
~~~~~~~~~~~~~~~~~~~~~~~~
195171

196-
The :meth:`DataFrame.filter` method is used to select subsets of the DataFrame’s
172+
The :meth:`filter` method is used to select subsets of the DataFrame’s
197173
columns or rows. It is useful when you want to extract specific columns or rows that
198174
match particular conditions.
199175

200-
When to use: Use :meth:`DataFrame.filter` when you want to use a UDF to create a subset of a DataFrame or Series
176+
When to use: Use :meth:`filter` when you want to use a UDF to create a subset of a DataFrame or Series.
201177

202178
.. note::
203179
:meth:`DataFrame.filter` does not accept UDFs, but can accept
@@ -223,27 +199,20 @@ When to use: Use :meth:`DataFrame.filter` when you want to use a UDF to create a
223199
Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
224200
for example, by using list comprehensions.
225201

226-
Documentation can be found at :meth:`~DataFrame.filter`.
227-
228202
:meth:`DataFrame.map`
229203
~~~~~~~~~~~~~~~~~~~~~
230204

231-
:meth:`DataFrame.map` is used specifically to apply element-wise UDFs and is better
232-
for this purpose compared to :meth:`DataFrame.apply` because of its better performance.
205+
The :meth:`map` method is used specifically to apply element-wise UDFs.
233206

234-
When to use: Use map for applying element-wise UDFs to DataFrames or Series.
235-
236-
Documentation can be found at :meth:`~DataFrame.map`.
207+
When to use: Use :meth:`map` for applying element-wise UDFs to DataFrames or Series.
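A small element-wise sketch, shown on a Series so that it does not depend on a particular pandas version; the data and the helper ``format_price`` are hypothetical.

```python
import pandas as pd

prices = pd.Series([1.5, 20.0, 300.25], name="price")


def format_price(value):
    # Receives one scalar at a time and returns one scalar.
    return f"${value:,.2f}"


labels = prices.map(format_price)
```

The UDF never sees the whole Series, only individual values, which is what distinguishes ``map`` from ``apply`` and ``agg``.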
237208

238209
:meth:`DataFrame.pipe`
239210
~~~~~~~~~~~~~~~~~~~~~~
240211

241-
The pipe method is useful for chaining operations together into a clean and readable pipeline.
212+
The :meth:`pipe` method is useful for chaining operations together into a clean and readable pipeline.
242213
It is a helpful tool for organizing complex data processing workflows.
243214

244-
When to use: Use pipe when you need to create a pipeline of operations and want to keep the code readable and maintainable.
245-
246-
Documentation can be found at :meth:`~DataFrame.pipe`.
215+
When to use: Use :meth:`pipe` when you need to create a pipeline of operations and want to keep the code readable and maintainable.
247216

248217

249218
Performance
@@ -255,7 +224,7 @@ consider using built-in ``NumPy`` or ``pandas`` functions instead of UDFs
255224
for common operations.
256225

257226
.. note::
258-
If performance is critical, explore **vectorizated operations** before resorting
227+
If performance is critical, explore **vectorized operations** before resorting
259228
to UDFs.
260229

261230
Vectorized Operations
@@ -283,9 +252,9 @@ Measuring how long each operation takes:
283252
284253
Vectorized operations in pandas are significantly faster than using :meth:`DataFrame.apply`
285254
with UDFs because they leverage highly optimized C functions
286-
via NumPy to process entire arrays at once. This approach avoids the overhead of looping
255+
via ``NumPy`` to process entire arrays at once. This approach avoids the overhead of looping
287256
through rows in Python and making separate function calls for each row, which is slow and
288-
inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
257+
inefficient. Additionally, ``NumPy`` arrays benefit from memory efficiency and CPU-level
289258
optimizations, making vectorized operations the preferred choice whenever possible.
290259

291260

@@ -306,10 +275,10 @@ especially for computationally heavy tasks.
306275
Using :meth:`DataFrame.pipe` for Composable Logic
307276
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
308277

309-
Another useful pattern for improving readability and composabilityespecially when mixing
310-
vectorized logic with UDFsis to use the :meth:`DataFrame.pipe` method.
278+
Another useful pattern for improving readability and composability, especially when mixing
279+
vectorized logic with UDFs, is to use the :meth:`DataFrame.pipe` method.
311280

312-
The ``.pipe`` method doesn't improve performance directly, but it enables cleaner
281+
:meth:`DataFrame.pipe` doesn't improve performance directly, but it enables cleaner
313282
method chaining by passing the entire object into a function. This is especially helpful
314283
when chaining custom transformations:
315284

@@ -327,8 +296,8 @@ when chaining custom transformations:
327296
)
328297
329298
This is functionally equivalent to calling ``add_ratio_column(df)``, but keeps your code
330-
clean and composable. The function you pass to ``.pipe`` can use vectorized operations,
331-
row-wise UDFs, or any other logic—``.pipe`` is agnostic.
299+
clean and composable. The function you pass to :meth:`DataFrame.pipe` can use vectorized operations,
300+
row-wise UDFs, or any other logic; :meth:`DataFrame.pipe` is agnostic.
332301

333302
.. note::
334303
While :meth:`DataFrame.pipe` does not improve performance on its own,
