Commit c6891a0

updated user guide
1 parent 561a1f5 commit c6891a0

1 file changed

+74
-33
lines changed

doc/source/user_guide/user_defined_functions.rst

Lines changed: 74 additions & 33 deletions
Original file line number | Diff line number | Diff line change
@@ -17,11 +17,10 @@ Why Not To Use User-Defined Functions
1717
-----------------------------------------
1818

1919
While UDFs provide flexibility, they come with significant drawbacks, primarily
20-
related to performance. Unlike vectorized pandas operations, UDFs are slower because pandas lacks
21-
insight into what they are computing, making it difficult to apply efficient handling or optimization
22-
techniques. As a result, pandas resorts to less efficient processing methods that significantly
23-
slow down computations. Additionally, relying on UDFs often sacrifices the benefits
24-
of pandas’ built-in, optimized methods, limiting compatibility and overall performance.
20+
related to performance and behavior. When using UDFs, pandas must perform inference
21+
on the result, and that inference could be incorrect. Furthermore, unlike vectorized operations,
22+
UDFs are slower because pandas can't optimize their computations, leading to
23+
inefficient processing.
2524

2625
.. note::
2726
In general, most tasks can and should be accomplished using pandas’ built-in methods or vectorized operations.
@@ -33,6 +32,29 @@ Despite their drawbacks, UDFs can be helpful when:
3332
* **Extending pandas' Functionality**: Applying external libraries or specialized algorithms unavailable in pandas.
3433
* **Handling Complex Grouped Operations**: Performing operations on grouped data that standard methods do not support.
3534

35+
For example:
36+
37+
.. code-block:: python
38+
39+
import pandas as pd
from sklearn.linear_model import LinearRegression
40+
41+
# Sample data
42+
df = pd.DataFrame({
43+
'group': ['A', 'A', 'A', 'B', 'B', 'B'],
44+
'x': [1, 2, 3, 1, 2, 3],
45+
'y': [2, 4, 6, 1, 2, 1.5]
46+
})
47+
48+
# Function to fit a model to each group
49+
def fit_model(group):
50+
model = LinearRegression()
51+
model.fit(group[['x']], group['y'])
52+
# Return a new frame rather than mutating the group that pandas passes in
return group.assign(y_pred=model.predict(group[['x']]))
54+
55+
result = df.groupby('group').apply(fit_model)
56+
57+
3658
Methods that support User-Defined Functions
3759
-------------------------------------------
3860

@@ -56,6 +78,10 @@ ways to apply UDFs across different pandas data structures.
5678
.. note::
5779
Some of these methods can also be applied to GroupBy objects. Refer to :ref:`groupby`.
5880

81+
Additionally, operations such as :ref:`resample()<timeseries>`, :ref:`rolling()<window>`,
82+
:ref:`expanding()<window>`, and :ref:`ewm()<window>` also support UDFs for performing custom
83+
computations over temporal or statistical windows.
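As a minimal sketch of a window UDF, the snippet below passes a custom function to ``rolling().apply()`` and ``expanding().apply()``. The function ``value_range`` and the sample data are hypothetical and chosen only for illustration.

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])


def value_range(window):
    """Spread between the largest and smallest value in one window (hypothetical UDF)."""
    return window.max() - window.min()


# Fixed-size window of 3 observations; the first two results are NaN
# because the window is not yet full.
rolling_range = s.rolling(window=3).apply(value_range)

# Expanding window: each result uses all observations seen so far.
expanding_range = s.expanding().apply(value_range)
```

For common statistics such as the mean or sum, the built-in window methods (for example ``rolling().mean()``) are usually preferable to a UDF.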
84+
5985

6086
Choosing the Right Method
6187
-------------------------
@@ -66,21 +92,21 @@ decisions, ensuring more efficient and maintainable code.
6692

6793
Below is a table overview of all methods that accept UDFs:
6894

69-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
70-
| Method | Purpose | Supports UDFs | Keeps Shape | Performance | Recommended Use Case |
71-
+==================+======================================+===========================+====================+===========================+==========================================+
72-
| :meth:`apply` | General-purpose function | Yes | Yes (when axis=1) | Slow | Custom row-wise or column-wise operations|
73-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
74-
| :meth:`agg` | Aggregation | Yes | No | Fast (if using built-ins) | Custom aggregation logic |
75-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
76-
| :meth:`transform`| Transform without reducing dimensions| Yes | Yes | Fast (if vectorized) | Broadcast element-wise transformations |
77-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
78-
| :meth:`map` | Element-wise mapping | Yes | Yes | Moderate | Simple element-wise transformations |
79-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
80-
| :meth:`pipe` | Functional chaining | Yes | Yes | Depends on function | Building clean pipelines |
81-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
82-
| :meth:`filter` | Row/Column selection | Not directly | Yes | Fast | Subsetting based on conditions |
83-
+------------------+--------------------------------------+---------------------------+--------------------+---------------------------+------------------------------------------+
95+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
96+
| Method | Purpose | Supports UDFs | Keeps Shape | Recommended Use Case |
97+
+==================+======================================+===========================+====================+==========================================+
98+
| :meth:`apply` | General-purpose function | Yes | Yes (when axis=1) | Custom row-wise or column-wise operations|
99+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
100+
| :meth:`agg` | Aggregation | Yes | No | Custom aggregation logic |
101+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
102+
| :meth:`transform`| Transform without reducing dimensions| Yes | Yes | Broadcast element-wise transformations |
103+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
104+
| :meth:`map` | Element-wise mapping | Yes | Yes | Simple element-wise transformations |
105+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
106+
| :meth:`pipe` | Functional chaining | Yes | Yes | Building clean operation pipelines |
107+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
108+
| :meth:`filter` | Row/Column selection | Not directly | Yes | Subsetting based on conditions |
109+
+------------------+--------------------------------------+---------------------------+--------------------+------------------------------------------+
84110

85111
:meth:`DataFrame.apply`
86112
~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,10 +115,10 @@ The :meth:`DataFrame.apply` allows you to apply UDFs along either rows or column
89115
it is slower than vectorized operations and should be used only when you need operations
90116
that cannot be achieved with built-in pandas functions.
91117

92-
When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method is available, but consider
93-
optimizing performance with vectorized operations wherever possible.
118+
When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method or UDF method is available,
119+
but consider optimizing performance with vectorized operations wherever possible.
94120

95-
Examples of usage can be found :meth:`~DataFrame.apply`.
121+
Documentation can be found at :meth:`~DataFrame.apply`.
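The trade-off described above can be sketched as follows. The column names and the ``row_total`` function are hypothetical; the point is only that a row-wise UDF and a vectorized expression can produce the same result, with the vectorized form avoiding a Python call per row.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})


def row_total(row):
    """Hypothetical row-wise UDF: called once per row when axis=1."""
    return row["price"] * row["quantity"]


# UDF approach: one Python function call per row.
total_udf = df.apply(row_total, axis=1)

# Vectorized approach: a single operation over whole columns.
total_vectorized = df["price"] * df["quantity"]
```

When both forms are available, as here, the vectorized expression is generally the better choice.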
96122

97123
:meth:`DataFrame.agg`
98124
~~~~~~~~~~~~~~~~~~~~~
@@ -103,17 +129,17 @@ specifically designed for aggregation operations.
103129
When to use: Use :meth:`DataFrame.agg` for performing aggregations like sum, mean, or custom aggregation
104130
functions across groups.
105131

106-
Examples of usage can be found :meth:`~DataFrame.agg`.
132+
Documentation can be found at :meth:`~DataFrame.agg`.
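A short sketch of a custom aggregation, assuming a hypothetical ``value_range`` function and sample data. It reduces each column (or each group) to a single value, which is what distinguishes ``agg`` from shape-preserving methods.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "value": [1, 4, 10, 30],
})


def value_range(column):
    """Hypothetical aggregation UDF: reduces a column to one number."""
    return column.max() - column.min()


# Aggregate the whole column to a single value.
overall = df[["value"]].agg(value_range)

# Aggregate within each group.
per_group = df.groupby("group")["value"].agg(value_range)
```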
107133

108134
:meth:`DataFrame.transform`
109135
~~~~~~~~~~~~~~~~~~~~~~~~~~~
110136

111137
The transform method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
112-
It’s generally faster than apply because it can take advantage of pandas' internal optimizations.
138+
It is generally faster than apply because it can take advantage of pandas' internal optimizations.
113139

114140
When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
115141

116-
Documentation can be found :meth:`~DataFrame.transform`.
142+
Documentation can be found at :meth:`~DataFrame.transform`.
117143

118144
Attempting to use common aggregation functions such as ``mean`` or ``sum`` will result in
119145
values being broadcast to the original dimensions:
@@ -158,17 +184,17 @@ When to use: Use :meth:`DataFrame.filter` when you want to use a UDF to create a
158184
'D': [10, 11, 12]
159185
})
160186
161-
# Define a function that filters out columns where the name is longer than 1 character
187+
# Function that identifies columns whose name is longer than 1 character (these columns are kept)
162188
def is_long_name(column_name):
163189
return len(column_name) > 1
164190
165-
df_filtered = df[[col for col in df.columns if is_long_name(col)]]
191+
df_filtered = df.filter(items=[col for col in df.columns if is_long_name(col)])
166192
print(df_filtered)
167193
168194
Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
169-
such as by using list comprehensions.
195+
for example, by using list comprehensions.
170196

171-
Documentation can be found :meth:`~DataFrame.filter`.
197+
Documentation can be found at :meth:`~DataFrame.filter`.
172198

173199
:meth:`DataFrame.map`
174200
~~~~~~~~~~~~~~~~~~~~~
@@ -178,17 +204,17 @@ for this purpose compared to :meth:`DataFrame.apply` because of its better perfo
178204

179205
When to use: Use map for applying element-wise UDFs to DataFrames or Series.
180206

181-
Documentation can be found :meth:`~DataFrame.map`.
207+
Documentation can be found at :meth:`~DataFrame.map`.
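As an illustration of element-wise mapping, the sketch below uses ``Series.map`` with a hypothetical ``label_length`` function. ``DataFrame.map`` applies a function to every element in the same way in pandas versions that provide it.

```python
import pandas as pd

s = pd.Series(["apple", "kiwi", "banana"])


def label_length(text):
    """Hypothetical element-wise UDF: receives one value at a time."""
    return "long" if len(text) > 4 else "short"


# The function is called once per element and the shape is preserved.
labels = s.map(label_length)
```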
182208

183209
:meth:`DataFrame.pipe`
184210
~~~~~~~~~~~~~~~~~~~~~~
185211

186212
The pipe method is useful for chaining operations together into a clean and readable pipeline.
187213
It is a helpful tool for organizing complex data processing workflows.
188214

189-
When to use: Use pipe when you need to create a pipeline of transformations and want to keep the code readable and maintainable.
215+
When to use: Use pipe when you need to create a pipeline of operations and want to keep the code readable and maintainable.
190216

191-
Documentation can be found :meth:`~DataFrame.pipe`.
217+
Documentation can be found at :meth:`~DataFrame.pipe`.
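A minimal pipeline sketch follows. The helper functions ``drop_missing`` and ``add_total`` and the column names are hypothetical; each step takes a DataFrame and returns a new one, which is what allows the calls to be chained.

```python
import pandas as pd


def drop_missing(df):
    """Hypothetical pipeline step: remove rows with missing values."""
    return df.dropna()


def add_total(df, price_col, qty_col):
    """Hypothetical pipeline step: add a 'total' column without mutating the input."""
    return df.assign(total=df[price_col] * df[qty_col])


raw = pd.DataFrame({"price": [10.0, None, 30.0], "quantity": [1, 2, 3]})

# Extra keyword arguments given to pipe are forwarded to the function.
result = (
    raw
    .pipe(drop_missing)
    .pipe(add_total, price_col="price", qty_col="quantity")
)
```

Written without ``pipe``, the same logic would read inside-out as ``add_total(drop_missing(raw), ...)``, which becomes harder to follow as steps are added.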
192218

193219

194220
Best Practices
@@ -232,3 +258,18 @@ via NumPy to process entire arrays at once. This approach avoids the overhead of
232258
through rows in Python and making separate function calls for each row, which is slow and
233259
inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
234260
optimizations, making vectorized operations the preferred choice whenever possible.
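To make the contrast concrete, here is a small sketch with hypothetical data and a hypothetical ``classify`` function. Both approaches give the same labels, but the NumPy version evaluates the condition over the whole array at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temperature": [12.5, 25.0, 31.2, 18.4]})


def classify(value):
    """Hypothetical UDF: invoked separately for every element."""
    return "warm" if value >= 20 else "cool"


# UDF approach: one Python call per value.
labels_udf = df["temperature"].apply(classify)

# Vectorized approach: a single comparison over the whole column.
labels_vectorized = pd.Series(
    np.where(df["temperature"] >= 20, "warm", "cool"),
    index=df.index,
)
```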
261+
262+
263+
Improving Performance with UDFs
264+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
265+
266+
In scenarios where UDFs are necessary, there are still ways to mitigate their performance drawbacks.
267+
One approach is to use **Numba**, a Just-In-Time (JIT) compiler that can significantly speed up numerical
268+
Python code by compiling Python functions to optimized machine code at runtime.
269+
270+
By annotating your UDFs with ``@numba.jit``, you can achieve performance closer to vectorized operations,
271+
especially for computationally heavy tasks.
272+
273+
.. note::
274+
You may also refer to the user guide on `Enhancing performance <https://pandas.pydata.org/pandas-docs/dev/user_guide/enhancingperf.html#numba-jit-compilation>`_
275+
for a more detailed guide to using **Numba**.
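The following is a sketch only, not a benchmark. It assumes Numba is installed and uses ``numba.njit`` (shorthand for ``numba.jit`` in nopython mode); the function ``running_clipped_sum`` and the data are hypothetical. The fallback decorator exists solely so the example still runs, uncompiled, when Numba is unavailable.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the plain Python function.
    def njit(func):
        return func


@njit
def running_clipped_sum(values, cap):
    """Hypothetical UDF: cumulative sum that never exceeds ``cap``."""
    out = np.empty(values.shape[0])
    total = 0.0
    for i in range(values.shape[0]):
        total = min(total + values[i], cap)
        out[i] = total
    return out


s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Pass the underlying NumPy array: compiled functions operate on arrays,
# not on pandas objects.
result = pd.Series(running_clipped_sum(s.to_numpy(), 5.0), index=s.index)
```

A loop with a running state like this one is hard to express with vectorized operations, which is the kind of case where JIT compilation tends to help.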
