@@ -89,10 +115,10 @@ The :meth:`DataFrame.apply` allows you to apply UDFs along either rows or column
 it is slower than vectorized operations and should be used only when you need operations
 that cannot be achieved with built-in pandas functions.
 
-When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method is available, but consider
-optimizing performance with vectorized operations wherever possible.
+When to use: :meth:`DataFrame.apply` is suitable when no alternative vectorized method or UDF method is available,
+but consider optimizing performance with vectorized operations wherever possible.
 
-Examples of usage can be found :meth:`~DataFrame.apply`.
+Documentation can be found at :meth:`~DataFrame.apply`.
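
The guidance above on preferring vectorized operations over :meth:`DataFrame.apply` can be illustrated with a minimal sketch. The column names ``a`` and ``b`` and the variable names are hypothetical, chosen only for this illustration; both expressions are expected to give the same values, with the second avoiding a Python-level call per row.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Row-wise UDF: the lambda is called once per row in Python.
via_apply = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Vectorized equivalent: a single operation over whole columns.
vectorized = df["a"] + df["b"]

print(via_apply.equals(vectorized))
```

On a frame this small the difference is negligible; the gap is likely to matter only as the number of rows grows.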
 
 :meth:`DataFrame.agg`
 ~~~~~~~~~~~~~~~~~~~~~
@@ -103,17 +129,17 @@ specifically designed for aggregation operations.
 When to use: Use :meth:`DataFrame.agg` for performing aggregations like sum, mean, or custom aggregation
 functions across groups.
 
-Examples of usage can be found :meth:`~DataFrame.agg`.
+Documentation can be found at :meth:`~DataFrame.agg`.
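
As a rough illustration of the aggregation use case described above, the sketch below combines built-in aggregations with a custom one. The data, the column names, and the helper ``value_range`` are hypothetical and not taken from the pull request.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "key": ["x", "x", "y"],
    "a": [1, 2, 3],
    "b": [4.0, 5.0, 6.0],
})

def value_range(col):
    """Custom aggregation (illustrative helper): max minus min of a column."""
    return col.max() - col.min()

# Built-in aggregations, named by string, on the numeric columns.
summary = df[["a", "b"]].agg(["sum", "mean"])

# The same method accepts a UDF that reduces each column to one value.
ranges = df[["a", "b"]].agg(value_range)

# Aggregating across groups, mixing a built-in and the UDF.
per_group = df.groupby("key").agg(total=("a", "sum"), spread=("a", value_range))

print(summary, ranges, per_group, sep="\n\n")
```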
 
 :meth:`DataFrame.transform`
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 The transform method is ideal for performing element-wise transformations while preserving the shape of the original DataFrame.
-It’s generally faster than apply because it can take advantage of pandas' internal optimizations.
+It is generally faster than apply because it can take advantage of pandas' internal optimizations.
 
 When to use: When you need to perform element-wise transformations that retain the original structure of the DataFrame.
 
-Documentation can be found :meth:`~DataFrame.transform`.
+Documentation can be found at :meth:`~DataFrame.transform`.
 
 Attempting to use common aggregation functions such as ``mean`` or ``sum`` will result in
 values being broadcasted to the original dimensions:
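
The example that follows this sentence in the documentation is not part of the hunk shown here. A minimal sketch of the behaviour, assuming the sentence refers to a grouped transform, might look like the following; the data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"key": ["x", "x", "y"], "value": [1.0, 3.0, 10.0]})

# Element-wise UDF: the output has the same shape as the input.
doubled = df[["value"]].transform(lambda col: col * 2)

# An aggregation inside a grouped transform: each group's mean is
# broadcast back so the result still has one value per original row.
group_mean = df.groupby("key")["value"].transform("mean")

print(doubled)
print(group_mean)
```

Here ``group_mean`` keeps the original index, so it can be assigned straight back as a new column.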
@@ -158,17 +184,17 @@ When to use: Use :meth:`DataFrame.filter` when you want to use a UDF to create a
     'D': [10, 11, 12]
 })
 
-# Define a function that filters out columns where the name is longer than 1 character
+# Function that filters out columns where the name is longer than 1 character
 def is_long_name(column_name):
     return len(column_name) > 1
 
-df_filtered = df[[col for col in df.columns if is_long_name(col)]]
+df_filtered = df.filter(items=[col for col in df.columns if is_long_name(col)])
 print(df_filtered)
 
 Since filter does not directly accept a UDF, you have to apply the UDF indirectly,
-such as by using list comprehensions.
+for example, by using list comprehensions.
 
-Documentation can be found :meth:`~DataFrame.filter`.
+Documentation can be found at :meth:`~DataFrame.filter`.
 
 :meth:`DataFrame.map`
 ~~~~~~~~~~~~~~~~~~~~~
@@ -178,17 +204,17 @@ for this purpose compared to :meth:`DataFrame.apply` because of its better perfo
 
 When to use: Use map for applying element-wise UDFs to DataFrames or Series.
 
-Documentation can be found :meth:`~DataFrame.map`.
+Documentation can be found at :meth:`~DataFrame.map`.
 
 :meth:`DataFrame.pipe`
 ~~~~~~~~~~~~~~~~~~~~~~
 
 The pipe method is useful for chaining operations together into a clean and readable pipeline.
 It is a helpful tool for organizing complex data processing workflows.
 
-When to use: Use pipe when you need to create a pipeline of transformations and want to keep the code readable and maintainable.
+When to use: Use pipe when you need to create a pipeline of operations and want to keep the code readable and maintainable.
 
-Documentation can be found :meth:`~DataFrame.pipe`.
+Documentation can be found at :meth:`~DataFrame.pipe`.
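
The pipeline idea above could be sketched as follows. The helper functions ``drop_missing`` and ``add_total`` and the data are hypothetical; each step takes a DataFrame as its first argument and returns a new one, which is what lets the calls chain.

```python
import pandas as pd

def drop_missing(frame):
    """Illustrative step: remove rows that contain missing values."""
    return frame.dropna()

def add_total(frame, columns):
    """Illustrative step: add a row-wise total over the given columns."""
    return frame.assign(total=frame[columns].sum(axis=1))

# Hypothetical example data with one missing value.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4.0, 5.0, 6.0]})

# Extra arguments after the function are forwarded to it by pipe.
result = df.pipe(drop_missing).pipe(add_total, columns=["a", "b"])

print(result)
```

Written without ``pipe``, the same steps would nest as ``add_total(drop_missing(df), columns=["a", "b"])``, which reads inside-out.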
 
 
 Best Practices
@@ -232,3 +258,18 @@ via NumPy to process entire arrays at once. This approach avoids the overhead of
 through rows in Python and making separate function calls for each row, which is slow and
 inefficient. Additionally, NumPy arrays benefit from memory efficiency and CPU-level
 optimizations, making vectorized operations the preferred choice whenever possible.
+
+
+Improving Performance with UDFs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In scenarios where UDFs are necessary, there are still ways to mitigate their performance drawbacks.
+One approach is to use **Numba**, a Just-In-Time (JIT) compiler that can significantly speed up numerical
+Python code by compiling Python functions to optimized machine code at runtime.
+
+By annotating your UDFs with ``@numba.jit``, you can achieve performance closer to vectorized operations,
+especially for computationally heavy tasks.
+
+.. note::
+    You may also refer to the user guide on `Enhancing performance <https://pandas.pydata.org/pandas-docs/dev/user_guide/enhancingperf.html#numba-jit-compilation>`_
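
One way the Numba suggestion above might look in practice is sketched below. This is an illustration under stated assumptions, not the approach prescribed by the pull request: the function ``running_positive_sum`` and the data are hypothetical, the decorator is applied in ``nopython`` mode, and the sketch falls back to plain Python when Numba is not installed. The UDF operates on the underlying NumPy array rather than on pandas objects.

```python
import numpy as np
import pandas as pd

try:
    import numba
    # Assumption: compile in nopython mode for the best speed-up.
    jit = numba.jit(nopython=True)
except ImportError:
    # Numba is optional; without it the function runs as ordinary Python.
    def jit(func):
        return func

@jit
def running_positive_sum(values):
    """Illustrative UDF: cumulative sum that ignores non-positive entries."""
    total = 0.0
    out = np.empty(values.shape[0])
    for i in range(values.shape[0]):
        if values[i] > 0:
            total += values[i]
        out[i] = total
    return out

# Hypothetical example data.
df = pd.DataFrame({"x": [1.0, -2.0, 3.0, 4.0]})

# Pass the NumPy array, not the Series, to the compiled function.
df["running"] = running_positive_sum(df["x"].to_numpy())

print(df)
```

The first call includes compilation time, so any benefit is likely to show only on larger arrays or repeated calls.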