Commit f56ec28

committed
updated udf user guide based on reviews
1 parent c6891a0 commit f56ec28

2 files changed

+61
-35
lines changed

doc/source/user_guide/index.rst

Lines changed: 1 addition & 1 deletion
@@ -78,6 +78,7 @@ Guides
7878
boolean
7979
visualization
8080
style
81+
user_defined_functions
8182
groupby
8283
window
8384
timeseries
@@ -88,4 +89,3 @@ Guides
8889
sparse
8990
gotchas
9091
cookbook
91-
user_defined_functions

doc/source/user_guide/user_defined_functions.rst

Lines changed: 60 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -2,19 +2,46 @@
22

33
{{ header }}
44

5-
**************************************
6-
Introduction to User-Defined Functions
7-
**************************************
5+
*****************************
6+
User-Defined Functions (UDFs)
7+
*****************************
88

99
In pandas, User-Defined Functions (UDFs) provide a way to extend the library’s
1010
functionality by allowing users to apply custom computations to their data. While
1111
pandas comes with a set of built-in functions for data manipulation, UDFs offer
1212
flexibility when built-in methods are not sufficient. These functions can be
1313
applied at different levels: element-wise, row-wise, column-wise, or group-wise,
14-
and change the data differently, depending on the method used.
14+
and behave differently, depending on the method used.
15+
16+
Here’s a simple example to illustrate a UDF applied to a Series:
17+
18+
.. ipython:: python
19+
20+
s = pd.Series([1, 2, 3])
21+
22+
# Simple UDF that adds 1 to a value
23+
def add_one(x):
24+
return x + 1
25+
26+
# Apply the function element-wise using .map
27+
s.map(add_one)
28+
29+
You can also apply UDFs to an entire DataFrame. For example:
30+
31+
.. ipython:: python
32+
33+
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
34+
35+
# UDF that takes a row and returns the sum of columns A and B
36+
def sum_row(row):
37+
return row["A"] + row["B"]
38+
39+
# Apply the function row-wise (axis=1 means apply across columns per row)
40+
df.apply(sum_row, axis=1)
41+
1542
1643
Why Not To Use User-Defined Functions
17-
-----------------------------------------
44+
-------------------------------------
1845

1946
While UDFs provide flexibility, they come with significant drawbacks, primarily
2047
related to performance and behavior. When using UDFs, pandas must perform inference
@@ -60,27 +87,25 @@ Methods that support User-Defined Functions
6087

6188
User-Defined Functions can be applied across various pandas methods:
6289

63-
* :meth:`~DataFrame.apply` - A flexible method that allows applying a function to Series,
64-
DataFrames, or groups of data.
65-
* :meth:`~DataFrame.agg` (Aggregate) - Used for summarizing data, supporting multiple
90+
* :meth:`~DataFrame.apply` - A flexible method that allows applying a function to Series and
91+
DataFrames.
92+
* :meth:`~DataFrame.agg` (Aggregate) - Used for summarizing data, supporting custom
6693
aggregation functions.
67-
* :meth:`~DataFrame.transform` - Applies a function to groups while preserving the shape of
94+
* :meth:`~DataFrame.transform` - Applies a function to Series and DataFrames while preserving the shape of
6895
the original data.
69-
* :meth:`~DataFrame.filter` - Filters groups based on a list of Boolean conditions.
70-
* :meth:`~DataFrame.map` - Applies an element-wise function to a Series, useful for
96+
* :meth:`~DataFrame.filter` - Filters Series and DataFrames based on a list of Boolean conditions.
97+
* :meth:`~DataFrame.map` - Applies an element-wise function to a Series or DataFrame, useful for
7198
transforming individual values.
72-
* :meth:`~DataFrame.pipe` - Allows chaining custom functions to process entire DataFrames or
73-
Series in a clean, readable manner.
99+
* :meth:`~DataFrame.pipe` - Allows chaining custom functions to process Series or
100+
DataFrames in a clean, readable manner.
74101

75102
All of these pandas methods can be used with both Series and DataFrame objects, providing versatile
76103
ways to apply UDFs across different pandas data structures.
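
As a minimal sketch of the chaining style described for ``pipe`` in the list above, the example below passes a DataFrame through two small functions. The helper names ``add_total`` and ``keep_large`` and the ``threshold`` parameter are hypothetical and used only for illustration; they are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Hypothetical helper: add a column holding the row-wise total of A and B
def add_total(frame):
    return frame.assign(total=frame["A"] + frame["B"])

# Hypothetical helper: keep only rows whose total exceeds a threshold
def keep_large(frame, threshold):
    return frame[frame["total"] > threshold]

# .pipe passes the DataFrame as the first argument of each function,
# so the steps read top to bottom instead of as nested calls
piped = df.pipe(add_total).pipe(keep_large, threshold=15)
```

Written without ``pipe``, the same computation would be ``keep_large(add_total(df), threshold=15)``, which reads inside out.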
77104

78105
.. note::
79-
Some of these methods are can also be applied to Groupby Objects. Refer to :ref:`groupby`.
80-
81-
Additionally, operations such as :ref:`resample()<timeseries>`, :ref:`rolling()<window>`,
82-
:ref:`expanding()<window>`, and :ref:`ewm()<window>` also support UDFs for performing custom
83-
computations over temporal or statistical windows.
106+
Some of these methods can also be applied to groupby, resample, and various window objects.
107+
See :ref:`groupby`, :ref:`resample()<timeseries>`, :ref:`rolling()<window>`, :ref:`expanding()<window>`,
108+
and :ref:`ewm()<window>` for details.
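
To illustrate the note above, here is a small sketch that applies one UDF to a rolling window and to a groupby object. The function name ``last_minus_first`` and the sample data are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical UDF: difference between the last and first value it receives
def last_minus_first(values):
    return values.iloc[-1] - values.iloc[0]

s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])

# Rolling.apply calls the UDF once per window of 3 observations;
# the first two positions have no full window and are left as NaN
rolling_diff = s.rolling(window=3).apply(last_minus_first)

df = pd.DataFrame({"group": ["A", "A", "B", "B"], "value": [1, 5, 10, 30]})

# On a groupby object, agg calls the UDF once per group
grouped_diff = df.groupby("group")["value"].agg(last_minus_first)
```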
84109

85110

86111
Choosing the Right Method
@@ -126,8 +151,8 @@ Documentation can be found at :meth:`~DataFrame.apply`.
126151
If you need to aggregate data, :meth:`DataFrame.agg` is a better choice than apply because it is
127152
specifically designed for aggregation operations.
128153

129-
When to use: Use :meth:`DataFrame.agg` for performing aggregations like sum, mean, or custom aggregation
130-
functions across groups.
154+
When to use: Use :meth:`DataFrame.agg` for performing custom aggregations, where the operation returns
155+
a scalar value for each input.
131156

132157
Documentation can be found at :meth:`~DataFrame.agg`.
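
A minimal sketch of a custom aggregation, assuming a hypothetical UDF named ``value_range``: the function receives one column at a time and returns a single scalar, so the result has one value per column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Hypothetical UDF: reduce a column to one scalar (max minus min)
def value_range(column):
    return column.max() - column.min()

# agg applies the UDF to each column and collects the scalars in a Series
ranges = df.agg(value_range)
```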
133158

@@ -141,25 +166,26 @@ When to use: When you need to perform element-wise transformations that retain t
141166

142167
Documentation can be found at :meth:`~DataFrame.transform`.
143168

144-
Attempting to use common aggregation functions such as ``mean`` or ``sum`` will result in
145-
values being broadcasted to the original dimensions:
169+
.. code-block:: python
146170
147-
.. ipython:: python
171+
from sklearn.linear_model import LinearRegression
148172
149-
# Sample DataFrame
150173
df = pd.DataFrame({
151-
'Category': ['A', 'A', 'B', 'B', 'B'],
152-
'Values': [10, 20, 30, 40, 50]
153-
})
154-
155-
# Using transform with mean
156-
df['Mean_Transformed'] = df.groupby('Category')['Values'].transform('mean')
174+
'group': ['A', 'A', 'A', 'B', 'B', 'B'],
175+
'x': [1, 2, 3, 1, 2, 3],
176+
'y': [2, 4, 6, 1, 2, 1.5]
177+
}).set_index("x")
157178
158-
# Using transform with sum
159-
df['Sum_Transformed'] = df.groupby('Category')['Values'].transform('sum')
179+
# Function to fit a model to each group
180+
def fit_model(group):
181+
x = group.index.to_frame()
182+
y = group
183+
model = LinearRegression()
184+
model.fit(x, y)
185+
pred = model.predict(x)
186+
return pred
160187
161-
# Result broadcasted to DataFrame
162-
print(df)
188+
result = df.groupby('group').transform(fit_model)
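
For readers without scikit-learn installed, a dependency-free sketch of the same shape-preserving idea follows. The UDF name ``demean`` is an assumption for illustration; it subtracts each group's mean, so the output has the same length and index as the input.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "y": [2.0, 4.0, 6.0, 1.0, 2.0, 1.5],
})

# Hypothetical UDF: center the values of one group around that group's mean
def demean(values):
    return values - values.mean()

# transform calls the UDF per group and aligns the result to the original index
centered = df.groupby("group")["y"].transform(demean)
```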
163189
164190
:meth:`DataFrame.filter`
165191
~~~~~~~~~~~~~~~~~~~~~~~~
