DOC: Updated the DataFrame.assign docstring #21917
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3250,48 +3250,34 @@ def assign(self, **kwargs): | |
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame({'A': range(1, 11), 'B': np.random.randn(10)}) | ||
>>> df = pd.DataFrame({'temp_c': (17.0, 25.0)}, | ||
index=['Portland', 'Berkeley']) | ||
This is missing |
||
temp_c | ||
Portland 17.0 | ||
Berkeley 25.0 | ||
|
||
Where the value is a callable, evaluated on `df`: | ||
|
||
>>> df.assign(ln_A = lambda x: np.log(x.A)) | ||
A B ln_A | ||
0 1 0.426905 0.000000 | ||
1 2 -0.780949 0.693147 | ||
2 3 -0.418711 1.098612 | ||
3 4 -0.269708 1.386294 | ||
4 5 -0.274002 1.609438 | ||
5 6 -0.500792 1.791759 | ||
6 7 1.649697 1.945910 | ||
7 8 -1.495604 2.079442 | ||
8 9 0.549296 2.197225 | ||
9 10 -0.758542 2.302585 | ||
|
||
Where the value already exists and is inserted: | ||
|
||
>>> newcol = np.log(df['A']) | ||
>>> df.assign(ln_A=newcol) | ||
A B ln_A | ||
0 1 0.426905 0.000000 | ||
1 2 -0.780949 0.693147 | ||
2 3 -0.418711 1.098612 | ||
3 4 -0.269708 1.386294 | ||
4 5 -0.274002 1.609438 | ||
5 6 -0.500792 1.791759 | ||
6 7 1.649697 1.945910 | ||
7 8 -1.495604 2.079442 | ||
8 9 0.549296 2.197225 | ||
9 10 -0.758542 2.302585 | ||
|
||
Where the keyword arguments depend on each other | ||
|
||
>>> df = pd.DataFrame({'A': [1, 2, 3]}) | ||
|
||
>>> df.assign(B=df.A, C=lambda x:x['A']+ x['B']) | ||
A B C | ||
0 1 1 2 | ||
1 2 2 4 | ||
2 3 3 6 | ||
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32) | ||
temp_c temp_f | ||
Portland 17.0 62.6 | ||
Berkeley 25.0 77.0 | ||
|
||
Alternatively, the same behavior can be achieved by directly | ||
referencing an existing Series or list-like: | ||
IMO |
||
>>> newcol = df['temp_c'] * 9 / 5 + 32 | ||
>>> df.assign(temp_f=newcol) | ||
I'd use the expression directly in the assignment instead (i.e.

From my understanding, this example is to show that you can refer to an existing list or Series and assign it to the DataFrame. The example above it already uses direct assignment. Do you think it is not necessary to show this?

In the previous example, the new column is assigned a callable (which is run with the DataFrame as a parameter). In this example, the new column is assigned a Series. What I'm saying is that instead of saving the Series to newcol and then assigning newcol to the new column, we can simply create the Series directly as the argument. |
||
temp_c temp_f | ||
Portland 17.0 62.6 | ||
Berkeley 25.0 77.0 | ||
|
||
In Python 3.6+, you can create multiple columns within the same assign | ||
where one of the columns depends on another one defined within the same | ||
assign: | ||
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32, | ||
temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9) | ||
same as before regarding |
||
temp_c temp_f temp_k | ||
Portland 17.0 62.6 290.15 | ||
Berkeley 25.0 77.0 298.15 | ||
""" | ||
data = self.copy() | ||
|
||
|
Can you use a list instead of a tuple for the data? I think it's more conventional.
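
The review suggestions above could be combined roughly as follows. This is only a sketch and not the docstring text that was merged: it assumes a pandas version that supports dependent keyword arguments in `assign` on Python 3.6+, and the variable names `by_callable`, `by_series`, and `chained` are made up for illustration.

```python
import pandas as pd

# Data passed as a list rather than a tuple, as the last review comment suggests.
df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# Callable value: evaluated on the DataFrame that assign is called on.
by_callable = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32)

# Series value: the expression is written directly in the call,
# with no intermediate `newcol` variable.
by_series = df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)

# Dependent columns: temp_k refers to temp_f, which is created earlier
# in the same call (assumes keyword-argument order is preserved).
chained = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
                    temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)

print(by_callable)
print(by_series)
print(chained)
```

Both the callable and the Series form return a new DataFrame and leave `df` unchanged, so the choice between them in the docstring is mostly about which usage the example is meant to illustrate.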