Commit f958371

datajanko committed
fixes issues from comments
- fixes docstring
- fixes wrong indentation
- restructuring tests
- modified content in whatsnew
1 parent 76d072f commit f958371

3 files changed, +80 -23 lines changed

doc/source/whatsnew/v0.22.0.txt

+50-1
@@ -119,6 +119,56 @@ Current Behavior
 
     s.rank(na_option='top')
 
+.. _whatsnew_0220.enhancements.assign_dependent:
+
+
+``.assign()`` accepts dependent arguments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :func:`DataFrame.assign()` now accepts dependent kwargs. In earlier versions this raised a ``KeyError`` exception. (:issue:`14207`)
+
+Specifically, a new column defined inside ``assign`` may be referenced in the same ``assign`` call if a callable is used. For example:
+
+.. code-block:: ipython
+
+    In [3]: df = pd.DataFrame({'A': [1, 2, 3]})
+
+    In [4]: df.assign(B=df.A, C=lambda x:x['A']+ x['B'])
+    Out[4]:
+       A  B  C
+    0  1  1  2
+    1  2  2  4
+    2  3  3  6
+
+.. warning::
+
+    This may subtly change the behavior of your code when you're
+    using ``assign`` to update an existing column. Previously, callables
+    referring to other variables being updated would get the "old" values
+
+    .. code-block:: ipython
+
+        In [2]: df = pd.DataFrame({"A": [1, 2, 3]})
+
+        In [3]: df.assign(A=lambda df: df.A + 1, C=lambda df: df.A * -1)
+        Out[3]:
+           A  C
+        0  2 -1
+        1  3 -2
+        2  4 -3
+
+    Now, callables will get the "new" value
+
+    .. ipython:: python
+
+        In [6]: df.assign(A=df.A+1, C= lambda df: df.A* -1)
+        Out[6]:
+           A  C
+        0  2 -2
+        1  3 -3
+        2  4 -4
+
+
 .. _whatsnew_0220.enhancements.other:
 
 Other Enhancements
@@ -139,7 +189,6 @@ Other Enhancements
 - :func:`read_excel()` has gained the ``nrows`` parameter (:issue:`16645`)
 - :func:``DataFrame.to_json`` and ``Series.to_json`` now accept an ``index`` argument which allows the user to exclude the index from the JSON output (:issue:`17394`)
 - ``IntervalIndex.to_tuples()`` has gained the ``na_tuple`` parameter to control whether NA is returned as a tuple of NA, or NA itself (:issue:`18756`)
-- :func:``DataFrame.assign()`` now acceepts dependent kwargs, e.g. `df.assign(b=1, c=lambda x:x['b'])` does not throw an exception anymore. (:issue: `14207)
 
 .. _whatsnew_0220.api_breaking:

pandas/core/frame.py

+6-6
@@ -2659,11 +2659,11 @@ def assign(self, **kwargs):
         \*\*kwargs. For python 3.5 and earlier, since \*\*kwargs is unordered,
         the columns are inserted in alphabetical order at the end of your
         DataFrame. Assigning multiple columns within the same ``assign``
-        is possible, but for python 3.5 and eralier you cannot reference other
-        columns created within the same ``assign`` call. For python 3.6 and
-        above it is possible to reference columns created in an assignment.
-        To this end you have to respect the order of |*|*kwargs and use
-        callables referencing the assigned columns.
+        is possible, but for python 3.5 and earlier, you cannot reference
+        other columns created within the same ``assign`` call.
+        For python 3.6 and above it is possible to reference columns created
+        in an assignment. To this end you have to respect the order of kwargs
+        and use callables referencing the assigned columns.
 
         Examples
         --------
@@ -2713,7 +2713,7 @@ def assign(self, **kwargs):
                 results[k] = com._apply_if_callable(v, data)
 
             # sort by key for 3.5 and earlier
-            results = sorted(results.items())
+            results = sorted(results.items())
             # ... and then assign
             for k, v in results:
                 data[k] = v
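
The two evaluation strategies touched by this change can be modelled without pandas. The following is a minimal sketch, not the library's implementation: it uses a plain dict of lists in place of a DataFrame, and the names `_resolve`, `assign_ordered` and `assign_sorted` are hypothetical helpers introduced only for illustration. It reproduces the "old" versus "new" values described in the whatsnew warning.

```python
def _resolve(value, data):
    """Call value with the current data if it is callable, else return it."""
    return value(data) if callable(value) else value


def assign_ordered(columns, **kwargs):
    """Model of the Python 3.6+ path: assign one column at a time, in kwargs
    order, so later callables see columns assigned earlier in the call."""
    data = dict(columns)  # work on a copy; the input is left untouched
    for name, value in kwargs.items():
        data[name] = _resolve(value, data)
    return data


def assign_sorted(columns, **kwargs):
    """Model of the Python 3.5-and-earlier path: evaluate every value against
    the original columns first, then assign in sorted key order."""
    data = dict(columns)
    results = {name: _resolve(value, data) for name, value in kwargs.items()}
    for name, value in sorted(results.items()):
        data[name] = value
    return data


columns = {"A": [1, 2, 3]}
updates = {
    "A": lambda d: [a + 1 for a in d["A"]],
    "C": lambda d: [-a for a in d["A"]],
}

new_style = assign_ordered(columns, **updates)  # C is computed from the updated A
old_style = assign_sorted(columns, **updates)   # C is computed from the original A
print(new_style)
print(old_style)
```

In this model, `assign_sorted` raises `KeyError` when a callable refers to a column created in the same call, which is the situation the old-version test exercises.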

pandas/tests/frame/test_mutate_columns.py

+24-16
@@ -89,26 +89,34 @@ def test_assign_bad(self):
             df.assign(lambda x: x.A)
         with pytest.raises(AttributeError):
             df.assign(C=df.A, D=df.A + df.C)
-        if not PY36:
-            with pytest.raises(KeyError):
-                df.assign(C=lambda df: df.A,
-                          D=lambda df: df['A'] + df['C'])
-            with pytest.raises(KeyError):
-                df.assign(C=df.A, D=lambda x: x['A'] + x['C'])
 
+    @pytest.mark.skipif(PY36, reason="""Issue #14207: valid for python
+                        3.6 and above""")
+    def test_assign_bad_old_version(self):
+        df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
+
+        # Key C does not exist at definition time of df
+        with pytest.raises(KeyError):
+            df.assign(C=lambda df: df.A,
+                      D=lambda df: df['A'] + df['C'])
+        with pytest.raises(KeyError):
+            df.assign(C=df.A, D=lambda x: x['A'] + x['C'])
+
+    @pytest.mark.skipif(not PY36, reason="""Issue #14207: not valid for
+                        python 3.5 and below""")
     def test_assign_dependent(self):
         df = DataFrame({'A': [1, 2], 'B': [3, 4]})
-        if PY36:
-            result = df.assign(C=df.A, D=lambda x: x['A'] + x['C'])
-            expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]],
-                                 columns=list('ABCD'))
-            assert_frame_equal(result, expected)
 
-            result = df.assign(C=lambda df: df.A,
-                               D=lambda df: df['A'] + df['C'])
-            expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]],
-                                 columns=list('ABCD'))
-            assert_frame_equal(result, expected)
+        result = df.assign(C=df.A, D=lambda x: x['A'] + x['C'])
+        expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]],
+                             columns=list('ABCD'))
+        assert_frame_equal(result, expected)
+
+        result = df.assign(C=lambda df: df.A,
+                           D=lambda df: df['A'] + df['C'])
+        expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]],
+                             columns=list('ABCD'))
+        assert_frame_equal(result, expected)
 
     def test_insert_error_msmgs(self):

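
The tests are gated on `PY36` because the order of `**kwargs` is only guaranteed from Python 3.6 onward (PEP 468). A small sketch of that guarantee follows. The `PY36` definition here is an assumption meant to resemble the flag the tests import, and `kwarg_names` is a hypothetical helper used only for this illustration.

```python
import sys

# Assumption: a version flag similar in spirit to the PY36 used in the tests.
PY36 = sys.version_info >= (3, 6)


def kwarg_names(**kwargs):
    """Return keyword names in the order the function received them."""
    return list(kwargs)


# On Python 3.6+, the call order is preserved, so a later keyword can
# safely depend on an earlier one.
received = kwarg_names(C=1, A=2, B=3)

# On older interpreters the order was arbitrary, which is why the
# pre-3.6 code path falls back to sorting the keys alphabetically.
alphabetical = sorted(received)
print(received, alphabetical)
```

Because dependent assignment relies on the caller's keyword order, it can only be offered where that order is preserved, and the sorted fallback cannot resolve references between new columns.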
