Commit 758a905

API: Sort keys for DataFrame.assign
Previously the order was arbitrary. For predictability, we'll sort before inserting.
1 parent fdfd66c commit 758a905

4 files changed: +28 -12 lines changed

doc/source/dsintro.rst

+4 -2
@@ -461,7 +461,7 @@ Inspired by `dplyr's
 <http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html#mutate>`__
 ``mutate`` verb, DataFrame has an :meth:`~pandas.DataFrame.assign`
 method that allows you to easily create new columns that are potentially
-derived from existing columns.
+derived from existing columns.
 
 .. ipython:: python
 
@@ -511,7 +511,9 @@ DataFrame is returned, with the new values inserted.
 .. warning::
 
    Since the function signature of ``assign`` is ``**kwargs``, a dictionary,
-   the order of the new columns in the resulting DataFrame cannot be guaranteed.
+   the order of the new columns in the resulting DataFrame cannot be guaranteed
+   to match the order you pass in. To make things predictable, items are inserted
+   alphabetically (by key) at the end of the DataFrame.
 
    All expressions are computed first, and then assigned. So you can't refer
    to another column being assigned in the same call to ``assign``. For example:
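
Note on the warning text above (not part of the diff): the "computed first, and then assigned" rule can be illustrated without pandas. The following is a minimal sketch under stated assumptions: a plain dict of lists stands in for a DataFrame, and `assign_sketch` is a hypothetical helper written for this note, not the pandas implementation. It mirrors the two phases as of this commit: evaluate every value against the original columns, then insert the results sorted by key.

```python
def assign_sketch(data, **kwargs):
    """Hypothetical stand-in for DataFrame.assign on a dict of columns."""
    out = dict(data)  # shallow copy, so the caller's dict is not modified

    # Phase 1: evaluate every value against the ORIGINAL columns only.
    results = {}
    for name, value in kwargs.items():
        results[name] = value(out) if callable(value) else value

    # Phase 2: insert the new columns, sorted by key, after the existing ones.
    for name, value in sorted(results.items()):
        out[name] = value
    return out


frame = {"A": [1, 2], "B": [3, 4]}

# Referring to a sibling column fails: "C" does not exist during phase 1.
error = None
try:
    assign_sketch(
        frame,
        C=lambda d: [a + b for a, b in zip(d["A"], d["B"])],
        D=lambda d: [c * 2 for c in d["C"]],
    )
except KeyError as exc:
    error = exc
print("sibling reference failed:", repr(error))

# Chaining two calls works, because the second call already sees "C".
step1 = assign_sketch(frame, C=lambda d: [a + b for a, b in zip(d["A"], d["B"])])
step2 = assign_sketch(step1, D=lambda d: [c * 2 for c in d["C"]])
print(list(step2))  # ['A', 'B', 'C', 'D']
```

In pandas itself the equivalent workaround would be to chain two `assign` calls, so that the second call operates on a frame that already contains the first new column.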

doc/source/whatsnew/v0.16.1.txt

+4
@@ -45,6 +45,10 @@ API changes
 - Add support for separating years and quarters using dashes, for
   example 2014-Q1. (:issue:`9688`)
 
+- :meth:`~pandas.DataFrame.assign` now inserts new columns in alphabetical order. Previously
+  the order was arbitrary. (:issue:`9777`)
+
+
 .. _whatsnew_0161.performance:
 
 Performance Improvements

pandas/core/frame.py

+6 -5
@@ -2244,10 +2244,11 @@ def assign(self, **kwargs):
         Notes
         -----
         Since ``kwargs`` is a dictionary, the order of your
-        arguments may not be preserved, and so the order of the
-        new columns is not well defined. Assigning multiple
-        columns within the same ``assign`` is possible, but you cannot
-        reference other columns created within the same ``assign`` call.
+        arguments may not be preserved. To make things predictable,
+        the columns are inserted in alphabetical order, at the end of
+        your DataFrame. Assigning multiple columns within the same
+        ``assign`` is possible, but you cannot reference other columns
+        created within the same ``assign`` call.
 
         Examples
         --------
@@ -2296,7 +2297,7 @@ def assign(self, **kwargs):
                 results[k] = v
 
         # ... and then assign
-        for k, v in results.items():
+        for k, v in sorted(results.items()):
             data[k] = v
 
         return data
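
Note on the `sorted(results.items())` change above (not part of the diff): sorting the `(key, value)` pairs orders them by key alone, because dict keys are unique and tuple comparison stops at the first differing element. The sketch below assumes Python 3 and uses a hypothetical `results` dict with placeholder values. It also shows that "alphabetical" here means Python's default string ordering, in which uppercase names sort before lowercase ones.

```python
# Hypothetical stand-in for the ``results`` dict built inside ``assign``.
# The values are opaque objects that define no ordering of their own.
results = {"b": object(), "A": object(), "a": object(), "B": object()}

# Tuples compare element by element and dict keys are unique, so every
# comparison is decided by the key and never reaches the value.
ordered_keys = [key for key, _ in sorted(results.items())]
print(ordered_keys)  # ['A', 'B', 'a', 'b']

# Sorting the keys directly yields the same order.
assert ordered_keys == sorted(results)
```

If a case-insensitive or custom column order is wanted, one option is to reorder the columns explicitly after the call instead of relying on the insertion order.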

pandas/tests/test_frame.py

+14 -5
@@ -14073,12 +14073,21 @@ def test_assign(self):
         assert_frame_equal(result, expected)
 
     def test_assign_multiple(self):
-        df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
+        df = DataFrame([[1, 4], [2, 5], [3, 6]], columns=['A', 'B'])
         result = df.assign(C=[7, 8, 9], D=df.A, E=lambda x: x.B)
-        expected = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
-                              'D': [1, 2, 3], 'E': [4, 5, 6]})
-        # column order isn't preserved
-        assert_frame_equal(result.reindex_like(expected), expected)
+        expected = DataFrame([[1, 4, 7, 1, 4], [2, 5, 8, 2, 5],
+                              [3, 6, 9, 3, 6]], columns=list('ABCDE'))
+        assert_frame_equal(result, expected)
+
+    def test_assign_alphabetical(self):
+        # GH 9818
+        df = DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
+        result = df.assign(D=df.A + df.B, C=df.A - df.B)
+        expected = DataFrame([[1, 2, -1, 3], [3, 4, -1, 7]],
+                             columns=list('ABCD'))
+        assert_frame_equal(result, expected)
+        result = df.assign(C=df.A - df.B, D=df.A + df.B)
+        assert_frame_equal(result, expected)
 
     def test_assign_bad(self):
         df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
