From b7ec738abc0b83a69c8ca493e99dfbf0f13d4824 Mon Sep 17 00:00:00 2001
From: datajanko <jan-christopher.koch@eon.com>
Date: Mon, 18 Dec 2017 22:23:11 +0100
Subject: [PATCH 1/3] ENH: df.assign accepting dependent **kwargs (#14207)

Specifically, `df.assign(b=1, c=lambda x: x['b'])`
no longer raises an exception on Python 3.6 and above.
Further details are discussed in issues #14207 and #18797.
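
A minimal sketch of the new evaluation order, using a plain dict of lists
instead of a DataFrame so that it runs without pandas. The helper
`assign_in_order` is hypothetical and only mirrors the Python 3.6+ branch of
this patch: each keyword argument is applied in call order, and a callable
sees the columns assigned before it.

```python
import copy


def assign_in_order(data, **kwargs):
    """Toy stand-in for DataFrame.assign on a dict of columns (lists)."""
    result = copy.deepcopy(data)  # like assign, leave the input untouched
    for name, value in kwargs.items():  # kwargs keep call order on 3.6+
        # A callable receives the columns assigned so far in this call.
        result[name] = value(result) if callable(value) else value
    return result


frame = {"A": [1, 2, 3]}
out = assign_in_order(
    frame,
    B=[1, 2, 3],
    C=lambda cols: [a + b for a, b in zip(cols["A"], cols["B"])],
)
print(out)  # {'A': [1, 2, 3], 'B': [1, 2, 3], 'C': [2, 4, 6]}
```

On Python 3.5 and earlier the patch keeps the old path instead: all values
are computed against the original columns first and then inserted sorted by
key, so `C` could not see `B`.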

populates dsintro and frame.py with examples and warning

- adds example to frame.py
- reworked warning in dsintro
- reworked Notes in frame.py

Remains open:

frame.py is probably responsible for Travis not passing: it has a doctest that requires Python 3.6
---
 doc/source/dsintro.rst                    | 49 ++++++++++++++---------
 doc/source/whatsnew/v0.23.0.txt           | 40 ++++++++++++++++++
 pandas/core/frame.py                      | 49 +++++++++++++++--------
 pandas/tests/frame/test_mutate_columns.py | 26 +++++++++++-
 4 files changed, 128 insertions(+), 36 deletions(-)

diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index d7650b6b0938f..a78d8f2360962 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -95,7 +95,7 @@ constructed from the sorted keys of the dict, if possible.
 
     NaN (not a number) is the standard missing data marker used in pandas.
 
-**From scalar value** 
+**From scalar value**
 
 If ``data`` is a scalar value, an index must be
 provided. The value will be repeated to match the length of **index**.
@@ -154,7 +154,7 @@ See also the :ref:`section on attribute access<indexing.attribute_access>`.
 Vectorized operations and label alignment with Series
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-When working with raw NumPy arrays, looping through value-by-value is usually 
+When working with raw NumPy arrays, looping through value-by-value is usually
 not necessary. The same is true when working with Series in pandas.
 Series can also be passed into most NumPy methods expecting an ndarray.
 
@@ -324,7 +324,7 @@ From a list of dicts
 From a dict of tuples
 ~~~~~~~~~~~~~~~~~~~~~
 
-You can automatically create a multi-indexed frame by passing a tuples 
+You can automatically create a multi-indexed frame by passing a tuples
 dictionary.
 
 .. ipython:: python
@@ -347,7 +347,7 @@ column name provided).
 **Missing Data**
 
 Much more will be said on this topic in the :ref:`Missing data <missing_data>`
-section. To construct a DataFrame with missing data, we use ``np.nan`` to 
+section. To construct a DataFrame with missing data, we use ``np.nan`` to
 represent missing values. Alternatively, you may pass a ``numpy.MaskedArray``
 as the data argument to the DataFrame constructor, and its masked entries will
 be considered missing.
@@ -370,7 +370,7 @@ set to ``'index'`` in order to use the dict keys as row labels.
 
 ``DataFrame.from_records`` takes a list of tuples or an ndarray with structured
 dtype. It works analogously to the normal ``DataFrame`` constructor, except that
-the resulting DataFrame index may be a specific field of the structured 
+the resulting DataFrame index may be a specific field of the structured
 dtype. For example:
 
 .. ipython:: python
@@ -506,25 +506,36 @@ to be inserted (for example, a ``Series`` or NumPy array), or a function
 of one argument to be called on the ``DataFrame``. A *copy* of the original
 DataFrame is returned, with the new values inserted.
 
+Starting from Python 3.6 ``**kwargs`` is an ordered dictionary and :func:`DataFrame.assign`
+respects the order of the keyword arguments. You can use assign in the following way:
+
+.. ipython:: python
+
+   dfa = pd.DataFrame({"A": [1, 2, 3],
+                       "B": [4, 5, 6]})
+   dfa.assign(C=lambda x: x['A'] + x['B'],
+              D=lambda x: x['A'] + x['C'])
+
 .. warning::
 
-  Since the function signature of ``assign`` is ``**kwargs``, a dictionary,
-  the order of the new columns in the resulting DataFrame cannot be guaranteed
-  to match the order you pass in. To make things predictable, items are inserted
-  alphabetically (by key) at the end of the DataFrame.
+   Prior to Python 3.6, this may subtly change the behavior of your code when you are
+   using :func:`DataFrame.assign` to update an existing column.
 
-  All expressions are computed first, and then assigned. So you can't refer
-  to another column being assigned in the same call to ``assign``. For example:
+   Since the function signature of ``assign`` is ``**kwargs``, a dictionary,
+   the order of the new columns in the resulting DataFrame cannot be guaranteed
+   to match the order you pass in. To make things predictable, items are inserted
+   alphabetically (by key) at the end of the DataFrame.
 
    .. ipython::
-       :verbatim:
+      :verbatim:
+
+      In [1]: # Don't do this, bad reference to `C`
+              df.assign(C = lambda x: x['A'] + x['B'],
+                        D = lambda x: x['A'] + x['C'])
+      In [2]: # Instead, break it into two assigns
+              (df.assign(C = lambda x: x['A'] + x['B'])
+                 .assign(D = lambda x: x['A'] + x['C']))
 
-       In [1]: # Don't do this, bad reference to `C`
-               df.assign(C = lambda x: x['A'] + x['B'],
-                         D = lambda x: x['A'] + x['C'])
-       In [2]: # Instead, break it into two assigns
-               (df.assign(C = lambda x: x['A'] + x['B'])
-                  .assign(D = lambda x: x['A'] + x['C']))
 
 Indexing / Selection
 ~~~~~~~~~~~~~~~~~~~~
@@ -914,7 +925,7 @@ For example, using the earlier example data, we could do:
 Squeezing
 ~~~~~~~~~
 
-Another way to change the dimensionality of an object is to ``squeeze`` a 1-len 
+Another way to change the dimensionality of an object is to ``squeeze`` a 1-len
 object, similar to ``wp['Item1']``.
 
 .. ipython:: python
diff --git a/doc/source/whatsnew/v0.23.0.txt b/doc/source/whatsnew/v0.23.0.txt
index 083242cd69b74..11474719f44a5 100644
--- a/doc/source/whatsnew/v0.23.0.txt
+++ b/doc/source/whatsnew/v0.23.0.txt
@@ -248,6 +248,46 @@ Current Behavior:
 
     pd.RangeIndex(1, 5) / 0
 
+.. _whatsnew_0230.enhancements.assign_dependent:
+
+``.assign()`` accepts dependent arguments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`DataFrame.assign` now accepts dependent keyword arguments on Python 3.6 and later (see also `PEP 468
+<https://www.python.org/dev/peps/pep-0468/>`_). A later keyword argument may now refer to an earlier one if the later argument is a callable. See the
+:ref:`documentation here <dsintro.chained_assignment>` (:issue:`14207`).
+
+.. ipython:: python
+
+    df = pd.DataFrame({'A': [1, 2, 3]})
+    df
+    df.assign(B=df.A, C=lambda x: x['A'] + x['B'])
+
+.. warning::
+
+  This may subtly change the behavior of your code when you're
+  using ``.assign()`` to update an existing column. Previously, callables
+  referring to other columns being updated would get the "old" values.
+
+  Previous Behavior:
+
+  .. code-block:: ipython
+
+      In [2]: df = pd.DataFrame({"A": [1, 2, 3]})
+
+      In [3]: df.assign(A=lambda df: df.A + 1, C=lambda df: df.A * -1)
+      Out[3]:
+         A  C
+      0  2 -1
+      1  3 -2
+      2  4 -3
+
+  New Behavior:
+
+  .. ipython:: python
+
+      df.assign(A=df.A + 1, C=lambda df: df.A * -1)
+
 .. _whatsnew_0230.enhancements.other:
 
 Other Enhancements
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 6d8dcb8a1ca89..c99c59db1d8cb 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -2687,12 +2687,17 @@ def assign(self, **kwargs):
 
         Notes
         -----
-        For python 3.6 and above, the columns are inserted in the order of
-        \*\*kwargs. For python 3.5 and earlier, since \*\*kwargs is unordered,
-        the columns are inserted in alphabetical order at the end of your
-        DataFrame.  Assigning multiple columns within the same ``assign``
-        is possible, but you cannot reference other columns created within
-        the same ``assign`` call.
+        Assigning multiple columns within the same ``assign`` is possible.
+        For Python 3.6 and above, later items in '\*\*kwargs' may refer to
+        newly created or modified columns in 'df'; items are computed and
+        assigned into 'df' in order.  For Python 3.5 and below, the order of
+        keyword arguments is not specified, so you cannot refer to newly
+        created or modified columns. All items are computed first, and then
+        assigned in alphabetical order.
+
+        .. versionchanged :: 0.23.0
+
+            Keyword argument order is maintained for Python 3.6 and later.
 
         Examples
         --------
@@ -2728,22 +2733,34 @@ def assign(self, **kwargs):
         7   8 -1.495604  2.079442
         8   9  0.549296  2.197225
         9  10 -0.758542  2.302585
+
+        Where the keyword arguments depend on each other:
+
+        >>> df = pd.DataFrame({'A': [1, 2, 3]})
+
+        >>> df.assign(B=df.A, C=lambda x: x['A'] + x['B'])
+           A  B  C
+        0  1  1  2
+        1  2  2  4
+        2  3  3  6
         """
         data = self.copy()
 
-        # do all calculations first...
-        results = OrderedDict()
-        for k, v in kwargs.items():
-            results[k] = com._apply_if_callable(v, data)
-
-        # preserve order for 3.6 and later, but sort by key for 3.5 and earlier
+        # >= 3.6 preserve order of kwargs
         if PY36:
-            results = results.items()
+            for k, v in kwargs.items():
+                data[k] = com._apply_if_callable(v, data)
         else:
+            # <= 3.5: do all calculations first...
+            results = OrderedDict()
+            for k, v in kwargs.items():
+                results[k] = com._apply_if_callable(v, data)
+
+            # sort by key, since kwargs order is arbitrary on <= 3.5
             results = sorted(results.items())
-        # ... and then assign
-        for k, v in results:
-            data[k] = v
+            # ... and then assign
+            for k, v in results:
+                data[k] = v
         return data
 
     def _sanitize_column(self, key, value, broadcast=True):
diff --git a/pandas/tests/frame/test_mutate_columns.py b/pandas/tests/frame/test_mutate_columns.py
index 9acdf2f17d86a..8236a41d00243 100644
--- a/pandas/tests/frame/test_mutate_columns.py
+++ b/pandas/tests/frame/test_mutate_columns.py
@@ -89,11 +89,35 @@ def test_assign_bad(self):
             df.assign(lambda x: x.A)
         with pytest.raises(AttributeError):
             df.assign(C=df.A, D=df.A + df.C)
+
+    @pytest.mark.skipif(PY36, reason="Issue #14207: dependent assign is "
+                        "valid for Python 3.6 and above")
+    def test_assign_dependent_old_python(self):
+        df = DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
+
+        # Key C does not exist at definition time of df
         with pytest.raises(KeyError):
-            df.assign(C=lambda df: df.A, D=lambda df: df['A'] + df['C'])
+            df.assign(C=lambda df: df.A,
+                      D=lambda df: df['A'] + df['C'])
         with pytest.raises(KeyError):
             df.assign(C=df.A, D=lambda x: x['A'] + x['C'])
 
+    @pytest.mark.skipif(not PY36, reason="Issue #14207: dependent assign is "
+                        "not valid for Python 3.5 and below")
+    def test_assign_dependent(self):
+        df = DataFrame({'A': [1, 2], 'B': [3, 4]})
+
+        result = df.assign(C=df.A, D=lambda x: x['A'] + x['C'])
+        expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]],
+                             columns=list('ABCD'))
+        assert_frame_equal(result, expected)
+
+        result = df.assign(C=lambda df: df.A,
+                           D=lambda df: df['A'] + df['C'])
+        expected = DataFrame([[1, 3, 1, 2], [2, 4, 2, 4]],
+                             columns=list('ABCD'))
+        assert_frame_equal(result, expected)
+
     def test_insert_error_msmgs(self):
 
         # GH 7432

From 094f346f65919cf8ea95943470bfda7e6e3a9eae Mon Sep 17 00:00:00 2001
From: Tom Augspurger <tom.w.augspurger@gmail.com>
Date: Fri, 9 Feb 2018 14:32:13 -0600
Subject: [PATCH 2/3] Update docs

---
 doc/source/dsintro.rst | 65 +++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 16 deletions(-)

diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index a78d8f2360962..4b4855c6693f2 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -506,8 +506,11 @@ to be inserted (for example, a ``Series`` or NumPy array), or a function
 of one argument to be called on the ``DataFrame``. A *copy* of the original
 DataFrame is returned, with the new values inserted.
 
-Starting from Python 3.6 ``**kwargs`` is an ordered dictionary and :func:`DataFrame.assign`
-respects the order of the keyword arguments. You can use assign in the following way:
+.. versionchanged:: 0.23.0
+
+Starting with Python 3.6 the order of ``**kwargs`` is preserved. This allows
+for *dependent* assignment, where an expression later in ``**kwargs`` can refer
+to a column created earlier in the same :meth:`~DataFrame.assign`.
 
 .. ipython:: python
 
@@ -516,25 +519,55 @@ respects the order of the keyword arguments. You can use assign in the following
    dfa.assign(C=lambda x: x['A'] + x['B'],
               D=lambda x: x['A'] + x['C'])
 
+In the second expression, ``x['C']`` will refer to the newly created column,
+that's equal to ``dfa['A'] + dfa['B']``.
+
 .. warning::
 
-   Prior to Python 3.6, this may subtly change the behavior of your code when you are
-   using :func:`DataFrame.assign` to update an existing column.
+   Dependent assignment maybe subtly change the behavior of your code between
+   Python 3.6 and older versions of Python.
+
+If you wish write code that supports versions of python before and after 3.6,
+you'll need to take care when passing ``assign`` expressions that
 
-   Since the function signature of ``assign`` is ``**kwargs``, a dictionary,
-   the order of the new columns in the resulting DataFrame cannot be guaranteed
-   to match the order you pass in. To make things predictable, items are inserted
-   alphabetically (by key) at the end of the DataFrame.
+   1. Updating an existing column
+   2. Refering to the newly updated column in the same ``assign``
 
-   .. ipython::
-      :verbatim:
+   For example, we'll update column "A" and then refer to it when creating "B".
+
+   .. code-block:: python
+
+      >>> dependent = pd.DataFrame({"A": [1, 1, 1]})
+      >>> dependent.assign(A=lambda x: x["A"] + 1,
+      ...                  B=lambda x: x["A"] + 2)
+
+   For Python 3.5 and earlier the expression creating ``B`` refers to the
+   "old" value of ``A``, ``[1, 1, 1]``. The output is then
+
+   .. code-block:: python
+
+         A  B
+      0  2  3
+      1  2  3
+      2  2  3
+
+   For Python 3.6 and later, the expression creating ``B`` refers to the
+   "new" value of ``A``, ``[2, 2, 2]``, which results in
+
+   .. code-block:: python
+
+         A  B
+      0  2  4
+      1  2  4
+      2  2  4
+
+To write code compatible with all versions of Python, split the assignment in two.
+
+.. ipython:: python
 
-      In [1]: # Don't do this, bad reference to `C`
-              df.assign(C = lambda x: x['A'] + x['B'],
-                        D = lambda x: x['A'] + x['C'])
-      In [2]: # Instead, break it into two assigns
-              (df.assign(C = lambda x: x['A'] + x['B'])
-                 .assign(D = lambda x: x['A'] + x['C']))
+   dependent = pd.DataFrame({"A": [1, 1, 1]})
+   (dependent.assign(A=lambda x: x['A'] + 1)
+             .assign(B=lambda x: x['A'] + 2))
 
 
 Indexing / Selection

From 4184732220b862ec610378c00863a12a0d080d94 Mon Sep 17 00:00:00 2001
From: Jeff Reback <jeff@reback.net>
Date: Sat, 10 Feb 2018 11:19:18 -0500
Subject: [PATCH 3/3] mod to docs

---
 doc/source/dsintro.rst | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
index 4b4855c6693f2..78e2fdb46f659 100644
--- a/doc/source/dsintro.rst
+++ b/doc/source/dsintro.rst
@@ -522,16 +522,24 @@ to a column created earlier in the same :meth:`~DataFrame.assign`.
 In the second expression, ``x['C']`` will refer to the newly created column,
 that's equal to ``dfa['A'] + dfa['B']``.
 
+To write code compatible with all versions of Python, split the assignment in two.
+
+.. ipython:: python
+
+   dependent = pd.DataFrame({"A": [1, 1, 1]})
+   (dependent.assign(A=lambda x: x['A'] + 1)
+             .assign(B=lambda x: x['A'] + 2))
+
 .. warning::
 
    Dependent assignment maybe subtly change the behavior of your code between
    Python 3.6 and older versions of Python.
 
-If you wish write code that supports versions of python before and after 3.6,
-you'll need to take care when passing ``assign`` expressions that
+   If you wish to write code that supports versions of Python before and after 3.6,
+   you'll need to take care when passing ``assign`` expressions that
 
-   1. Updating an existing column
-   2. Refering to the newly updated column in the same ``assign``
+   * Update an existing column
+   * Refer to the newly updated column in the same ``assign``
 
    For example, we'll update column "A" and then refer to it when creating "B".
 
@@ -561,13 +569,6 @@ you'll need to take care when passing ``assign`` expressions that
       1  2  4
       2  2  4
 
-To write code compatible with all versions of Python, split the assignment in two.
-
-.. ipython:: python
-
-   dependent = pd.DataFrame({"A": [1, 1, 1]})
-   (dependent.assign(A=lambda x: x['A'] + 1)
-             .assign(B=lambda x: x['A'] + 2))
 
 
 Indexing / Selection