@@ -95,7 +95,7 @@ constructed from the sorted keys of the dict, if possible.

NaN (not a number) is the standard missing data marker used in pandas.

- **From scalar value**
+ **From scalar value**

If ``data`` is a scalar value, an index must be
provided. The value will be repeated to match the length of **index**.
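As a minimal sketch of the scalar case described in this hunk, the snippet below builds a Series from a single value. The index labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical labels; any index can be used here.
labels = ["a", "b", "c"]

# The scalar 5.0 is repeated once for every label in the index.
s = pd.Series(5.0, index=labels)
print(s)
```

The resulting Series has one entry per label, each holding the same scalar.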
@@ -154,7 +154,7 @@ See also the :ref:`section on attribute access<indexing.attribute_access>`.
Vectorized operations and label alignment with Series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- When working with raw NumPy arrays, looping through value-by-value is usually
+ When working with raw NumPy arrays, looping through value-by-value is usually
not necessary. The same is true when working with Series in pandas.
Series can also be passed into most NumPy methods expecting an ndarray.

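A short sketch of what the paragraph in this hunk describes, assuming a small Series with made-up values and labels: arithmetic and NumPy functions apply element-wise without a loop, and binary operations align on labels.

```python
import numpy as np
import pandas as pd

# Illustrative data; the values and labels are arbitrary.
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# Arithmetic is applied element-wise, with no explicit loop.
doubled = s * 2

# A NumPy ufunc accepts the Series and returns a Series with the same index.
exponentiated = np.exp(s)

# Binary operations align on labels; a label found in only one
# operand produces NaN in the result.
aligned = s.iloc[1:] + s.iloc[:-1]
print(aligned)
```

Only the label shared by both operands (``"b"``) gets a numeric result in ``aligned``.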
@@ -324,7 +324,7 @@ From a list of dicts
From a dict of tuples
~~~~~~~~~~~~~~~~~~~~~

- You can automatically create a multi-indexed frame by passing a tuples
+ You can automatically create a multi-indexed frame by passing a tuples
dictionary.

.. ipython:: python
@@ -347,7 +347,7 @@ column name provided).
**Missing Data**

Much more will be said on this topic in the :ref:`Missing data <missing_data>`
- section. To construct a DataFrame with missing data, we use ``np.nan`` to
+ section. To construct a DataFrame with missing data, we use ``np.nan`` to
represent missing values. Alternatively, you may pass a ``numpy.MaskedArray``
as the data argument to the DataFrame constructor, and its masked entries will
be considered missing.
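The two approaches mentioned in this hunk can be sketched as follows. The column names and values are made up for illustration, and the masked array is built through the ``numpy.ma`` module, which is where NumPy exposes its masked-array class.

```python
import numpy as np
import pandas as pd

# Approach 1: np.nan marks missing values directly in the input data.
df_nan = pd.DataFrame({"one": [1.0, np.nan, 3.0],
                       "two": [4.0, 5.0, np.nan]})

# Approach 2: a masked array; entries whose mask is True are treated as missing.
masked = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0]],
    mask=[[False, True], [False, False]],
)
df_masked = pd.DataFrame(masked, columns=["x", "y"])

print(df_nan.isna())
print(df_masked.isna())
```

In both frames the missing positions show up as ``NaN``.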
@@ -370,7 +370,7 @@ set to ``'index'`` in order to use the dict keys as row labels.

``DataFrame.from_records`` takes a list of tuples or an ndarray with structured
dtype. It works analogously to the normal ``DataFrame`` constructor, except that
- the resulting DataFrame index may be a specific field of the structured
+ the resulting DataFrame index may be a specific field of the structured
dtype. For example:

.. ipython:: python
@@ -506,25 +506,70 @@ to be inserted (for example, a ``Series`` or NumPy array), or a function
of one argument to be called on the ``DataFrame``. A *copy* of the original
DataFrame is returned, with the new values inserted.

+ .. versionchanged:: 0.23.0
+
+ Starting with Python 3.6 the order of ``**kwargs`` is preserved. This allows
+ for *dependent* assignment, where an expression later in ``**kwargs`` can refer
+ to a column created earlier in the same :meth:`~DataFrame.assign`.
+
+ .. ipython:: python
+
+    dfa = pd.DataFrame({"A": [1, 2, 3],
+                        "B": [4, 5, 6]})
+    dfa.assign(C=lambda x: x['A'] + x['B'],
+               D=lambda x: x['A'] + x['C'])
+
+ In the second expression, ``x['C']`` will refer to the newly created column,
+ that's equal to ``dfa['A'] + dfa['B']``.
+
+ To write code compatible with all versions of Python, split the assignment in two.
+
+ .. ipython:: python
+
+    dependent = pd.DataFrame({"A": [1, 1, 1]})
+    (dependent.assign(A=lambda x: x['A'] + 1)
+              .assign(B=lambda x: x['A'] + 2))
+
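As a rough check of the added text, the sketch below compares a single dependent ``assign`` call with the two-step form. It assumes Python 3.6 or later and a pandas version that includes this change, and it reuses the ``dfa`` data from the diff.

```python
import pandas as pd

dfa = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Single call: D refers to C, which is created earlier in the same call.
# This relies on keyword-argument order being preserved.
single = dfa.assign(C=lambda x: x["A"] + x["B"],
                    D=lambda x: x["A"] + x["C"])

# Two chained calls: C already exists when D is computed,
# so this form does not depend on keyword ordering.
chained = (dfa.assign(C=lambda x: x["A"] + x["B"])
              .assign(D=lambda x: x["A"] + x["C"]))

print(single.equals(chained))
```

Under those assumptions both forms give the same frame, and ``dfa`` itself is left unchanged because ``assign`` returns a copy.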
.. warning::

-    Since the function signature of ``assign`` is ``**kwargs``, a dictionary,
-    the order of the new columns in the resulting DataFrame cannot be guaranteed
-    to match the order you pass in. To make things predictable, items are inserted
-    alphabetically (by key) at the end of the DataFrame.
+    Dependent assignment may subtly change the behavior of your code between
+    Python 3.6 and older versions of Python.
+
+    If you wish to write code that supports versions of Python before and after 3.6,
+    you'll need to take care when passing ``assign`` expressions that
+
+    * Update an existing column
+    * Refer to the newly updated column in the same ``assign``
+
+    For example, we'll update column "A" and then refer to it when creating "B".
+
+    .. code-block:: python
+
+       >>> dependent = pd.DataFrame({"A": [1, 1, 1]})
+       >>> dependent.assign(A=lambda x: x["A"] + 1,
+                            B=lambda x: x["A"] + 2)
+
+    For Python 3.5 and earlier the expression creating ``B`` refers to the
+    "old" value of ``A``, ``[1, 1, 1]``. The output is then
+
+    .. code-block:: python
+
+          A  B
+       0  2  3
+       1  2  3
+       2  2  3
+
+    For Python 3.6 and later, the expression creating ``B`` refers to the
+    "new" value of ``A``, ``[2, 2, 2]``, which results in
+
+    .. code-block:: python

-    All expressions are computed first, and then assigned. So you can't refer
-    to another column being assigned in the same call to ``assign``. For example:
+          A  B
+       0  2  4
+       1  2  4
+       2  2  4

-    .. ipython::
-        :verbatim:

-        In [1]: # Don't do this, bad reference to `C`
-                df.assign(C = lambda x: x['A'] + x['B'],
-                          D = lambda x: x['A'] + x['C'])
-        In [2]: # Instead, break it into two assigns
-                (df.assign(C = lambda x: x['A'] + x['B'])
-                   .assign(D = lambda x: x['A'] + x['C']))


Indexing / Selection
~~~~~~~~~~~~~~~~~~~~
@@ -914,7 +959,7 @@ For example, using the earlier example data, we could do:
Squeezing
~~~~~~~~~

- Another way to change the dimensionality of an object is to ``squeeze`` a 1-len
+ Another way to change the dimensionality of an object is to ``squeeze`` a 1-len
object, similar to ``wp['Item1']``.

.. ipython:: python