@@ -31,10 +31,10 @@ operations.
 Concatenating objects
 ---------------------
 
-The :func:`~pandas.concat` function (in the main pandas namespace) does all of
-the heavy lifting of performing concatenation operations along an axis while
-performing optional set logic (union or intersection) of the indexes (if any) on
-the other axes. Note that I say "if any" because there is only a single possible
+The :func:`~pandas.concat` function (in the main pandas namespace) does all of
+the heavy lifting of performing concatenation operations along an axis while
+performing optional set logic (union or intersection) of the indexes (if any) on
+the other axes. Note that I say "if any" because there is only a single possible
 axis of concatenation for Series.
 
 Before diving into all of the details of ``concat`` and what it can do, here is
@@ -109,9 +109,9 @@ some configurable handling of "what to do with the other axes":
   to the actual data concatenation.
 * ``copy`` : boolean, default True. If False, do not copy data unnecessarily.
 
-Without a little bit of context many of these arguments don't make much sense.
-Let's revisit the above example. Suppose we wanted to associate specific keys
-with each of the pieces of the chopped up DataFrame. We can do this using the
+Without a little bit of context many of these arguments don't make much sense.
+Let's revisit the above example. Suppose we wanted to associate specific keys
+with each of the pieces of the chopped up DataFrame. We can do this using the
 ``keys`` argument:
 
 .. ipython:: python
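
The body of the example is cut off by the hunk above, so here is a minimal, self-contained sketch of the ``keys`` argument. The frame ``df``, the ``pieces`` list, and the key labels are made up for illustration and are not taken from the original document.

```python
import pandas as pd

# Hypothetical data standing in for the "chopped up" DataFrame.
df = pd.DataFrame({"A": range(6), "B": list("abcdef")})
pieces = [df.iloc[:2], df.iloc[2:4], df.iloc[4:]]

# Each key labels one piece and becomes the outer level of the row index.
result = pd.concat(pieces, keys=["x", "y", "z"])

# Selecting a key on the outer level recovers the corresponding piece.
piece_y = result.loc["y"]
print(result)
```

Because the keys form the outermost index level, ``result.loc["y"]`` returns the second piece with its original row labels.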
@@ -138,9 +138,9 @@ It's not a stretch to see how this can be very useful. More detail on this
 functionality below.
 
 .. note::
-   It is worth noting that :func:`~pandas.concat` (and therefore
-   :func:`~pandas.append`) makes a full copy of the data, and that constantly
-   reusing this function can create a significant performance hit. If you need
+   It is worth noting that :func:`~pandas.concat` (and therefore
+   :func:`~pandas.append`) makes a full copy of the data, and that constantly
+   reusing this function can create a significant performance hit. If you need
    to use the operation over several datasets, use a list comprehension.
 
 ::
@@ -224,8 +224,8 @@ DataFrame:
 Concatenating using ``append``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Useful shortcuts to :func:`~pandas.concat` are the :meth:`~DataFrame.append`
-instance methods on ``Series`` and ``DataFrame``. These methods actually predated
+Useful shortcuts to :func:`~pandas.concat` are the :meth:`~DataFrame.append`
+instance methods on ``Series`` and ``DataFrame``. These methods actually predated
 ``concat``. They concatenate along ``axis=0``, namely the index:
 
 .. ipython:: python
@@ -271,8 +271,8 @@ need to be:
 
 .. note::
 
-   Unlike the :py:meth:`~list.append` method, which appends to the original list
-   and returns ``None``, :meth:`~DataFrame.append` here **does not** modify
+   Unlike the :py:meth:`~list.append` method, which appends to the original list
+   and returns ``None``, :meth:`~DataFrame.append` here **does not** modify
    ``df1`` and returns its copy with ``df2`` appended.
 
 .. _merging.ignore_index:
@@ -370,9 +370,9 @@ Passing ``ignore_index=True`` will drop all name references.
 More concatenating with group keys
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-A fairly common use of the ``keys`` argument is to override the column names
+A fairly common use of the ``keys`` argument is to override the column names
 when creating a new ``DataFrame`` based on existing ``Series``.
-Notice how the default behaviour consists of letting the resulting ``DataFrame``
+Notice how the default behaviour consists of letting the resulting ``DataFrame``
 inherit the parent ``Series``' name, when these existed.
 
 .. ipython:: python
@@ -468,7 +468,7 @@ Appending rows to a DataFrame
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 While not especially efficient (since a new object must be created), you can
-append a single row to a ``DataFrame`` by passing a ``Series`` or dict to
+append a single row to a ``DataFrame`` by passing a ``Series`` or dict to
 ``append``, which returns a new ``DataFrame`` as above.
 
 .. ipython:: python
@@ -513,15 +513,15 @@ pandas has full-featured, **high performance** in-memory join operations
 idiomatically very similar to relational databases like SQL. These methods
 perform significantly better (in some cases well over an order of magnitude
 better) than other open source implementations (like ``base::merge.data.frame``
-in R). The reason for this is careful algorithmic design and the internal layout
+in R). The reason for this is careful algorithmic design and the internal layout
 of the data in ``DataFrame``.
 
 See the :ref:`cookbook<cookbook.merge>` for some advanced strategies.
 
 Users who are familiar with SQL but new to pandas might be interested in a
 :ref:`comparison with SQL<compare_with_sql.join>`.
 
-pandas provides a single function, :func:`~pandas.merge`, as the entry point for
+pandas provides a single function, :func:`~pandas.merge`, as the entry point for
 all standard database join operations between ``DataFrame`` or named ``Series`` objects:
 
 ::
@@ -590,7 +590,7 @@ The return type will be the same as ``left``. If ``left`` is a ``DataFrame`` or
 and ``right`` is a subclass of ``DataFrame``, the return type will still be ``DataFrame``.
 
 ``merge`` is a function in the pandas namespace, and it is also available as a
-``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
+``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
 ``DataFrame`` being implicitly considered the left object in the join.
 
 The related :meth:`~DataFrame.join` method uses ``merge`` internally for the
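
As a rough illustration of the function and method forms of ``merge`` described in the hunk above, the sketch below checks that both calls give the same result. The frames and column names are invented for this example.

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": [1, 2, 3]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": [10, 20, 30]})

# Function form: both operands are passed explicitly.
via_function = pd.merge(left, right, on="key", how="left")

# Method form: the calling DataFrame is implicitly the left operand.
via_method = left.merge(right, on="key", how="left")

print(via_function.equals(via_method))
```

Which form to use is a matter of taste; the method form tends to read better in a chain of operations.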
@@ -602,7 +602,7 @@ Brief primer on merge methods (relational algebra)
 
 Experienced users of relational databases like SQL will be familiar with the
 terminology used to describe join operations between two SQL-table like
-structures (``DataFrame`` objects). There are several cases to consider which
+structures (``DataFrame`` objects). There are several cases to consider which
 are very important to understand:
 
 * **one-to-one** joins: for example when joining two ``DataFrame`` objects on
@@ -642,8 +642,8 @@ key combination:
           labels=['left', 'right'], vertical=False);
    plt.close('all');
 
-Here is a more complicated example with multiple join keys. Only the keys
-appearing in ``left`` and ``right`` are present (the intersection), since
+Here is a more complicated example with multiple join keys. Only the keys
+appearing in ``left`` and ``right`` are present (the intersection), since
 ``how='inner'`` by default.
 
 .. ipython:: python
@@ -759,13 +759,13 @@ Checking for duplicate keys
 
 .. versionadded:: 0.21.0
 
-Users can use the ``validate`` argument to automatically check whether there
-are unexpected duplicates in their merge keys. Key uniqueness is checked before
-merge operations and so should protect against memory overflows. Checking key
-uniqueness is also a good way to ensure user data structures are as expected.
+Users can use the ``validate`` argument to automatically check whether there
+are unexpected duplicates in their merge keys. Key uniqueness is checked before
+merge operations and so should protect against memory overflows. Checking key
+uniqueness is also a good way to ensure user data structures are as expected.
 
-In the following example, there are duplicate values of ``B`` in the right
-``DataFrame``. As this is not a one-to-one merge -- as specified in the
+In the following example, there are duplicate values of ``B`` in the right
+``DataFrame``. As this is not a one-to-one merge -- as specified in the
 ``validate`` argument -- an exception will be raised.
 
 
@@ -778,11 +778,11 @@ In the following example, there are duplicate values of ``B`` in the right
 
   In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one")
   ...
-  MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
+  MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
 
-If the user is aware of the duplicates in the right ``DataFrame`` but wants to
-ensure there are no duplicates in the left DataFrame, one can use the
-``validate='one_to_many'`` argument instead, which will not raise an exception.
+If the user is aware of the duplicates in the right ``DataFrame`` but wants to
+ensure there are no duplicates in the left DataFrame, one can use the
+``validate='one_to_many'`` argument instead, which will not raise an exception.
 
 .. ipython:: python
 
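
The definitions of ``left`` and ``right`` fall outside the hunks shown here, so the following is a self-contained sketch of both ``validate`` outcomes. The frame contents are assumptions chosen so that ``B`` is unique on the left and duplicated on the right; ``MergeError`` is imported from ``pandas.errors``.

```python
import pandas as pd
from pandas.errors import MergeError

# Assumed data: "B" is unique in left but repeated in right.
left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})

# A one-to-one check fails because the right-hand keys are not unique.
try:
    pd.merge(left, right, on="B", how="outer", validate="one_to_one")
    raised = False
except MergeError:
    raised = True

# A one-to-many check only requires unique keys on the left, so it passes.
result = pd.merge(left, right, on="B", how="outer", validate="one_to_many")
print(raised, len(result))
```

The check runs before the join itself, so a failed validation costs little compared with materialising an unexpectedly large result.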
@@ -794,8 +794,8 @@ ensure there are no duplicates in the left DataFrame, one can use the
 The merge indicator
 ~~~~~~~~~~~~~~~~~~~
 
-:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a
-Categorical-type column called ``_merge`` will be added to the output object
+:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a
+Categorical-type column called ``_merge`` will be added to the output object
 that takes on values:
 
 =================================== ================
@@ -903,7 +903,7 @@ Joining on index
 ~~~~~~~~~~~~~~~~
 
 :meth:`DataFrame.join` is a convenient method for combining the columns of two
-potentially differently-indexed ``DataFrames`` into a single result
+potentially differently-indexed ``DataFrames`` into a single result
 ``DataFrame``. Here is a very basic example:
 
 .. ipython:: python
@@ -983,9 +983,9 @@ indexes:
 Joining key columns on an index
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column
+:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column
 or multiple column names, which specifies that the passed ``DataFrame`` is to be
-aligned on that column in the ``DataFrame``. These two function calls are
+aligned on that column in the ``DataFrame``. These two function calls are
 completely equivalent:
 
 ::
@@ -995,7 +995,7 @@ completely equivalent:
              how='left', sort=False)
 
 Obviously you can choose whichever form you find more convenient. For
-many-to-one joins (where one of the ``DataFrame`` objects is already indexed by the
+many-to-one joins (where one of the ``DataFrame`` objects is already indexed by the
 join key), using ``join`` may be more convenient. Here is a simple example:
 
 .. ipython:: python
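
To make the equivalence between ``join(..., on=...)`` and the longer ``merge`` call concrete, here is a small sketch of a many-to-one join. The frames are hypothetical: ``left`` carries the join key as a column and ``right`` is indexed by it.

```python
import pandas as pd

# Hypothetical many-to-one setup: several left rows share one right row.
left = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                     "key": ["K0", "K1", "K0", "K1"]})
right = pd.DataFrame({"C": ["C0", "C1"]}, index=["K0", "K1"])

# Short form: align left's "key" column with right's index.
joined = left.join(right, on="key")

# Long form: the same operation spelled out with merge.
merged = pd.merge(left, right, left_on="key", right_index=True,
                  how="left", sort=False)

print(joined.equals(merged))
```

Both results keep the row order and index of ``left``, with the matching ``C`` value attached to each row.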
@@ -1133,17 +1133,42 @@ This is equivalent but less verbose and more memory efficient / faster than this
 Joining with two MultiIndexes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-This is not implemented via ``join`` at-the-moment, however it can be done using
-the following code.
+This is supported in a limited way, provided that the index for the right
+argument is completely used in the join, and is a subset of the indices in
+the left argument, as in this example:
 
 .. ipython:: python
 
-   index = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
-                                      ('K1', 'X2')],
-                                     names=['key', 'X'])
+   leftindex = pd.MultiIndex.from_product([list('abc'), list('xy'), [1, 2]],
+                                          names=['abc', 'xy', 'num'])
+   left = pd.DataFrame({'v1': range(12)}, index=leftindex)
+   left
+
+   rightindex = pd.MultiIndex.from_product([list('abc'), list('xy')],
+                                           names=['abc', 'xy'])
+   right = pd.DataFrame({'v2': [100 * i for i in range(1, 7)]}, index=rightindex)
+   right
+
+   left.join(right, on=['abc', 'xy'], how='inner')
+
+If that condition is not satisfied, a join with two multi-indexes can be
+done using the following code.
+
+.. ipython:: python
+
+   leftindex = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
+                                          ('K1', 'X2')],
+                                         names=['key', 'X'])
    left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                         'B': ['B0', 'B1', 'B2']},
-                       index=index)
+                       index=leftindex)
+
+   rightindex = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
+                                           ('K2', 'Y2'), ('K2', 'Y3')],
+                                          names=['key', 'Y'])
+   right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
+                         'D': ['D0', 'D1', 'D2', 'D3']},
+                        index=rightindex)
 
    result = pd.merge(left.reset_index(), right.reset_index(),
                      on=['key'], how='inner').set_index(['key', 'X', 'Y'])
@@ -1161,7 +1186,7 @@ the following code.
 Merging on a combination of columns and index levels
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. versionadded:: 0.22
+.. versionadded:: 0.23
 
 Strings passed as the ``on``, ``left_on``, and ``right_on`` parameters
 may refer to either column names or index level names. This enables merging
@@ -1200,6 +1225,12 @@ resetting indexes.
    frames, the index level is preserved as an index level in the resulting
    DataFrame.
 
+.. note::
+   When DataFrames are merged using only some of the levels of a ``MultiIndex``,
+   the extra levels will be dropped from the resulting merge. In order to
+   preserve those levels, use ``reset_index`` on those level names to move
+   those levels to columns prior to doing the merge.
+
 .. note::
 
    If a string matches both a column name and an index level name, then a
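
The workaround in the note added by this hunk (moving the extra level to a column before merging) might look like the sketch below. The frames, level names, and values are hypothetical, and the sketch only demonstrates the ``reset_index`` step rather than the dropping behaviour itself.

```python
import pandas as pd

# Hypothetical left frame with a two-level index; only "key" is a merge key.
index = pd.MultiIndex.from_tuples([("K0", "X0"), ("K0", "X1"), ("K1", "X2")],
                                  names=["key", "X"])
left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=index)

# Hypothetical right frame indexed by "key" alone.
right = pd.DataFrame({"C": ["C0", "C1"]},
                     index=pd.Index(["K0", "K1"], name="key"))

# Move the extra level "X" into a column so that it is carried through.
kept = pd.merge(left.reset_index("X"), right, on="key")

# Optionally turn "X" back into an index level afterwards.
restored = kept.set_index("X", append=True)
print(restored)
```

Resetting only the levels that are not merge keys keeps the merge itself on index levels, which is the combination this section describes.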
@@ -1262,7 +1293,7 @@ similarly.
 Joining multiple DataFrame or Panel objects
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join`
+A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join`
 to join them together on their indexes.
 
 .. ipython:: python
@@ -1284,7 +1315,7 @@ Merging together values within Series or DataFrame columns
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Another fairly common situation is to have two like-indexed (or similarly
-indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in
+indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in
 one object from values for matching indices in the other. Here is an example:
 
 .. ipython:: python
@@ -1309,7 +1340,7 @@ For this, use the :meth:`~DataFrame.combine_first` method:
    plt.close('all');
 
 Note that this method only takes values from the right ``DataFrame`` if they are
-missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`,
+missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`,
 alters non-NA values in place:
 
 .. ipython:: python
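
The contrast between ``combine_first`` and ``update`` drawn in the hunk above can be sketched with two small frames. The data is made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical like-indexed frames with missing values in different places.
df1 = pd.DataFrame({"a": [np.nan, 2.0, np.nan], "b": [1.0, np.nan, 3.0]})
df2 = pd.DataFrame({"a": [10.0, 20.0, 30.0], "b": [np.nan, 50.0, 60.0]})

# combine_first returns a new object; df2 only fills the holes in df1.
patched = df1.combine_first(df2)

# update works in place and overwrites with every non-NA value from df2.
updated = df1.copy()
updated.update(df2)

print(patched)
print(updated)
```

In ``patched`` the existing values of ``df1`` win wherever they are present, whereas in ``updated`` the values of ``df2`` win wherever they are present.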
@@ -1361,15 +1392,15 @@ Merging AsOf
 
 .. versionadded:: 0.19.0
 
-A :func:`merge_asof` is similar to an ordered left-join except that we match on
-nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``,
-we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less
+A :func:`merge_asof` is similar to an ordered left-join except that we match on
+nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``,
+we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less
 than or equal to the left's key. Both DataFrames must be sorted by the key.
 
-Optionally an asof merge can perform a group-wise merge. This matches the
+Optionally an asof merge can perform a group-wise merge. This matches the
 ``by`` key equally, in addition to the nearest match on the ``on`` key.
 
-For example, we might have ``trades`` and ``quotes`` and we want to ``asof``
+For example, we might have ``trades`` and ``quotes`` and we want to ``asof``
 merge them.
 
 .. ipython:: python
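
Since the ``trades`` and ``quotes`` definitions are not part of this hunk, here is a compact sketch of the matching rule with and without ``by``. Integer timestamps and the column values are assumptions made to keep the example short.

```python
import pandas as pd

# Hypothetical data; both frames are already sorted by "time".
trades = pd.DataFrame({"time": [3, 7, 10],
                       "ticker": ["A", "B", "A"],
                       "price": [1.0, 2.0, 3.0]})
quotes = pd.DataFrame({"time": [1, 2, 6, 9],
                       "ticker": ["A", "B", "B", "A"],
                       "bid": [0.9, 1.9, 1.95, 2.9]})

# Group-wise: for each trade, take the latest earlier quote of the same ticker.
grouped = pd.merge_asof(trades, quotes, on="time", by="ticker")

# Without "by": take the latest earlier quote regardless of ticker.
ungrouped = pd.merge_asof(trades[["time", "price"]],
                          quotes[["time", "bid"]], on="time")

print(grouped)
print(ungrouped)
```

The first trade shows the difference: with ``by="ticker"`` it picks up the quote at time 1 for ticker ``A``, while without it the more recent quote at time 2 is used.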
@@ -1428,8 +1459,8 @@ We only asof within ``2ms`` between the quote time and the trade time.
                  by='ticker',
                  tolerance=pd.Timedelta('2ms'))
 
-We only asof within ``10ms`` between the quote time and the trade time and we
-exclude exact matches on time. Note that though we exclude the exact matches
+We only asof within ``10ms`` between the quote time and the trade time and we
+exclude exact matches on time. Note that though we exclude the exact matches
 (of the quotes), prior quotes **do** propagate to that point in time.
 
 .. ipython:: python