@@ -31,10 +31,17 @@ operations.
31
31
Concatenating objects
32
32
---------------------
33
33
34
40
The :func:`~pandas.concat` function (in the main pandas namespace) does all of
35
41
the heavy lifting of performing concatenation operations along an axis while
36
42
performing optional set logic (union or intersection) of the indexes (if any) on
37
43
the other axes. Note that I say "if any" because there is only a single possible
38
45
axis of concatenation for Series.
39
46
40
47
Before diving into all of the details of ``concat `` and what it can do, here is
@@ -109,9 +116,9 @@ some configurable handling of "what to do with the other axes":
109
116
to the actual data concatenation.
110
117
- ``copy `` : boolean, default True. If False, do not copy data unnecessarily.
111
118
112
- Without a little bit of context many of these arguments don't make much sense.
113
- Let's revisit the above example. Suppose we wanted to associate specific keys
114
- with each of the pieces of the chopped up DataFrame. We can do this using the
119
+ Without a little bit of context many of these arguments don't make much sense.
120
+ Let's revisit the above example. Suppose we wanted to associate specific keys
121
+ with each of the pieces of the chopped up DataFrame. We can do this using the
115
122
``keys `` argument:
116
123
117
124
.. ipython:: python
@@ -138,9 +145,9 @@ It's not a stretch to see how this can be very useful. More detail on this
138
145
functionality below.
139
146
140
147
.. note::
141
- It is worth noting that :func: `~pandas.concat ` (and therefore
142
- :func: `~pandas.append `) makes a full copy of the data, and that constantly
143
- reusing this function can create a significant performance hit. If you need
148
+ It is worth noting that :func: `~pandas.concat ` (and therefore
149
+ :func: `~pandas.append `) makes a full copy of the data, and that constantly
150
+ reusing this function can create a significant performance hit. If you need
144
151
to use the operation over several datasets, use a list comprehension.
145
152
146
153
::
@@ -153,7 +160,7 @@ Set logic on the other axes
153
160
~~~~~~~~~~~~~~~~~~~~~~~~~~~
154
161
155
162
When gluing together multiple DataFrames, you have a choice of how to handle
156
- the other axes (other than the one being concatenated). This can be done in
163
+ the other axes (other than the one being concatenated). This can be done in
157
164
the following three ways:
158
165
159
166
- Take the (sorted) union of them all, ``join='outer' ``. This is the default
@@ -216,8 +223,8 @@ DataFrame:
216
223
Concatenating using ``append ``
217
224
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
218
225
219
- A useful shortcut to :func: `~pandas.concat ` are the :meth: `~DataFrame.append `
220
- instance methods on ``Series `` and ``DataFrame ``. These methods actually predated
226
+ A useful shortcut to :func: `~pandas.concat ` are the :meth: `~DataFrame.append `
227
+ instance methods on ``Series `` and ``DataFrame ``. These methods actually predated
221
228
``concat ``. They concatenate along ``axis=0 ``, namely the index:
222
229
223
230
.. ipython:: python
@@ -263,8 +270,8 @@ need to be:
263
270
264
271
.. note::
265
272
266
- Unlike the :py:meth: `~list.append ` method, which appends to the original list
267
- and returns ``None ``, :meth: `~DataFrame.append ` here **does not ** modify
273
+ Unlike the :py:meth: `~list.append ` method, which appends to the original list
274
+ and returns ``None ``, :meth: `~DataFrame.append ` here **does not ** modify
268
275
``df1 `` and returns its copy with ``df2 `` appended.
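
As a minimal sketch of the same non-mutating behaviour: ``DataFrame.append`` was deprecated in pandas 1.4 and removed in 2.0, so the example below uses :func:`~pandas.concat`, which likewise returns a new object and leaves its inputs untouched. The frame contents are illustrative only.

```python
import pandas as pd

# Two small frames with made-up values, used only for illustration.
df1 = pd.DataFrame({"A": ["A0", "A1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"]})

# concat builds a new DataFrame; neither df1 nor df2 is modified.
result = pd.concat([df1, df2], ignore_index=True)

print(len(df1), len(df2), len(result))
```

Because every call allocates a new object, collecting the pieces in a list and concatenating once is usually cheaper than concatenating repeatedly in a loop.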
269
276
270
277
.. _merging.ignore_index:
@@ -362,9 +369,9 @@ Passing ``ignore_index=True`` will drop all name references.
362
369
More concatenating with group keys
363
370
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
364
371
365
- A fairly common use of the ``keys `` argument is to override the column names
372
+ A fairly common use of the ``keys `` argument is to override the column names
366
373
when creating a new ``DataFrame `` based on existing ``Series ``.
367
- Notice how the default behaviour consists on letting the resulting ``DataFrame ``
374
+ Notice how the default behaviour consists of letting the resulting ``DataFrame``
368
375
inherit the parent ``Series ``' name, when these existed.
369
376
370
377
.. ipython:: python
@@ -460,7 +467,7 @@ Appending rows to a DataFrame
460
467
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
461
468
462
469
While not especially efficient (since a new object must be created), you can
463
- append a single row to a ``DataFrame `` by passing a ``Series `` or dict to
470
+ append a single row to a ``DataFrame `` by passing a ``Series `` or dict to
464
471
``append ``, which returns a new ``DataFrame `` as above.
465
472
466
473
.. ipython:: python
@@ -505,15 +512,15 @@ pandas has full-featured, **high performance** in-memory join operations
505
512
idiomatically very similar to relational databases like SQL. These methods
506
513
perform significantly better (in some cases well over an order of magnitude
507
514
better) than other open source implementations (like ``base::merge.data.frame ``
508
- in R). The reason for this is careful algorithmic design and the internal layout
515
+ in R). The reason for this is careful algorithmic design and the internal layout
509
516
of the data in ``DataFrame ``.
510
517
511
518
See the :ref:`cookbook<cookbook.merge>` for some advanced strategies.
512
519
513
520
Users who are familiar with SQL but new to pandas might be interested in a
514
521
:ref:`comparison with SQL<compare_with_sql.join>`.
515
522
516
- pandas provides a single function, :func: `~pandas.merge `, as the entry point for
523
+ pandas provides a single function, :func: `~pandas.merge `, as the entry point for
517
524
all standard database join operations between ``DataFrame `` objects:
518
525
519
526
::
@@ -582,7 +589,11 @@ and ``right`` is a subclass of DataFrame, the return type will still be
582
589
``DataFrame ``.
583
590
584
591
``merge `` is a function in the pandas namespace, and it is also available as a
585
595
``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
586
597
``DataFrame `` being implicitly considered the left object in the join.
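
A small sketch of that equivalence, using made-up frames named ``left`` and ``right``: the function form and the method form should produce the same result, with the calling frame acting as the left side.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

# Function form: both frames are passed explicitly.
via_function = pd.merge(left, right, on="key")

# Method form: the calling frame is implicitly the left object.
via_method = left.merge(right, on="key")

same = via_function.equals(via_method)
print(same)
```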
587
598
588
599
The related :meth:`~DataFrame.join` method uses ``merge`` internally for the
@@ -594,7 +605,7 @@ Brief primer on merge methods (relational algebra)
594
605
595
606
Experienced users of relational databases like SQL will be familiar with the
596
607
terminology used to describe join operations between two SQL-table like
597
- structures (``DataFrame `` objects). There are several cases to consider which
608
+ structures (``DataFrame `` objects). There are several cases to consider which
598
609
are very important to understand:
599
610
600
611
- **one-to-one ** joins: for example when joining two ``DataFrame `` objects on
@@ -634,8 +645,8 @@ key combination:
634
645
labels=['left', 'right'], vertical=False);
635
646
plt.close('all');
636
647
637
- Here is a more complicated example with multiple join keys. Only the keys
638
- appearing in ``left `` and ``right `` are present (the intersection), since
648
+ Here is a more complicated example with multiple join keys. Only the keys
649
+ appearing in ``left `` and ``right `` are present (the intersection), since
639
650
``how='inner' `` by default.
640
651
641
652
.. ipython:: python
@@ -751,13 +762,13 @@ Checking for duplicate keys
751
762
752
763
.. versionadded:: 0.21.0
753
764
754
- Users can use the ``validate `` argument to automatically check whether there
755
- are unexpected duplicates in their merge keys. Key uniqueness is checked before
756
- merge operations and so should protect against memory overflows. Checking key
757
- uniqueness is also a good way to ensure user data structures are as expected.
765
+ Users can use the ``validate `` argument to automatically check whether there
766
+ are unexpected duplicates in their merge keys. Key uniqueness is checked before
767
+ merge operations and so should protect against memory overflows. Checking key
768
+ uniqueness is also a good way to ensure user data structures are as expected.
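
The check can be sketched as follows, assuming two small frames in which ``B`` is unique on the left but repeated on the right. The data is illustrative; ``MergeError`` is imported from ``pandas.errors``.

```python
import pandas as pd
from pandas.errors import MergeError

# "B" is unique in left but duplicated in right (illustrative values).
left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})
right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})

# A one-to-one check fails because the right keys are not unique.
try:
    pd.merge(left, right, on="B", how="outer", validate="one_to_one")
    raised = False
except MergeError:
    raised = True

# A one-to-many check only requires unique keys on the left, so it passes.
checked = pd.merge(left, right, on="B", how="outer", validate="one_to_many")

print(raised, len(checked))
```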
758
769
759
- In the following example, there are duplicate values of ``B `` in the right
760
- ``DataFrame ``. As this is not a one-to-one merge -- as specified in the
770
+ In the following example, there are duplicate values of ``B `` in the right
771
+ ``DataFrame ``. As this is not a one-to-one merge -- as specified in the
761
772
``validate `` argument -- an exception will be raised.
762
773
763
774
@@ -770,11 +781,11 @@ In the following example, there are duplicate values of ``B`` in the right
770
781
771
782
In [53]: result = pd.merge(left, right, on='B', how='outer', validate="one_to_one")
772
783
...
773
- MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
784
+ MergeError: Merge keys are not unique in right dataset; not a one-to-one merge
774
785
775
- If the user is aware of the duplicates in the right ``DataFrame `` but wants to
776
- ensure there are no duplicates in the left DataFrame, one can use the
777
- ``validate='one_to_many' `` argument instead, which will not raise an exception.
786
+ If the user is aware of the duplicates in the right ``DataFrame `` but wants to
787
+ ensure there are no duplicates in the left DataFrame, one can use the
788
+ ``validate='one_to_many' `` argument instead, which will not raise an exception.
778
789
779
790
.. ipython:: python
780
791
@@ -786,8 +797,8 @@ ensure there are no duplicates in the left DataFrame, one can use the
786
797
The merge indicator
787
798
~~~~~~~~~~~~~~~~~~~
788
799
789
- :func: `~pandas.merge ` accepts the argument ``indicator ``. If ``True ``, a
790
- Categorical-type column called ``_merge `` will be added to the output object
800
+ :func: `~pandas.merge ` accepts the argument ``indicator ``. If ``True ``, a
801
+ Categorical-type column called ``_merge `` will be added to the output object
791
802
that takes on values:
792
803
793
804
=================================== ================
@@ -895,7 +906,7 @@ Joining on index
895
906
~~~~~~~~~~~~~~~~
896
907
897
908
:meth:`DataFrame.join` is a convenient method for combining the columns of two
898
- potentially differently-indexed ``DataFrames `` into a single result
909
+ potentially differently-indexed ``DataFrames `` into a single result
899
910
``DataFrame ``. Here is a very basic example:
900
911
901
912
.. ipython:: python
@@ -975,9 +986,15 @@ indexes:
975
986
Joining key columns on an index
976
987
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
977
988
978
994
:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column
979
995
or multiple column names, which specifies that the passed ``DataFrame`` is to be
980
996
aligned on that column in the ``DataFrame``. These two function calls are
981
998
completely equivalent:
982
999
983
1000
::
@@ -987,7 +1004,11 @@ completely equivalent:
987
1004
how='left', sort=False)
988
1005
989
1006
Obviously you can choose whichever form you find more convenient. For
990
1010
many-to-one joins (where one of the ``DataFrame`` objects is already indexed by the
991
1012
join key), using ``join `` may be more convenient. Here is a simple example:
992
1013
993
1014
.. ipython:: python
@@ -1125,20 +1146,25 @@ This is equivalent but less verbose and more memory efficient / faster than this
1125
1146
Joining with two multi-indexes
1126
1147
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1127
1148
1128
- This is not implemented via ``join `` at-the-moment, however it can be done using
1129
- the following code.
1149
+ As of pandas 0.23.1, :meth:`DataFrame.join` can be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels:
1130
1150
1131
1151
.. ipython:: python
1132
1152
1133
- index = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
1153
+ index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
1134
1154
('K1', 'X2')],
1135
1155
names=['key', 'X'])
1136
1156
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
1137
1157
'B': ['B0', 'B1', 'B2']},
1138
- index=index)
1158
+ index=index_left)
1159
+
1160
+ index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
1161
+ ('K2', 'Y2'), ('K2', 'Y3')],
1162
+ names=['key', 'Y'])
1163
+ right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
1164
+ 'D': ['D0', 'D1', 'D2', 'D3']},
1165
+ index=index_right)
1139
1166
1140
- result = pd.merge(left.reset_index(), right.reset_index(),
1141
- on=['key'], how='inner').set_index(['key', 'X', 'Y'])
1167
+ left.join(right)
1142
1168
1143
1169
.. ipython:: python
1144
1170
:suppress:
@@ -1148,6 +1174,13 @@ the following code.
1148
1174
labels=['left', 'right'], vertical=False);
1149
1175
plt.close('all');
1150
1176
1177
+ For earlier versions, the same result can be obtained as follows:
1178
+
1179
+ .. ipython:: python
1180
+
1181
+ pd.merge(left.reset_index(), right.reset_index(),
1182
+ on=['key'], how='inner').set_index(['key', 'X', 'Y'])
1183
+
1151
1184
.. _merging.merge_on_columns_and_levels:
1152
1185
1153
1186
Merging on a combination of columns and index levels
@@ -1254,7 +1287,7 @@ similarly.
1254
1287
Joining multiple DataFrame or Panel objects
1255
1288
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1256
1289
1257
- A list or tuple of ``DataFrames `` can also be passed to :meth: `~DataFrame.join `
1290
+ A list or tuple of ``DataFrames `` can also be passed to :meth: `~DataFrame.join `
1258
1291
to join them together on their indexes.
1259
1292
1260
1293
.. ipython:: python
@@ -1276,7 +1309,7 @@ Merging together values within Series or DataFrame columns
1276
1309
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1277
1310
1278
1311
Another fairly common situation is to have two like-indexed (or similarly
1279
- indexed) ``Series `` or ``DataFrame `` objects and wanting to "patch" values in
1312
+ indexed) ``Series `` or ``DataFrame `` objects and wanting to "patch" values in
1280
1313
one object from values for matching indices in the other. Here is an example:
1281
1314
1282
1315
.. ipython:: python
@@ -1301,7 +1334,7 @@ For this, use the :meth:`~DataFrame.combine_first` method:
1301
1334
plt.close('all');
1302
1335
1303
1336
Note that this method only takes values from the right ``DataFrame `` if they are
1304
- missing in the left ``DataFrame ``. A related method, :meth: `~DataFrame.update `,
1337
+ missing in the left ``DataFrame ``. A related method, :meth: `~DataFrame.update `,
1305
1338
alters non-NA values inplace:
1306
1339
1307
1340
.. ipython:: python
@@ -1353,15 +1386,15 @@ Merging AsOf
1353
1386
1354
1387
.. versionadded:: 0.19.0
1355
1388
1356
- A :func: `merge_asof ` is similar to an ordered left-join except that we match on
1357
- nearest key rather than equal keys. For each row in the ``left `` ``DataFrame ``,
1358
- we select the last row in the ``right `` ``DataFrame `` whose ``on `` key is less
1389
+ A :func: `merge_asof ` is similar to an ordered left-join except that we match on
1390
+ nearest key rather than equal keys. For each row in the ``left `` ``DataFrame ``,
1391
+ we select the last row in the ``right `` ``DataFrame `` whose ``on `` key is less
1359
1392
than or equal to the left's key (exact matches are allowed by default). Both DataFrames must be sorted by the key.
1360
1393
1361
- Optionally an asof merge can perform a group-wise merge. This matches the
1394
+ Optionally an asof merge can perform a group-wise merge. This matches the
1362
1395
``by `` key equally, in addition to the nearest match on the ``on `` key.
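
A compact sketch of a group-wise asof merge, using small made-up ``trades`` and ``quotes`` frames: each trade picks up the most recent quote for the same ticker at or before the trade time. All values are illustrative.

```python
import pandas as pd

# Both frames must be sorted by the "on" key ("time" here).
trades = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.048"]),
    "ticker": ["MSFT", "GOOG"],
    "price": [51.95, 720.77],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2016-05-25 13:30:00.023",
                            "2016-05-25 13:30:00.030",
                            "2016-05-25 13:30:00.041",
                            "2016-05-25 13:30:00.049"]),
    "ticker": ["MSFT", "GOOG", "GOOG", "MSFT"],
    "bid": [51.95, 720.50, 720.60, 51.99],
})

# "by" requires an exact ticker match; "on" takes the nearest earlier
# (or equal) quote time within that ticker.
matched = pd.merge_asof(trades, quotes, on="time", by="ticker")

print(matched)
```

Here the MSFT trade matches the quote with the identical timestamp, and the GOOG trade matches the later of the two earlier GOOG quotes.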
1363
1396
1364
- For example; we might have ``trades `` and ``quotes `` and we want to ``asof ``
1397
+ For example, we might have ``trades`` and ``quotes`` and we want to ``asof``
1365
1398
merge them.
1366
1399
1367
1400
.. ipython:: python
@@ -1420,8 +1453,8 @@ We only asof within ``2ms`` between the quote time and the trade time.
1420
1453
by='ticker',
1421
1454
tolerance=pd.Timedelta('2ms'))
1422
1455
1423
- We only asof within ``10ms `` between the quote time and the trade time and we
1424
- exclude exact matches on time. Note that though we exclude the exact matches
1456
+ We only asof within ``10ms `` between the quote time and the trade time and we
1457
+ exclude exact matches on time. Note that though we exclude the exact matches
1425
1458
(of the quotes), prior quotes **do ** propagate to that point in time.
1426
1459
1427
1460
.. ipython:: python