
Commit 57ea76f

edublancas authored and jreback committed
DOC: Improved documentation for DataFrame.join
closes #12188

Author: Eduardo Blancas Reyes <[email protected]>

Closes #12193 from edublancas/master and squashes the following commits:

a66f2ea [Eduardo Blancas Reyes] DOC: improves DataFrame.join documentation
8266cdc [Eduardo Blancas Reyes] DOC: improves DataFrame.join documentation
1 parent f2ce0ac commit 57ea76f

2 files changed: +84 -13 lines changed


doc/source/merging.rst

+2 -4
@@ -562,10 +562,8 @@ DataFrame instance method, with the calling DataFrame being implicitly
 considered the left object in the join.
 
 The related ``DataFrame.join`` method, uses ``merge`` internally for the
-index-on-index and index-on-column(s) joins, but *joins on indexes* by default
-rather than trying to join on common columns (the default behavior for
-``merge``). If you are joining on index, you may wish to use ``DataFrame.join``
-to save yourself some typing.
+index-on-index (by default) and column(s)-on-index join. If you are joining on
+index only, you may wish to use ``DataFrame.join`` to save yourself some typing.
 
 Brief primer on merge methods (relational algebra)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
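
The relationship between ``DataFrame.join`` and ``merge`` described in this hunk can be checked with a short sketch. The frames ``left`` and ``right`` below are made-up illustration data, not part of the commit; the comparison only assumes the public ``DataFrame.join`` and ``DataFrame.merge`` methods with their ``on``, ``left_on``, ``left_index`` and ``right_index`` parameters.

```python
import pandas as pd

# Hypothetical data: `right` is keyed by its index, `left` carries the key
# both as a column and (after set_index) as an index.
left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"B": ["B0", "B1"]}, index=["K0", "K1"])

# Column(s)-on-index: a caller column is matched against other's index.
col_join = left.join(right, on="key")
col_merge = left.merge(right, left_on="key", right_index=True, how="left")

# Index-on-index (the default for join): both indexes are matched.
left_idx = left.set_index("key")
idx_join = left_idx.join(right)
idx_merge = left_idx.merge(right, left_index=True, right_index=True, how="left")

print(col_join.equals(col_merge))
print(idx_join.equals(idx_merge))
```

In both cases ``join`` is the shorter spelling, which is the "save yourself some typing" point made in the text.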

pandas/core/frame.py

+82 -9
@@ -4351,18 +4351,20 @@ def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
             Series is passed, its name attribute must be set, and that will be
             used as the column name in the resulting joined DataFrame
         on : column name, tuple/list of column names, or array-like
-            Column(s) to use for joining, otherwise join on index. If multiples
+            Column(s) in the caller to join on the index in other,
+            otherwise joins index-on-index. If multiples
             columns given, the passed DataFrame must have a MultiIndex. Can
             pass an array as the join key if not already contained in the
             calling DataFrame. Like an Excel VLOOKUP operation
-        how : {'left', 'right', 'outer', 'inner'}
-            How to handle indexes of the two objects. Default: 'left'
-            for joining on index, None otherwise
-
-            * left: use calling frame's index
-            * right: use input frame's index
-            * outer: form union of indexes
-            * inner: use intersection of indexes
+        how : {'left', 'right', 'outer', 'inner'}, default: 'left'
+            How to handle the operation of the two objects.
+
+            * left: use calling frame's index (or column if on is specified)
+            * right: use other frame's index
+            * outer: form union of calling frame's index (or column if on is
+                specified) with other frame's index
+            * inner: form intersection of calling frame's index (or column if
+                on is specified) with other frame's index
         lsuffix : string
             Suffix to use from left frame's overlapping columns
         rsuffix : string
@@ -4376,6 +4378,77 @@ def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
         on, lsuffix, and rsuffix options are not supported when passing a list
         of DataFrame objects
 
+        Examples
+        --------
+        >>> caller = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
+        ...                        'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
+
+        >>> caller
+            A key
+        0  A0  K0
+        1  A1  K1
+        2  A2  K2
+        3  A3  K3
+        4  A4  K4
+        5  A5  K5
+
+        >>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
+        ...                       'B': ['B0', 'B1', 'B2']})
+
+        >>> other
+            B key
+        0  B0  K0
+        1  B1  K1
+        2  B2  K2
+
+        Join DataFrames using their indexes.
+
+        >>> caller.join(other, lsuffix='_caller', rsuffix='_other')
+
+        >>>     A key_caller    B key_other
+            0  A0         K0   B0        K0
+            1  A1         K1   B1        K1
+            2  A2         K2   B2        K2
+            3  A3         K3  NaN       NaN
+            4  A4         K4  NaN       NaN
+            5  A5         K5  NaN       NaN
+
+
+        If we want to join using the key columns, we need to set key to be
+        the index in both caller and other. The joined DataFrame will have
+        key as its index.
+
+        >>> caller.set_index('key').join(other.set_index('key'))
+
+        >>>       A    B
+            key
+            K0   A0   B0
+            K1   A1   B1
+            K2   A2   B2
+            K3   A3  NaN
+            K4   A4  NaN
+            K5   A5  NaN
+
+        Another option to join using the key columns is to use the on
+        parameter. DataFrame.join always uses other's index but we can use any
+        column in the caller. This method preserves the original caller's
+        index in the result.
+
+        >>> caller.join(other.set_index('key'), on='key')
+
+        >>>     A key    B
+            0  A0  K0   B0
+            1  A1  K1   B1
+            2  A2  K2   B2
+            3  A3  K3  NaN
+            4  A4  K4  NaN
+            5  A5  K5  NaN
+
+
+        See also
+        --------
+        DataFrame.merge : For column(s)-on-columns(s) operations
+
         Returns
         -------
         joined : DataFrame
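
To make the revised ``how`` descriptions concrete, here is a minimal sketch that joins a caller column onto the index of another frame under each mode. The ``other`` frame is hypothetical and differs from the docstring example: it contains a key (``K6``) that is absent from the caller, so the four modes produce different row counts.

```python
import pandas as pd

caller = pd.DataFrame({"key": ["K0", "K1", "K2", "K3", "K4", "K5"],
                       "A": ["A0", "A1", "A2", "A3", "A4", "A5"]})

# Assumption for illustration: K6 exists only in `other`,
# and K2..K5 exist only in `caller`.
other = pd.DataFrame({"key": ["K0", "K1", "K6"],
                      "B": ["B0", "B1", "B6"]}).set_index("key")

# Join the caller's 'key' column against other's index under each mode.
results = {how: caller.join(other, on="key", how=how)
           for how in ("left", "right", "outer", "inner")}

for how, frame in results.items():
    print(how, len(frame))
# left  -> 6 rows: every caller row, B is NaN where the key is missing in other
# right -> 3 rows: every row of other, A is NaN for K6
# outer -> 7 rows: union of caller keys and other's index
# inner -> 2 rows: only K0 and K1 appear on both sides
```

The default ``how='left'`` is the only mode guaranteed to keep one output row per caller row when other's index is unique, which is why it matches the VLOOKUP analogy in the ``on`` description.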
