Commit 31eee47

datapythonista authored and JustinZhengBC committed
DOC: Updating DataFrame.join docstring (pandas-dev#23471)
1 parent 43b135f commit 31eee47

1 file changed

+77
-79
lines changed

pandas/core/frame.py

@@ -6494,123 +6494,121 @@ def append(self, other, ignore_index=False,
64946494
def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
64956495
sort=False):
64966496
"""
6497-
Join columns with other DataFrame either on index or on a key
6498-
column. Efficiently Join multiple DataFrame objects by index at once by
6497+
Join columns of another DataFrame.
6498+
6499+
Join columns with `other` DataFrame either on index or on a key
6500+
column. Efficiently join multiple DataFrame objects by index at once by
64996501
passing a list.
65006502
65016503
Parameters
65026504
----------
6503-
other : DataFrame, Series with name field set, or list of DataFrame
6505+
other : DataFrame, Series, or list of DataFrame
65046506
Index should be similar to one of the columns in this one. If a
65056507
Series is passed, its name attribute must be set, and that will be
6506-
used as the column name in the resulting joined DataFrame
6507-
on : name, tuple/list of names, or array-like
6508+
used as the column name in the resulting joined DataFrame.
6509+
on : str, list of str, or array-like, optional
65086510
Column or index level name(s) in the caller to join on the index
65096511
in `other`, otherwise joins index-on-index. If multiple
65106512
values given, the `other` DataFrame must have a MultiIndex. Can
65116513
pass an array as the join key if it is not already contained in
6512-
the calling DataFrame. Like an Excel VLOOKUP operation
6513-
how : {'left', 'right', 'outer', 'inner'}, default: 'left'
6514+
the calling DataFrame. Like an Excel VLOOKUP operation.
6515+
how : {'left', 'right', 'outer', 'inner'}, default 'left'
65146516
How to handle the operation of the two objects.
65156517
65166518
* left: use calling frame's index (or column if on is specified)
6517-
* right: use other frame's index
6519+
* right: use `other`'s index.
65186520
* outer: form union of calling frame's index (or column if on is
6519-
specified) with other frame's index, and sort it
6520-
lexicographically
6521+
specified) with `other`'s index, and sort it
6522+
lexicographically.
65216523
* inner: form intersection of calling frame's index (or column if
6522-
on is specified) with other frame's index, preserving the order
6523-
of the calling's one
6524-
lsuffix : string
6525-
Suffix to use from left frame's overlapping columns
6526-
rsuffix : string
6527-
Suffix to use from right frame's overlapping columns
6528-
sort : boolean, default False
6524+
on is specified) with `other`'s index, preserving the order
6525+
of the calling's one.
6526+
lsuffix : str, default ''
6527+
Suffix to use from left frame's overlapping columns.
6528+
rsuffix : str, default ''
6529+
Suffix to use from right frame's overlapping columns.
6530+
sort : bool, default False
65296531
Order result DataFrame lexicographically by the join key. If False,
6530-
the order of the join key depends on the join type (how keyword)
6532+
the order of the join key depends on the join type (how keyword).
6533+
6534+
Returns
6535+
-------
6536+
DataFrame
6537+
A dataframe containing columns from both the caller and `other`.
65316538
65326539
Notes
65336540
-----
6534-
on, lsuffix, and rsuffix options are not supported when passing a list
6535-
of DataFrame objects
6541+
Parameters `on`, `lsuffix`, and `rsuffix` are not supported when
6542+
passing a list of `DataFrame` objects.
65366543
65376544
Support for specifying index levels as the `on` parameter was added
6538-
in version 0.23.0
6545+
in version 0.23.0.
6546+
6547+
See Also
6548+
--------
6549+
DataFrame.merge : For column(s)-on-column(s) operations.
65396550
65406551
Examples
65416552
--------
6542-
>>> caller = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
6543-
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
6544-
6545-
>>> caller
6546-
A key
6547-
0 A0 K0
6548-
1 A1 K1
6549-
2 A2 K2
6550-
3 A3 K3
6551-
4 A4 K4
6552-
5 A5 K5
6553+
>>> df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
6554+
... 'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
6555+
6556+
>>> df
6557+
key A
6558+
0 K0 A0
6559+
1 K1 A1
6560+
2 K2 A2
6561+
3 K3 A3
6562+
4 K4 A4
6563+
5 K5 A5
65536564
65546565
>>> other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
65556566
... 'B': ['B0', 'B1', 'B2']})
65566567
65576568
>>> other
6558-
B key
6559-
0 B0 K0
6560-
1 B1 K1
6561-
2 B2 K2
6569+
key B
6570+
0 K0 B0
6571+
1 K1 B1
6572+
2 K2 B2
65626573
65636574
Join DataFrames using their indexes.
65646575
6565-
>>> caller.join(other, lsuffix='_caller', rsuffix='_other')
6566-
6567-
>>> A key_caller B key_other
6568-
0 A0 K0 B0 K0
6569-
1 A1 K1 B1 K1
6570-
2 A2 K2 B2 K2
6571-
3 A3 K3 NaN NaN
6572-
4 A4 K4 NaN NaN
6573-
5 A5 K5 NaN NaN
6574-
6576+
>>> df.join(other, lsuffix='_caller', rsuffix='_other')
6577+
key_caller A key_other B
6578+
0 K0 A0 K0 B0
6579+
1 K1 A1 K1 B1
6580+
2 K2 A2 K2 B2
6581+
3 K3 A3 NaN NaN
6582+
4 K4 A4 NaN NaN
6583+
5 K5 A5 NaN NaN
65756584
65766585
If we want to join using the key columns, we need to set key to be
6577-
the index in both caller and other. The joined DataFrame will have
6586+
the index in both `df` and `other`. The joined DataFrame will have
65786587
key as its index.
65796588
6580-
>>> caller.set_index('key').join(other.set_index('key'))
6581-
6582-
>>> A B
6583-
key
6584-
K0 A0 B0
6585-
K1 A1 B1
6586-
K2 A2 B2
6587-
K3 A3 NaN
6588-
K4 A4 NaN
6589-
K5 A5 NaN
6590-
6591-
Another option to join using the key columns is to use the on
6592-
parameter. DataFrame.join always uses other's index but we can use any
6593-
column in the caller. This method preserves the original caller's
6589+
>>> df.set_index('key').join(other.set_index('key'))
6590+
A B
6591+
key
6592+
K0 A0 B0
6593+
K1 A1 B1
6594+
K2 A2 B2
6595+
K3 A3 NaN
6596+
K4 A4 NaN
6597+
K5 A5 NaN
6598+
6599+
Another option to join using the key columns is to use the `on`
6600+
parameter. DataFrame.join always uses `other`'s index but we can use
6601+
any column in `df`. This method preserves the original DataFrame's
65946602
index in the result.
65956603
6596-
>>> caller.join(other.set_index('key'), on='key')
6597-
6598-
>>> A key B
6599-
0 A0 K0 B0
6600-
1 A1 K1 B1
6601-
2 A2 K2 B2
6602-
3 A3 K3 NaN
6603-
4 A4 K4 NaN
6604-
5 A5 K5 NaN
6605-
6606-
6607-
See also
6608-
--------
6609-
DataFrame.merge : For column(s)-on-columns(s) operations
6610-
6611-
Returns
6612-
-------
6613-
joined : DataFrame
6604+
>>> df.join(other.set_index('key'), on='key')
6605+
key A B
6606+
0 K0 A0 B0
6607+
1 K1 A1 B1
6608+
2 K2 A2 B2
6609+
3 K3 A3 NaN
6610+
4 K4 A4 NaN
6611+
5 K5 A5 NaN
66146612
"""
66156613
# For SparseDataFrame's benefit
66166614
return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
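
The three join styles described in the updated docstring can be reproduced outside the doctest. The following is a minimal sketch, assuming a recent pandas release in which DataFrame columns keep their dict insertion order. The `how='inner'` call and the `DataFrame.merge` comparison at the end are illustrative additions and are not part of this commit.

```python
import pandas as pd

# Same frames as in the docstring examples.
df = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
                   'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
other = pd.DataFrame({'key': ['K0', 'K1', 'K2'],
                      'B': ['B0', 'B1', 'B2']})

# 1. Index-on-index join. Both frames have a 'key' column, so the
#    overlapping names must be disambiguated with suffixes.
by_index = df.join(other, lsuffix='_caller', rsuffix='_other')

# 2. Join on the key by making it the index of both frames.
#    The result is indexed by 'key'.
by_key_index = df.set_index('key').join(other.set_index('key'))

# 3. Join a column of the caller against the index of `other`.
#    The caller's original integer index is preserved.
by_on = df.join(other.set_index('key'), on='key')

# Illustrative addition: an inner join keeps only keys present in both.
inner = df.join(other.set_index('key'), on='key', how='inner')

# Illustrative addition: the column-on-column counterpart from "See Also".
merged = df.merge(other, on='key', how='left')

print(by_index)
print(by_key_index)
print(by_on)
print(inner)
print(merged)
```

Variant 3 is usually the most convenient when only `other` needs re-indexing, since the caller keeps both its index and its `key` column.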
