Commit 33eb740

Albert Villanova del Moral authored and jreback committed
Address requested changes
1 parent 3c200fe commit 33eb740

5 files changed: +154 -66 lines changed

doc/source/whatsnew/v0.20.0.txt

+116 -31
@@ -309,6 +309,7 @@ Other enhancements
 - ``pd.types.concat.union_categoricals`` gained the ``ignore_ordered`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
 - ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
 - ``pd.DataFrame.to_latex`` and ``pd.DataFrame.to_string`` now allow optional header aliases. (:issue:`15536`)
+- ``Index.intersection()`` accepts parameter ``sort`` (:issue:`15582`)
 
 .. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations
 
@@ -740,52 +741,135 @@ New Behavior:
 
 .. _whatsnew_0200.api_breaking.index_order:
 
-Index order after DataFrame inner join or Index intersection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Index order after inner join due to Index intersection
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-The ``DataFrame`` inner join and the ``Index`` intersection, now preserve the
-order of the calling's Index (left) instead of the other's Index (right)
-(:issue:`15582`)
+The ``Index.intersection`` now preserves the order of the calling Index (left)
+instead of the other Index (right) (:issue:`15582`). This affects the inner
+joins (methods ``Index.join``, ``DataFrame.join``, ``DataFrame.merge`` and
+``pd.merge``) and the alignments with inner join (methods ``Series.align`` and
+``DataFrame.align``).
 
-Previous Behavior:
+- ``Index.intersection`` and ``Index.join``
 
-.. code-block:: ipython
-   In [2]: df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
+  .. ipython:: python
 
-   In [3]: df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
+     idx1 = pd.Index([2, 1, 0])
+     idx1
+     idx2 = pd.Index([1, 2, 3])
+     idx2
 
-   In [4]: df1.join(df2, how='inner')
-   Out[4]:
-       a    b
-   1  10  100
-   2  20  200
+  Previous Behavior:
 
-   In [5]: idx1 = pd.Index([5, 3, 2, 4, 1])
+  .. code-block:: ipython
 
-   In [6]: idx2 = pd.Index([4, 7, 6, 5, 3])
+     In [4]: idx1.intersection(idx2)
+     Out[4]: Int64Index([1, 2], dtype='int64')
 
-   In [7]: idx1.intersection(idx2)
-   Out[7]: Int64Index([4, 5, 3], dtype='int64')
+     In [5]: idx1.join(idx2, how='inner')
+     Out[5]: Int64Index([1, 2], dtype='int64')
 
-New Behavior:
+  New Behavior:
 
-.. code-block:: ipython
-   In [2]: df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
+  .. ipython:: python
+
+     idx1.intersection(idx2)
+
+     idx1.join(idx2, how='inner')
+
+- ``Series.align``
+
+  .. ipython:: python
+
+     s1 = pd.Series([20, 10, 0], index=[2, 1, 0])
+     s1
+     s2 = pd.Series([100, 200, 300], index=[1, 2, 3])
+     s2
+
+  Previous Behavior:
+
+  .. code-block:: ipython
+
+     In [4]: (res1, res2) = s1.align(s2, join='inner')
+
+     In [5]: res1
+     Out[5]:
+     1    10
+     2    20
+     dtype: int64
+
+     In [6]: res2
+     Out[6]:
+     1    100
+     2    200
+     dtype: int64
+
+  New Behavior:
+
+  .. ipython:: python
+
+     (res1, res2) = s1.align(s2, join='inner')
+     res1
+     res2
+
+- ``DataFrame.join``, ``DataFrame.merge`` and ``pd.merge``
+
+  .. ipython:: python
+
+     df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
+     df1
+     df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
+     df2
+
+  Previous Behavior:
+
+  .. code-block:: ipython
+
+     In [4]: df1.join(df2, how='inner')
+     Out[4]:
+         a    b
+     1  10  100
+     2  20  200
+
+     In [5]: df1.merge(df2, how='inner', left_index=True, right_index=True)
+     Out[5]:
+         a    b
+     1  10  100
+     2  20  200
+
+     In [6]: pd.merge(df1, df2, how='inner', left_index=True, right_index=True)
+     Out[6]:
+         a    b
+     1  10  100
+     2  20  200
+
+     In [7]: (res1, res2) = df1.align(df2, axis=0, join='inner')
+
+     In [8]: res1
+     Out[8]:
+         a
+     1  10
+     2  20
+
+     In [9]: res2
+     Out[9]:
+          b
+     1  100
+     2  200
+
+  New Behavior:
 
-   In [3]: df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
+  .. ipython:: python
 
-   In [4]: df1.join(df2, how='inner')
-   Out[4]:
-       a    b
-   2  20  200
-   1  10  100
+     df1.join(df2, how='inner')
 
-   In [5]: idx1 = pd.Index([5, 3, 2, 4, 1])
+     df1.merge(df2, how='inner', left_index=True, right_index=True)
 
-   In [6]: idx2 = pd.Index([4, 7, 6, 5, 3])
+     pd.merge(df1, df2, how='inner', left_index=True, right_index=True)
 
-   In [7]: idx1.intersection(idx2)
-   Out[7]: Int64Index([5, 3, 4], dtype='int64')
+     (res1, res2) = df1.align(df2, axis=0, join='inner')
+     res1
+     res2
 
 
 .. _whatsnew_0200.api:
@@ -1024,3 +1108,4 @@ Bug Fixes
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
 - Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+- Bug with ``sort=True`` in ``DataFrame.join``, ``DataFrame.merge`` and ``pd.merge`` when joining on index (:issue:`15582`)
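
Review note: the ordering rule described in this whatsnew entry can be modelled without pandas. The sketch below is an illustration only. Plain dicts stand in for ``Series``, and ``ordered_intersection`` and ``inner_align`` are hypothetical helpers, not pandas functions. It uses the same labels and values as the ``Series.align`` example in the diff.

```python
# Illustrative model only: plain dicts stand in for pandas Series, and both
# helpers are hypothetical names, not part of the pandas API.


def ordered_intersection(left, right):
    """Return the labels of `left` that also occur in `right`, in left order."""
    right_set = set(right)
    return [key for key in left if key in right_set]


def inner_align(left, right):
    """Restrict two label->value mappings to their shared labels (left order)."""
    keys = ordered_intersection(left, right)
    return {k: left[k] for k in keys}, {k: right[k] for k in keys}


# Same labels and values as s1 and s2 in the whatsnew example.
s1 = {2: 20, 1: 10, 0: 0}
s2 = {1: 100, 2: 200, 3: 300}

common = ordered_intersection(s1, s2)
res1, res2 = inner_align(s1, s2)

print(common)  # [2, 1]  (left order, not the right-hand order [1, 2])
print(res1)    # {2: 20, 1: 10}
print(res2)    # {2: 200, 1: 100}
```

Both aligned results share one label order, taken from the calling (left) object. That is why the change to ``Index.intersection`` carries over to the inner joins and alignments listed in the entry.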

pandas/core/frame.py

+5 -2
@@ -127,7 +127,8 @@
     * left: use only keys from left frame (SQL: left outer join)
     * right: use only keys from right frame (SQL: right outer join)
     * outer: use union of keys from both frames (SQL: full outer join)
-    * inner: use intersection of keys from both frames (SQL: inner join)
+    * inner: use intersection of keys from both frames (SQL: inner join),
+      preserving the order of the left keys
 on : label or list
     Field names to join on. Must be found in both DataFrames. If on is
     None and not merging on indexes, then it merges on the intersection of
@@ -147,7 +148,8 @@
     Use the index from the right DataFrame as the join key. Same caveats as
     left_index
 sort : boolean, default False
-    Sort the join keys lexicographically in the result DataFrame
+    Sort the join keys lexicographically in the result DataFrame. If False,
+    the order of the join keys depends on the join type (how keyword)
 suffixes : 2-length sequence (tuple, list, ...)
     Suffix to apply to overlapping column names in the left and right
     side, respectively
@@ -4464,6 +4466,7 @@ def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
             * right: use other frame's index
             * outer: form union of calling frame's index (or column if on is
               specified) with other frame's index, and sort it
+              lexicographically
             * inner: form intersection of calling frame's index (or column if
               on is specified) with other frame's index, preserving the
               order of the calling's one
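
Review note: the docstrings above say that with ``sort=False`` the key order depends on ``how``. A minimal sketch of that rule follows, assuming unique and mutually comparable keys. ``join_keys`` is a hypothetical helper written for this note, not pandas code, and it follows the ``DataFrame.join`` docstring for the outer case (union, sorted).

```python
# Simplified model of the documented key order. join_keys is a hypothetical
# helper; it assumes unique, mutually comparable keys.


def join_keys(left, right, how="left", sort=False):
    """Return the join keys in the order the docstrings describe."""
    if how == "left":
        keys = list(left)                       # calling frame's order
    elif how == "right":
        keys = list(right)                      # other frame's order
    elif how == "inner":
        right_set = set(right)
        keys = [k for k in left if k in right_set]  # left order preserved
    elif how == "outer":
        keys = sorted(set(left) | set(right))   # union, sorted
    else:
        raise ValueError(f"unknown join type: {how!r}")
    return sorted(keys) if sort else keys


left_index = [2, 1, 0]
right_index = [1, 2, 3]

print(join_keys(left_index, right_index, how="left"))              # [2, 1, 0]
print(join_keys(left_index, right_index, how="right"))             # [1, 2, 3]
print(join_keys(left_index, right_index, how="inner"))             # [2, 1]
print(join_keys(left_index, right_index, how="outer"))             # [0, 1, 2, 3]
print(join_keys(left_index, right_index, how="inner", sort=True))  # [1, 2]
```

The inputs are the indexes of ``df1`` and ``df2`` from this commit's tests, and the printed orders match the ``index=`` values those tests expect.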

pandas/indexes/base.py

+8 -4
@@ -2090,7 +2090,7 @@ def intersection(self, other):
         Form the intersection of two Index objects.
 
         This returns a new Index with elements common to the index and `other`,
-        preserving the calling index order.
+        preserving the order of the calling index.
 
         Parameters
         ----------
@@ -2831,9 +2831,7 @@ def _reindex_non_unique(self, target):
         new_index = self._shallow_copy_with_infer(new_labels, freq=None)
         return new_index, indexer, new_indexer
 
-    def join(self, other, how='left', level=None, return_indexers=False,
-             sort=False):
-        """
+    _index_shared_docs['join'] = """
         *this is an internal non-public method*
 
         Compute join_index and indexers to conform data
@@ -2847,10 +2845,16 @@ def join(self, other, how='left', level=None, return_indexers=False,
         return_indexers : boolean, default False
         sort : boolean, default False
 
+            .. versionadded:: 0.20.0
+
         Returns
         -------
         join_index, (left_indexer, right_indexer)
         """
+
+    @Appender(_index_shared_docs['join'])
+    def join(self, other, how='left', level=None, return_indexers=False,
+             sort=False):
         from .multi import MultiIndex
         self_is_mi = isinstance(self, MultiIndex)
         other_is_mi = isinstance(other, MultiIndex)

pandas/indexes/range.py

+1 -18
@@ -431,26 +431,9 @@ def union(self, other):
 
         return self._int64index.union(other)
 
+    @Appender(_index_shared_docs['join'])
     def join(self, other, how='left', level=None, return_indexers=False,
              sort=False):
-        """
-        *this is an internal non-public method*
-
-        Compute join_index and indexers to conform data
-        structures to the new index.
-
-        Parameters
-        ----------
-        other : Index
-        how : {'left', 'right', 'inner', 'outer'}
-        level : int or level name, default None
-        return_indexers : boolean, default False
-        sort : boolean, default False
-
-        Returns
-        -------
-        join_index, (left_indexer, right_indexer)
-        """
         if how == 'outer' and self is not other:
             # note: could return RangeIndex in more circumstances
             return self._int64index.join(other, how, level, return_indexers,
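
Review note: the two hunks above replace a duplicated docstring with one shared entry applied through a decorator. The sketch below shows the general pattern in isolation. ``append_doc``, ``shared_docs``, ``Base`` and ``Range`` are hypothetical stand-ins written for this note. They are not the pandas ``Appender`` or ``_index_shared_docs`` objects, whose exact behaviour is not shown in this diff.

```python
# Minimal stand-in for the shared-docstring pattern. All names here are
# hypothetical; this is not the pandas Appender implementation.

shared_docs = {}

shared_docs["join"] = """
Compute join_index and indexers to conform data
structures to the new index.
"""


def append_doc(text):
    """Return a decorator that appends `text` to the function's docstring."""
    def decorator(func):
        func.__doc__ = (func.__doc__ or "") + text
        return func
    return decorator


class Base:
    @append_doc(shared_docs["join"])
    def join(self, other):
        return "base join"


class Range(Base):
    # The override reuses the shared text instead of repeating it.
    @append_doc(shared_docs["join"])
    def join(self, other):
        return "range join"


print(Range.join.__doc__ == Base.join.__doc__)  # True
print(Range().join(None))                       # range join
```

Keeping the text in one place means a later docstring change, such as the ``versionadded`` note in this commit, needs to be made only once.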

pandas/tests/frame/test_join.py

+24 -11
@@ -2,6 +2,8 @@
 
 from __future__ import print_function
 
+import numpy as np
+
 import pandas as pd
 
 from pandas.tests.frame.common import TestData
@@ -15,64 +17,75 @@ def test_join(self):
         df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
         df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
 
+        # default how='left'
         result = df1.join(df2)
-        expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, None]},
+        expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, np.nan]},
                                 index=[2, 1, 0])
         tm.assert_frame_equal(result, expected)
 
+        # how='left'
         result = df1.join(df2, how='left')
-        expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, None]},
+        expected = pd.DataFrame({'a': [20, 10, 0], 'b': [200, 100, np.nan]},
                                 index=[2, 1, 0])
         tm.assert_frame_equal(result, expected)
 
+        # how='right'
         result = df1.join(df2, how='right')
-        expected = pd.DataFrame({'a': [10, 20, None], 'b': [100, 200, 300]},
+        expected = pd.DataFrame({'a': [10, 20, np.nan], 'b': [100, 200, 300]},
                                 index=[1, 2, 3])
         tm.assert_frame_equal(result, expected)
 
+        # how='inner'
         result = df1.join(df2, how='inner')
         expected = pd.DataFrame({'a': [20, 10], 'b': [200, 100]},
                                 index=[2, 1])
         tm.assert_frame_equal(result, expected)
 
+        # how='outer'
         result = df1.join(df2, how='outer')
-        expected = pd.DataFrame({'a': [0, 10, 20, None],
-                                 'b': [None, 100, 200, 300]},
+        expected = pd.DataFrame({'a': [0, 10, 20, np.nan],
+                                 'b': [np.nan, 100, 200, 300]},
                                 index=[0, 1, 2, 3])
         tm.assert_frame_equal(result, expected)
 
     def test_join_sort(self):
         df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
         df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
 
+        # default how='left'
         result = df1.join(df2, sort=True)
-        expected = pd.DataFrame({'a': [0, 10, 20], 'b': [None, 100, 200]},
+        expected = pd.DataFrame({'a': [0, 10, 20], 'b': [np.nan, 100, 200]},
                                 index=[0, 1, 2])
         tm.assert_frame_equal(result, expected)
 
+        # how='left'
         result = df1.join(df2, how='left', sort=True)
-        expected = pd.DataFrame({'a': [0, 10, 20], 'b': [None, 100, 200]},
+        expected = pd.DataFrame({'a': [0, 10, 20], 'b': [np.nan, 100, 200]},
                                 index=[0, 1, 2])
         tm.assert_frame_equal(result, expected)
 
+        # how='right' (already sorted)
         result = df1.join(df2, how='right', sort=True)
-        expected = pd.DataFrame({'a': [10, 20, None], 'b': [100, 200, 300]},
+        expected = pd.DataFrame({'a': [10, 20, np.nan], 'b': [100, 200, 300]},
                                 index=[1, 2, 3])
         tm.assert_frame_equal(result, expected)
 
+        # how='right'
         result = df2.join(df1, how='right', sort=True)
-        expected = pd.DataFrame([[None, 0], [100, 10], [200, 20]],
+        expected = pd.DataFrame([[np.nan, 0], [100, 10], [200, 20]],
                                 columns=['b', 'a'], index=[0, 1, 2])
         tm.assert_frame_equal(result, expected)
 
+        # how='inner'
         result = df1.join(df2, how='inner', sort=True)
         expected = pd.DataFrame({'a': [10, 20], 'b': [100, 200]},
                                 index=[1, 2])
         tm.assert_frame_equal(result, expected)
 
+        # how='outer'
         result = df1.join(df2, how='outer', sort=True)
-        expected = pd.DataFrame({'a': [0, 10, 20, None],
-                                 'b': [None, 100, 200, 300]},
+        expected = pd.DataFrame({'a': [0, 10, 20, np.nan],
+                                 'b': [np.nan, 100, 200, 300]},
                                 index=[0, 1, 2, 3])
         tm.assert_frame_equal(result, expected)
 