BUG: Fix index order for Index.intersection() #15583

Closed
wants to merge 23 commits into from
Changes from 1 commit
11 changes: 2 additions & 9 deletions doc/source/whatsnew/v0.20.0.txt
@@ -740,8 +740,8 @@ New Behavior:

.. _whatsnew_0200.api_breaking.index_order:

-Index order after inner join due to Index intersection
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Index.intersection and inner join now preserve the order of the left Index
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``Index.intersection`` now preserves the order of the calling Index (left)
Contributor

Say what this did previously. No need to mention Index.join or DataFrame.merge and say .align methods. Simpler is better.

instead of the other Index (right) (:issue:`15582`). This affects the inner
@@ -788,18 +788,11 @@ joins (methods ``DataFrame.join`` and ``pd.merge``) and the .align methods.
1  10  100
2  20  200

-In [5]: pd.merge(df1, df2, how='inner', left_index=True, right_index=True)
-Out[5]:
-    a    b
-1  10  100
-2  20  200

New Behavior:

.. ipython:: python

df1.join(df2, how='inner')
-pd.merge(df1, df2, how='inner', left_index=True, right_index=True)


.. _whatsnew_0200.api:
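As a reader's aid for the changelog entry above, here is a minimal sketch of the ordering it describes. The index values are illustrative rather than taken from the PR, and `sort=False` is passed explicitly on the assumption that a recent pandas release (one where `Index.intersection` accepts a `sort` argument) is installed.

```python
import pandas as pd

# Illustrative values, not taken from the PR's test suite.
left = pd.Index([2, 1, 0])
right = pd.Index([1, 2, 3])

# The shared labels come back in the order of the calling (left) Index.
common = left.intersection(right, sort=False)
print(common.tolist())  # [2, 1]

# Swapping the operands swaps which order is preserved.
swapped = right.intersection(left, sort=False)
print(swapped.tolist())  # [1, 2]
```

Because an inner join on the index is built on this intersection, the same left-first ordering carries over to the joined result.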
9 changes: 6 additions & 3 deletions pandas/core/frame.py
@@ -124,9 +124,12 @@
----------%s
right : DataFrame
how : {'left', 'right', 'outer', 'inner'}, default 'inner'
-* left: use only keys from left frame (SQL: left outer join)
-* right: use only keys from right frame (SQL: right outer join)
-* outer: use union of keys from both frames (SQL: full outer join)
+* left: use only keys from left frame (SQL: left outer join), preserving
Contributor

nit: can you format these lines like

* right: use only keys from right frame
     similar to a SQL: right outer join; preserves key orderings

+their order
+* right: use only keys from right frame (SQL: right outer join), preserving
+their order
+* outer: use union of keys from both frames (SQL: full outer join), and
+sort them lexicographically
* inner: use intersection of keys from both frames (SQL: inner join),
preserving the order of the left keys
on : label or list
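The key orderings spelled out in this docstring can be checked with a small sketch. The frames below are illustrative assumptions (an unsorted left index against a sorted right index), and only the three order-preserving modes are exercised; the expected orders assume a pandas version that includes this change.

```python
import pandas as pd

# Hypothetical frames: the left index is deliberately not sorted.
df1 = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
df2 = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])

# Collect the resulting index order for each order-preserving join type.
orders = {
    how: df1.join(df2, how=how).index.tolist()
    for how in ('left', 'right', 'inner')
}

for how, order in orders.items():
    print(how, order)
# left  [2, 1, 0]  (order of the left keys)
# right [1, 2, 3]  (order of the right keys)
# inner [2, 1]     (shared keys, in the order of the left keys)
```

Using `pd.merge(df1, df2, how='inner', left_index=True, right_index=True)` instead of `df1.join` should give the same ordering, since both join on the index.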
2 changes: 2 additions & 0 deletions pandas/indexes/base.py
@@ -2844,6 +2844,8 @@ def _reindex_non_unique(self, target):
level : int or level name, default None
return_indexers : boolean, default False
sort : boolean, default False
Contributor

add a versionadded tag

Sort the join keys lexicographically in the result Index. If False,
the order of the join keys depends on the join type (how keyword)

+.. versionadded:: 0.20.0
Contributor

Can you add a sentence explaining what sort does? (I know it's repetitive, but this is a different part of the code.)

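To illustrate what the `sort` keyword documented here does, the sketch below calls `Index.join` twice on the same pair of indexes. The values are illustrative, and the arguments are passed by keyword on the assumption that a recent pandas release is in use.

```python
import pandas as pd

# Illustrative values: the calling Index is in descending order.
left = pd.Index([2, 1, 0])
right = pd.Index([1, 2, 3])

# sort=False: the key order depends on the join type; for an inner
# join the shared keys follow the calling (left) Index.
unsorted_keys = left.join(right, how='inner', sort=False)
print(unsorted_keys.tolist())  # [2, 1]

# sort=True: the join keys are sorted in the result Index.
sorted_keys = left.join(right, how='inner', sort=True)
print(sorted_keys.tolist())  # [1, 2]
```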
6 changes: 3 additions & 3 deletions pandas/tests/frame/test_join.py
@@ -20,7 +20,7 @@ def df1():

@pytest.fixture
def df2():
-    return DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
+    return DataFrame({'b': [300, 100, 200]}, index=[3, 1, 2])


@pytest.mark.parametrize(
@@ -37,8 +37,8 @@ def df2():
('left', True, DataFrame({'a': [0, 10, 20],
'b': [np.nan, 100, 200]},
index=[0, 1, 2])),
-('right', False, DataFrame({'a': [10, 20, np.nan],
-'b': [100, 200, 300]},
+('right', False, DataFrame({'a': [np.nan, 10, 20],
+'b': [300, 100, 200]},
index=[1, 2, 3])),
('right', True, DataFrame({'a': [10, 20, np.nan],
'b': [100, 200, 300]},