ENH: Add sort parameter to RangeIndex.union (#24471) #25788

reidy-p · 2019-03-19T21:47:31Z

progress towards ENH: Add sort parameter to other set operations if possible #24471
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This is WIP for adding a sort parameter to RangeIndex.union that behaves in a similar way to the other index types.

sort=None is the default to make it consistent with the union method in the base class. When sort=None a monotonically increasing RangeIndex will be returned if possible and a sorted Int64Index if not.

As implemented here, sort=False returns an Int64Index in all cases. I have been trying to think of cases where it would make sense to still return a RangeIndex when sort=False. For example, if two RangeIndexes had the same step and the second overlapped with the first, we might want to return a RangeIndex even with sort=False. But would it be better to always return an Int64Index when sort=False, as I have done here, to keep the return type consistent and because this particular case seems quite rare?

# sort=False returns an Int64Index even though we might be able to return a RangeIndex as below
In [1]: RangeIndex(0, 10, 2).union(RangeIndex(10, 12, 2), sort=False)
Out[1]: Int64Index([0, 2, 4, 6, 8, 10], dtype='int64')

In [2]: RangeIndex(0, 10, 2).union(RangeIndex(10, 12, 2), sort=None)
Out[2]: RangeIndex(start=0, stop=12, step=2)
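
To make the question above concrete, here is a minimal sketch of the "can the union still be a single range?" check, written against Python's built-in range rather than pandas. The helper name union_as_range is hypothetical, and the brute-force approach (materializing every value) is only for illustration; it is not how RangeIndex.union is implemented in this PR.

```python
from typing import Optional


def union_as_range(left: range, right: range) -> Optional[range]:
    """Return a range holding the sorted union, or None if no single range fits.

    Brute-force illustration only: it materializes every value, which a real
    implementation would want to avoid.
    """
    values = sorted(set(left) | set(right))
    if not values:
        return range(0)
    if len(values) == 1:
        return range(values[0], values[0] + 1)
    step = values[1] - values[0]
    # The union is representable as a range only if all gaps are equal.
    if all(b - a == step for a, b in zip(values, values[1:])):
        return range(values[0], values[-1] + step, step)
    return None


print(union_as_range(range(0, 10, 2), range(10, 12, 2)))  # range(0, 12, 2)
print(union_as_range(range(0, 3), range(10, 13)))          # None
```

The first call mirrors the RangeIndex(0, 10, 2) / RangeIndex(10, 12, 2) example: the combined values still form one arithmetic progression, so a range-like result would be possible in principle.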

reidy-p · 2019-03-19T21:49:55Z

pandas/core/indexes/base.py

@@ -2319,7 +2319,7 @@ def union(self, other, sort=None):
        else:
            rvals = other._values

-        if self.is_monotonic and other.is_monotonic:
+        if self.is_monotonic and other.is_monotonic and sort is None:


I will need to check more carefully whether this change affects any of the other index types. At the moment it only causes a small number of PeriodIndex and DatetimeIndex tests to break (which I have hopefully already fixed).

can you move the sort check to the beginning
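
The review comment does not state a reason for the reordering. One plausible motivation (an assumption, not something confirmed in this thread) is that `and` short-circuits, so testing `sort is None` first skips the monotonicity checks whenever sort=False. The sketch below uses a hypothetical FakeIndex stand-in, not a pandas class, to count how often is_monotonic is evaluated.

```python
class FakeIndex:
    """Hypothetical stand-in for an index; counts monotonicity checks."""

    def __init__(self, values):
        self.values = list(values)
        self.checks = 0

    @property
    def is_monotonic(self):
        self.checks += 1
        return all(a <= b for a, b in zip(self.values, self.values[1:]))


def use_fast_path(left, right, sort):
    # Cheap identity test first; the properties are only read if it passes.
    return sort is None and left.is_monotonic and right.is_monotonic


left, right = FakeIndex([0, 1, 2]), FakeIndex([3, 4])

print(use_fast_path(left, right, sort=False), left.checks, right.checks)  # False 0 0
print(use_fast_path(left, right, sort=None), left.checks, right.checks)   # True 1 1
```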

codecov · 2019-03-19T22:26:39Z

Codecov Report

Merging #25788 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25788      +/-   ##
==========================================
+ Coverage   91.26%   91.26%   +<.01%     
==========================================
  Files         172      172              
  Lines       52965    52965              
==========================================
+ Hits        48337    48338       +1     
+ Misses       4628     4627       -1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `89.82% <100%> (ø)` | ⬆️ |
| #single | `41.74% <60%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/indexes/base.py | `96.57% <100%> (ø)` | ⬆️ |
| pandas/core/indexes/range.py | `97.41% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `89.44% <0%> (+0.09%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8e54e55...fadd52e.

codecov · 2019-03-19T22:27:05Z

Codecov Report

Merging #25788 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #25788      +/-   ##
==========================================
- Coverage   91.48%   91.48%   -0.01%     
==========================================
  Files         175      175              
  Lines       52885    52885              
==========================================
- Hits        48381    48380       -1     
- Misses       4504     4505       +1

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `90.04% <100%> (ø)` | ⬆️ |
| #single | `41.82% <60%> (ø)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| pandas/core/indexes/base.py | `96.57% <100%> (ø)` | ⬆️ |
| pandas/core/indexes/range.py | `97.41% <100%> (ø)` | ⬆️ |
| pandas/util/testing.py | `89.74% <0%> (-0.11%)` | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 923ac2b...c32ca8b.

jreback · 2019-03-19T23:16:50Z

pandas/tests/indexes/test_range.py

-                 (RI(0), I64([1, 5, 6]), I64([1, 5, 6]))]
-        for idx1, idx2, expected in cases:
+
+        inputs = [(RI(0, 10, 1), RI(0, 10, 1)),


can you make these fixtures so this is a bit simpler to understand (and maybe even pair the input / expected in a single tuple)

pep8speaks · 2019-03-20T21:28:17Z

Hello @reidy-p! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-26 16:02:11 UTC

reidy-p · 2019-03-20T21:32:40Z

pandas/tests/indexes/test_range.py

            res1 = idx1.union(idx2, sort=False)
-            tm.assert_index_equal(res1, expected, exact=True)
+            tm.assert_index_equal(res1, expected_notsorted, exact=True)


I have tried to make the tests a bit clearer by using a fixture for the input data, as you suggested, which is shared by the two tests. I could combine the two tests into one and use a list of tuples of the form [(expected_sorted, expected_notsorted), ...], but I found it clearer to have two separate tests as above.

not what i meant. make a tuple of (input, expected_sorted, expected_not_sorted) as a single fixture. the fixture is a list of these tuples.
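
A rough sketch of the layout being asked for, using plain Python sequences instead of RangeIndex / Int64Index so that it runs without pandas. The union_values helper is a hypothetical stand-in, and its sort=False behaviour (left values first, then unseen right values) is an assumption made for the example, not a statement about what pandas returns.

```python
def union_values(left, right, sort=None):
    """Hypothetical stand-in for Index.union on plain sequences."""
    seen = set(left)
    combined = list(left)
    for value in right:
        if value not in seen:
            seen.add(value)
            combined.append(value)
    # sort=None sorts the result; sort=False keeps first-seen order.
    return sorted(combined) if sort is None else combined


# Each case keeps inputs and both expected results together in one tuple:
# (idx1, idx2, expected_sorted, expected_notsorted)
CASES = [
    (range(0, 10, 1), range(0, 10, 1), list(range(10)), list(range(10))),
    (range(0, 10, 2), range(1, 10, 2), list(range(10)),
     [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]),
    (range(5, 10), range(0, 5), list(range(10)),
     [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]),
]


def check_cases():
    for idx1, idx2, expected_sorted, expected_notsorted in CASES:
        assert union_values(idx1, idx2, sort=None) == expected_sorted
        assert union_values(idx1, idx2, sort=False) == expected_notsorted
    return len(CASES)


print(check_cases())  # 3
```

The same list of tuples could be passed to a parametrized pytest fixture, so each case is reported as its own test.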

jreback · 2019-03-22T13:46:53Z

pandas/core/indexes/base.py

@@ -2319,7 +2319,7 @@ def union(self, other, sort=None):
        else:
            rvals = other._values

-        if self.is_monotonic and other.is_monotonic:
+        if self.is_monotonic and other.is_monotonic and sort is None:


can you move the sort check to the beginning

jreback · 2019-03-22T13:48:05Z

pandas/tests/indexes/test_range.py

            res1 = idx1.union(idx2, sort=False)
-            tm.assert_index_equal(res1, expected, exact=True)
+            tm.assert_index_equal(res1, expected_notsorted, exact=True)


not what i meant. make a tuple of (input, expected_sorted, expected_not_sorted) as a single fixture. the fixture is a list of these tuples.

jreback · 2019-03-23T20:20:24Z

pandas/tests/indexes/test_range.py

-    def test_union(self):
+    @pytest.fixture
+    def union_fixture(self):
+        """Inputs and expected outputs for RangeIndex.union tests"""
        RI = RangeIndex
        I64 = Int64Index


so the way to write this is like this

@pytest.fixture(params=[
    (RI(0, 10, 1), RI(0, 10, 1), RI(0, 10, 1), RI(0, 10, 1)),
    .......,
], ids=list_of_strings_describing_the_cases)
def unions(request):
    return request.param

def test_union(unions):
    idx1, idx2, expected_sorted, expected_nonsorted = unions
    ....

Ok I understand what you mean now - thanks.

reidy-p · 2019-03-24T17:12:21Z

pandas/tests/indexes/test_range.py

-            tm.assert_index_equal(res2, expected, exact=True)
-            tm.assert_index_equal(res3, expected)
+    RI = RangeIndex
+    I64 = Int64Index


Should I move this to the top of the file?

yes (and put a one-liner on what this is doing)

jreback · 2019-03-24T23:36:26Z

pandas/tests/indexes/test_range.py

@@ -842,10 +870,6 @@ def test_len_specialised(self):

    def test_append(self):
        # GH16212
-        RI = RangeIndex


can you parametrize this one as well (similar to above)

jreback · 2019-03-26T20:05:59Z

thanks @reidy-p

reidy-p mentioned this pull request Mar 19, 2019
    ENH: Add sort parameter to other set operations if possible #24471 (Closed, 6 tasks)

jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 19, 2019

jreback requested changes Mar 19, 2019

reidy-p force-pushed the rangeindex_union_sort branch 2 times, most recently from 6413c1f to 1e11abb, March 20, 2019 21:30

reidy-p force-pushed the rangeindex_union_sort branch from 1e11abb to 8a80912, March 20, 2019 21:33

jreback requested changes Mar 22, 2019

reidy-p force-pushed the rangeindex_union_sort branch 3 times, most recently from af5c322 to 1ee0cb5, March 23, 2019 16:17

jreback requested changes Mar 23, 2019

reidy-p force-pushed the rangeindex_union_sort branch from 1ee0cb5 to 5f5f4cb, March 24, 2019 17:09

reidy-p force-pushed the rangeindex_union_sort branch 2 times, most recently from 72356c7 to 1a78ce5, March 24, 2019 22:22

jreback reviewed Mar 24, 2019

reidy-p force-pushed the rangeindex_union_sort branch from 1a78ce5 to e866537, March 25, 2019 22:39

reidy-p added 6 commits March 26, 2019 16:01
    0b38695  ENH: Add sort parameter to RangeIndex.union (pandas-dev#24471)
    2dac97c  Make tests a bit clearer using fixture
    3d4c8d6  Fix fixture in test and move sort check
    311b12c  Update fixture
    7cf5269  Move aliases and add sort parameter to other union test
    9b65818  Add in missing whitespace
    c32ca8b  Add fixture to append test

reidy-p force-pushed the rangeindex_union_sort branch from e866537 to c32ca8b, March 26, 2019 16:01

jreback added this to the 0.25.0 milestone Mar 26, 2019

jreback approved these changes Mar 26, 2019

jreback merged commit af6ccf6 into pandas-dev:master Mar 26, 2019

galipremsagar mentioned this pull request Oct 4, 2021
    PERF: Index.union is materializing even when it is not needed when sort=False #43885 (Closed, 3 tasks)
