Skip to content

ENH: Add sort parameter to RangeIndex.union (#24471) #25788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 26, 2019

Conversation

reidy-p
Copy link
Contributor

@reidy-p reidy-p commented Mar 19, 2019

This is WIP for adding a sort parameter to RangeIndex.union that behaves in a similar way to the other index types.

sort=None is the default to make it consistent with the union method in the base class. When sort=None a monotonically increasing RangeIndex will be returned if possible and a sorted Int64Index if not.

The way I have implemented sort=False is that it returns an Int64Index in all cases. I have been trying to think of cases where it would make sense to still return a RangeIndex when sort=False. For example, there might be a case where if we had two RangeIndexs and both had the same step and the second RangeIndex overlapped with the first we would want to return a RangeIndex here even if we had sort=False. But would it be better just to always return an Int64Index when sort=False as I have done here to make the return type consistent and because this particular case seems quite rare?

# sort=False returns an Int64Index even though we might be able to return a RangeIndex as below
In [1]: RangeIndex(0, 10, 2).union(RangeIndex(10, 12, 2), sort=False)
Out[1]: Int64Index([0, 2, 4, 6, 8, 10], dtype='int64')

In [1]: RangeIndex(0, 10, 2).union(RangeIndex(10, 12, 2), sort=None)
Out[1]: RangeIndex(start=0, stop=12, step=2)

@@ -2319,7 +2319,7 @@ def union(self, other, sort=None):
else:
rvals = other._values

if self.is_monotonic and other.is_monotonic:
if self.is_monotonic and other.is_monotonic and sort is None:
Copy link
Contributor Author

@reidy-p reidy-p Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will need to check more carefully whether adding this affects any of the other index types because at the moment it only causes a small number of PeriodIndex and DatetimeIndex tests to break (which I have already hopefully fixed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move the sort check to the beginning

@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #25788 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25788      +/-   ##
==========================================
+ Coverage   91.26%   91.26%   +<.01%     
==========================================
  Files         172      172              
  Lines       52965    52965              
==========================================
+ Hits        48337    48338       +1     
+ Misses       4628     4627       -1
Flag Coverage Δ
#multiple 89.82% <100%> (ø) ⬆️
#single 41.74% <60%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.57% <100%> (ø) ⬆️
pandas/core/indexes/range.py 97.41% <100%> (ø) ⬆️
pandas/util/testing.py 89.44% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e54e55...fadd52e. Read the comment docs.

1 similar comment
@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #25788 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25788      +/-   ##
==========================================
+ Coverage   91.26%   91.26%   +<.01%     
==========================================
  Files         172      172              
  Lines       52965    52965              
==========================================
+ Hits        48337    48338       +1     
+ Misses       4628     4627       -1
Flag Coverage Δ
#multiple 89.82% <100%> (ø) ⬆️
#single 41.74% <60%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.57% <100%> (ø) ⬆️
pandas/core/indexes/range.py 97.41% <100%> (ø) ⬆️
pandas/util/testing.py 89.44% <0%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e54e55...fadd52e. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 19, 2019

Codecov Report

Merging #25788 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #25788      +/-   ##
==========================================
- Coverage   91.48%   91.48%   -0.01%     
==========================================
  Files         175      175              
  Lines       52885    52885              
==========================================
- Hits        48381    48380       -1     
- Misses       4504     4505       +1
Flag Coverage Δ
#multiple 90.04% <100%> (ø) ⬆️
#single 41.82% <60%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.57% <100%> (ø) ⬆️
pandas/core/indexes/range.py 97.41% <100%> (ø) ⬆️
pandas/util/testing.py 89.74% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 923ac2b...c32ca8b. Read the comment docs.

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Mar 19, 2019
(RI(0), I64([1, 5, 6]), I64([1, 5, 6]))]
for idx1, idx2, expected in cases:

inputs = [(RI(0, 10, 1), RI(0, 10, 1)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make these fixtures so this is a bit simpler to understand (and maybe even pair the input / expected in a single tuple)

@pep8speaks
Copy link

pep8speaks commented Mar 20, 2019

Hello @reidy-p! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-03-26 16:02:11 UTC

@reidy-p reidy-p force-pushed the rangeindex_union_sort branch 2 times, most recently from 6413c1f to 1e11abb Compare March 20, 2019 21:30
res1 = idx1.union(idx2, sort=False)
tm.assert_index_equal(res1, expected, exact=True)
tm.assert_index_equal(res1, expected_notsorted, exact=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to make the tests a bit clearer by using a fixture for the input data as you suggested which is shared by the two tests. I could combine the two tests into one and then make a list of tuples of the form [(expected_sorted, expected_notsorted) ... ] but I found it a bit clearer to have two separate tests as above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not what i meant. make tuple of (input, expected_sorted, expected_not_sorted) as a single fixture. the fixtures is a list of these tuples.

@reidy-p reidy-p force-pushed the rangeindex_union_sort branch from 1e11abb to 8a80912 Compare March 20, 2019 21:33
@@ -2319,7 +2319,7 @@ def union(self, other, sort=None):
else:
rvals = other._values

if self.is_monotonic and other.is_monotonic:
if self.is_monotonic and other.is_monotonic and sort is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move the sort check to the beginning

res1 = idx1.union(idx2, sort=False)
tm.assert_index_equal(res1, expected, exact=True)
tm.assert_index_equal(res1, expected_notsorted, exact=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not what i meant. make tuple of (input, expected_sorted, expected_not_sorted) as a single fixture. the fixtures is a list of these tuples.

@reidy-p reidy-p force-pushed the rangeindex_union_sort branch 3 times, most recently from af5c322 to 1ee0cb5 Compare March 23, 2019 16:17
def test_union(self):
@pytest.fixture
def union_fixture(self):
"""Inputs and expected outputs for RangeIndex.union tests"""
RI = RangeIndex
I64 = Int64Index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the to write this is like this

@pytest.fixture(params=[
    (RI(0, 10, 1), RI(0, 10, 1), RI(0, 10, 1), RI(0, 10, 1)),
    .......,
   ids=list_of_strings_describing_the_cases)
def unions(request):
      return request.param


def test_union(unions):
     idx1, idx2, expected_sorted, expected_nonsorted = unions
     ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I understand what you mean now - thanks.

@reidy-p reidy-p force-pushed the rangeindex_union_sort branch from 1ee0cb5 to 5f5f4cb Compare March 24, 2019 17:09
tm.assert_index_equal(res2, expected, exact=True)
tm.assert_index_equal(res3, expected)
RI = RangeIndex
I64 = Int64Index
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I move this to the top of the file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes (and put a line-liner on what this is doing)

@reidy-p reidy-p force-pushed the rangeindex_union_sort branch 2 times, most recently from 72356c7 to 1a78ce5 Compare March 24, 2019 22:22
@@ -842,10 +870,6 @@ def test_len_specialised(self):

def test_append(self):
# GH16212
RI = RangeIndex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parmaterize this one as well (similar to above)

@reidy-p reidy-p force-pushed the rangeindex_union_sort branch from 1a78ce5 to e866537 Compare March 25, 2019 22:39
@reidy-p reidy-p force-pushed the rangeindex_union_sort branch from e866537 to c32ca8b Compare March 26, 2019 16:01
@jreback jreback added this to the 0.25.0 milestone Mar 26, 2019
@jreback jreback merged commit af6ccf6 into pandas-dev:master Mar 26, 2019
@jreback
Copy link
Contributor

jreback commented Mar 26, 2019

thanks @reidy-p

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants