-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add sort parameter to RangeIndex.union (#24471) #25788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/base.py
Outdated
@@ -2319,7 +2319,7 @@ def union(self, other, sort=None): | |||
else: | |||
rvals = other._values | |||
|
|||
if self.is_monotonic and other.is_monotonic: | |||
if self.is_monotonic and other.is_monotonic and sort is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to check more carefully whether adding this affects any of the other index types because at the moment it only causes a small number of PeriodIndex
and DatetimeIndex
tests to break (which I have already hopefully fixed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move the sort check to the beginning
Codecov Report
@@ Coverage Diff @@
## master #25788 +/- ##
==========================================
+ Coverage 91.26% 91.26% +<.01%
==========================================
Files 172 172
Lines 52965 52965
==========================================
+ Hits 48337 48338 +1
+ Misses 4628 4627 -1
Continue to review full report at Codecov.
|
1 similar comment
Codecov Report
@@ Coverage Diff @@
## master #25788 +/- ##
==========================================
+ Coverage 91.26% 91.26% +<.01%
==========================================
Files 172 172
Lines 52965 52965
==========================================
+ Hits 48337 48338 +1
+ Misses 4628 4627 -1
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25788 +/- ##
==========================================
- Coverage 91.48% 91.48% -0.01%
==========================================
Files 175 175
Lines 52885 52885
==========================================
- Hits 48381 48380 -1
- Misses 4504 4505 +1
Continue to review full report at Codecov.
|
pandas/tests/indexes/test_range.py
Outdated
(RI(0), I64([1, 5, 6]), I64([1, 5, 6]))] | ||
for idx1, idx2, expected in cases: | ||
|
||
inputs = [(RI(0, 10, 1), RI(0, 10, 1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make these fixtures so this is a bit simpler to understand (and maybe even pair the input / expected in a single tuple)
6413c1f
to
1e11abb
Compare
pandas/tests/indexes/test_range.py
Outdated
res1 = idx1.union(idx2, sort=False) | ||
tm.assert_index_equal(res1, expected, exact=True) | ||
tm.assert_index_equal(res1, expected_notsorted, exact=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to make the tests a bit clearer by using a fixture for the input data as you suggested which is shared by the two tests. I could combine the two tests into one and then make a list of tuples of the form [(expected_sorted, expected_notsorted) ... ]
but I found it a bit clearer to have two separate tests as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not what i meant. make tuple of (input, expected_sorted, expected_not_sorted) as a single fixture. the fixtures is a list of these tuples.
1e11abb
to
8a80912
Compare
pandas/core/indexes/base.py
Outdated
@@ -2319,7 +2319,7 @@ def union(self, other, sort=None): | |||
else: | |||
rvals = other._values | |||
|
|||
if self.is_monotonic and other.is_monotonic: | |||
if self.is_monotonic and other.is_monotonic and sort is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move the sort check to the beginning
pandas/tests/indexes/test_range.py
Outdated
res1 = idx1.union(idx2, sort=False) | ||
tm.assert_index_equal(res1, expected, exact=True) | ||
tm.assert_index_equal(res1, expected_notsorted, exact=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not what i meant. make tuple of (input, expected_sorted, expected_not_sorted) as a single fixture. the fixtures is a list of these tuples.
af5c322
to
1ee0cb5
Compare
pandas/tests/indexes/test_range.py
Outdated
def test_union(self): | ||
@pytest.fixture | ||
def union_fixture(self): | ||
"""Inputs and expected outputs for RangeIndex.union tests""" | ||
RI = RangeIndex | ||
I64 = Int64Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the to write this is like this
@pytest.fixture(params=[
(RI(0, 10, 1), RI(0, 10, 1), RI(0, 10, 1), RI(0, 10, 1)),
.......,
ids=list_of_strings_describing_the_cases)
def unions(request):
return request.param
def test_union(unions):
idx1, idx2, expected_sorted, expected_nonsorted = unions
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I understand what you mean now - thanks.
1ee0cb5
to
5f5f4cb
Compare
pandas/tests/indexes/test_range.py
Outdated
tm.assert_index_equal(res2, expected, exact=True) | ||
tm.assert_index_equal(res3, expected) | ||
RI = RangeIndex | ||
I64 = Int64Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I move this to the top of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes (and put a line-liner on what this is doing)
72356c7
to
1a78ce5
Compare
@@ -842,10 +870,6 @@ def test_len_specialised(self): | |||
|
|||
def test_append(self): | |||
# GH16212 | |||
RI = RangeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parmaterize this one as well (similar to above)
1a78ce5
to
e866537
Compare
e866537
to
c32ca8b
Compare
thanks @reidy-p |
git diff upstream/master -u -- "*.py" | flake8 --diff
This is WIP for adding a
sort
parameter toRangeIndex.union
that behaves in a similar way to the other index types.sort=None
is the default to make it consistent with the union method in the base class. Whensort=None
a monotonically increasingRangeIndex
will be returned if possible and a sortedInt64Index
if not.The way I have implemented
sort=False
is that it returns anInt64Index
in all cases. I have been trying to think of cases where it would make sense to still return aRangeIndex
whensort=False
. For example, there might be a case where if we had twoRangeIndex
s and both had the same step and the secondRangeIndex
overlapped with the first we would want to return aRangeIndex
here even if we hadsort=False
. But would it be better just to always return anInt64Index
whensort=False
as I have done here to make the return type consistent and because this particular case seems quite rare?