Skip to content

BUG: Index._searchsorted_monotonic(..., side='right') returns the left side position for monotonic decreasing indexes #17272

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 7, 2017

Conversation

jschendel
Copy link
Member

I didn't add a whatsnew entry for the Index._searchsorted_monotonic fix, since it looks like past convention has been to not add whatsnew entries for private methods. Happy to add a whatsnew entry if I'm mistaken though. I did add a whatsnew entry for the downstream effect on IntervalIndex.

@gfyoung
Copy link
Member

gfyoung commented Aug 17, 2017

since it looks like past convention has been to not add whatsnew entries for private methods.

That is correct.

@gfyoung gfyoung added the Indexing Related to indexing on series/frames, not to indexes themselves label Aug 17, 2017
def test_searchsorted_monotonic(self):
# GH17271
idx_inc = Index([0, 2, 4])
idx_dec = Index([4, 2, 0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be better to move these to common.py/Base as they be tested generically.

@@ -39,6 +42,9 @@ def test_loc_with_scalar(self):
expected = s.iloc[2:5]
tm.assert_series_equal(expected, s.loc[s >= 2])

# test endpoint of non-overlapping monotonic decreasing (GH16417)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead make a specific test of this (also test increasing monotonic), even better would be to parametrize over closed (bonus points)

@jschendel jschendel force-pushed the fix-idx-searchsortedmonotonic branch from 6378a27 to 436852e Compare August 21, 2017 01:47
@codecov
Copy link

codecov bot commented Aug 21, 2017

Codecov Report

Merging #17272 into master will decrease coverage by 50.77%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #17272       +/-   ##
===========================================
- Coverage   91.03%   40.25%   -50.78%     
===========================================
  Files         162      162               
  Lines       49567    49540       -27     
===========================================
- Hits        45123    19944    -25179     
- Misses       4444    29596    +25152
Flag Coverage Δ
#multiple ?
#single 40.25% <ø> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 56.35% <ø> (-39.59%) ⬇️
pandas/tools/hashing.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/lib.py 0% <0%> (-100%) ⬇️
pandas/parser.py 0% <0%> (-100%) ⬇️
pandas/tslib.py 0% <0%> (-100%) ⬇️
pandas/io/json/json.py 0% <0%> (-100%) ⬇️
pandas/io/formats/style.py 0% <0%> (-100%) ⬇️
pandas/types/concat.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-90.77%) ⬇️
... and 112 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58d8729...436852e. Read the comment docs.

@codecov
Copy link

codecov bot commented Aug 21, 2017

Codecov Report

Merging #17272 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17272      +/-   ##
==========================================
- Coverage   91.15%   91.14%   -0.02%     
==========================================
  Files         163      163              
  Lines       49591    49591              
==========================================
- Hits        45207    45201       -6     
- Misses       4384     4390       +6
Flag Coverage Δ
#multiple 88.93% <ø> (ø) ⬆️
#single 40.25% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.29% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.72% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.52% <0%> (+0.09%) ⬆️
pandas/core/indexes/datetimelike.py 96.88% <0%> (+0.22%) ⬆️
pandas/core/indexes/range.py 92.59% <0%> (+0.3%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20fee85...41e59d7. Read the comment docs.

@jschendel
Copy link
Member Author

Notes regarding the review changes I made:

  • Removed some redundant checks from test_loc_with_scalar and test_getitem_with_scalar in test_interval.py, as these checks are now being handled by test_nonoverlapping_monotonic (the index used in the two former tests is nonoverlapping monotonic increasing).

  • Added some monotonic decreasing indexes to setup_method to various test classes within in order to ensure that monotonic decreasing indexes are fully tested by common.py/test_searchsorted_monotonic (and other tests).

    • Had to change tm.assert_numpy_array_equal to tm.equalContents in common.py/test_difference_base. Introducing a decreasing sequence led to the test failing with tm.assert_numpy_array_equal due to different sorting of the result and expected answer. If keeping tm.assert_numpy_array_equal is preferable, the arrays could be sorted prior to using it.
    • Did not add a monotonic decreasing index to test_range, as it currently fails common.py/test_intersection_base due to the issue described in BUG: Intersection of RangeIndexes fails when the step sign is negative #17296 (will add when I submit a PR for this issue).

@jreback
Copy link
Contributor

jreback commented Aug 21, 2017

Had to change tm.assert_numpy_array_equal to tm.equalContents in common.py/test_difference_base. Introducing a decreasing sequence led to the test failing with tm.assert_numpy_array_equal due to different sorting of the result and expected answer. If keeping tm.assert_numpy_array_equal is preferable, the arrays could be sorted prior to using it.

yes, pls just sort them if need; we don't use tm.equalContents (actually should remove that)

Did not add a monotonic decreasing index to test_range, as it currently fails common.py/test_intersection_base due to the issue described in #17296 (will add when I submit a PR for this issue).

simply add an xfailing test (with the issue number).

@jschendel
Copy link
Member Author

jschendel commented Aug 21, 2017

Okay, made those additional changes. Any idea why codecov is saying this will decrease coverage by 50%? I don't think I made any substantial changes that would cause that. Just a glitch?

EDIT: Nevermind, looks like codecov is fixed.

@jschendel jschendel force-pushed the fix-idx-searchsortedmonotonic branch from b4ea216 to 1002aaa Compare August 23, 2017 02:16
@jschendel
Copy link
Member Author

ping @jreback : made review related updates and did a rebase to resolve merge conflicts

@jreback jreback added this to the 0.21.0 milestone Aug 23, 2017
@@ -354,6 +354,7 @@ Indexing
- Avoids ``IndexError`` when passing an Index or Series to ``.iloc`` with older numpy (:issue:`17193`)
- Allow unicode empty strings as placeholders in multilevel columns in Python 2 (:issue:`17099`)
- Bug in ``.iloc`` when used with inplace addition or assignment and an int indexer on a ``MultiIndex`` causing the wrong indexes to be read from and written to (:issue:`17148`)
- Bug in ``IntervalIndex`` where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (:issue:`16417`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add both issue numbers

value = index[0]

if index.is_monotonic_increasing or index.is_monotonic_decreasing:
assert index._searchsorted_monotonic(value, side='left') == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test .searchsorted() as well for the various cases

@@ -67,6 +48,41 @@ def test_getitem_with_scalar(self):
expected = s.iloc[2:5]
tm.assert_series_equal(expected, s[s >= 2])

@pytest.mark.parametrize('direction, closed',
product(('increasing', 'decreasing'),
('left', 'right', 'neither', 'both')))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zfrenchee if you'd have a look here

@jschendel jschendel force-pushed the fix-idx-searchsortedmonotonic branch 2 times, most recently from 38b4971 to 361face Compare August 25, 2017 05:06
@jschendel
Copy link
Member Author

Made the requested changes. A couple comments:

  • Only tested .searchsorted for monotonic increasing, since that's the underlying assumption.
  • Added an index with repeats to test_base to better test side='right' for _searchsorted_monotonic and .searchsorted.
    • Required a modifying test_get_indexer_consistency to check uniqueness in one place where uniqueness was assumed.
  • Moved the xfail for test_intersection_base to only occur for RangeIndex, since that's the only type of index that should fail.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor request. lgtm. ping on green.

if index.is_unique:
indexer = index.get_indexer(index[0:2])
assert isinstance(indexer, np.ndarray)
assert indexer.dtype == np.intp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add what the result if not index.is_unique as well

@jschendel
Copy link
Member Author

ping @jreback : checks are green

Made one small additional change to the index.is_unique case. Since get_indexer is implemented slightly differently for CategoricalIndex relative to other indexes, they're actually fine even if the values in the index aren't unique, so they fell into the index.is_unique case regardless of uniqueness.

@jschendel jschendel force-pushed the fix-idx-searchsortedmonotonic branch from 85ca734 to 1911860 Compare August 29, 2017 15:34
@jschendel
Copy link
Member Author

ping @jreback : green again after rebasing to address the merge conflict.

Fixed an issue where Index._searchsorted_monotonic(..., side='right')
returns the left side position for monotonic decreasing indexes.  Issue had a downstream effect on scalar lookups in IntervalIndex as well.
Removed xfail, as the issue impacting the failing test has been resolved.  Rebase to address merge conflicts.
@jschendel jschendel force-pushed the fix-idx-searchsortedmonotonic branch from 1911860 to 41e59d7 Compare September 7, 2017 02:37
@jreback jreback merged commit 3a291bb into pandas-dev:master Sep 7, 2017
@jreback
Copy link
Contributor

jreback commented Sep 7, 2017

thanks @jschendel nice fixes! keep em coming!

@jschendel jschendel deleted the fix-idx-searchsortedmonotonic branch September 7, 2017 14:30
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Sep 10, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
indexer = index.get_indexer(index[0:2])
assert isinstance(indexer, np.ndarray)
assert indexer.dtype == np.intp
if index.is_unique or isinstance(index, CategoricalIndex):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jschendel why is CategoricalIndex excluded here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
4 participants