-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: slicing with decreasing monotonic indexes #8680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Great addition, I thought that this was already implemented. I think |
return False, None | ||
is_monotonic_inc = 0 | ||
if not is_monotonic_dec: | ||
return False, False, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want to short circuit this and return early if both incr and decr monotonoc are 0 (break out of loop)
so each branch should be something like
if is_monotonoc_incr & cur < prev:
....
in the else
if not is_monotonoc_incr & not is_monotonic_decr:
return False, False, False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the existing logic did the same thing, but definitely not as clear
@immerrr Agreed on all those points. I switched |
Looks awesome, thanks! On Thu, Oct 30, 2014 at 3:02 AM, Stephan Hoyer [email protected]
Adam Hughes |
By the way, the current behavior of
I'll add a fix (and tests) when I add support for slicing monotonic decreasing indexes. |
I'd u want to add this fix separately with tests would be ok |
Latest commit adds support for monotonic decreasing indexes and should fix that bug. I ended up just removing the special case for non-unique, monotonic integer indexes, because I couldn't think of any scenarios where that shortcut was valid. Please correct me if I'm mistaken there. |
c9b3da6
to
c1627f6
Compare
One of the tests is still failing. The current version looks like this (I mangled def test_ix_getitem_not_monotonic(self):
d1, d2 = self.ts.index[[5, 15]]
ts2 = self.ts.iloc[lrange(15) + lrange(15, len(self.ts))[::-1]]
self.assertRaises(KeyError, ts2.ix.__getitem__, slice(d1, d2))
self.assertRaises(KeyError, ts2.ix.__setitem__, slice(d1, d2), 0) I think this test should not have been passing before (not quite sure how it was), because We don't make those sort of guarantees for Thoughts? |
@@ -1461,7 +1461,7 @@ def xs(self, key, axis=0, level=None, copy=None, drop_level=True): | |||
name=self.index[loc]) | |||
|
|||
else: | |||
result = self[loc] | |||
result = self.iloc[loc] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was surprising to encounter (needed to change it to fix some tests), but maybe it was there for a reason? I don't think there is any reason for loc
here to do label based rather than integer indexing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be
I have tried to not touch xs recently. it seems s but fragile and multi slicing obviates the need for it anyhow
@jreback thoughts on this test case I want to remove? That is to do on this (besides release notes, etc.). |
no that test is on But the index is unique and monotonic (increasing). It tries to find d1 but KeyError so it raises. I think its correct for both monotonic increase and decrease. You have to have at least 1 end point in the index for it 2 work |
@@ -1521,7 +1521,7 @@ def test_ix_getitem(self): | |||
def test_ix_getitem_not_monotonic(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the failing test (that I think should be removed)
d1 and d2 are in the index, so slicing should not raise a KeyError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nevermind.... I misread the ::2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add a fix
c1627f6
to
c93afd1
Compare
OK, still needs a rebase, but I added what's new notes and tests are passing now. |
ok, looks reasonable. pls do a perf test and post if anything comes up. |
Here are the perf tests that came up 10% slower or more. This all looks like noise to me (would be nice if vbench included an auto-repeat thing like timeit):
any of these look suspicious? |
these r all ok |
can u add some tests if the index had nan/NaT bedded ? (I think will work just verifying) |
Good idea on NA checks, have to admit I'm a little worried about that. Actually, I'm pretty sure the existing code for is_monotonic does not handle NA properly. I guess any NA values should imply not monotonic? I believe Numpy's searchsorted assumes a particular sort order for NaN. On Fri, Oct 31, 2014 at 5:32 PM, jreback [email protected] wrote:
|
I think na should be ignored for monotonic checks? eg it doesn't change anything searchsorted will work but I think u have to trick it a bit - eg see how argmax and such are done (in base.py) |
@jreback OK, I looked into searchsorted with missing values.
And indeed,
So, given that our current implementation is buggy, we could write our own version of searchsorted that skips NA values (at least if the array has any missing values), even though in the worst case scenario is no better than checking every element. But this seems like a lot of work, and given how pathological indexes with NA values are, I question if it is worth the effort. Indeed, even if searchsorted sometimes makes conceptual sense when there are missing values, we would have to come up with a rule for cases where the missing value is a adjacent to the search value, i.e., what should the result of My preferred choice is to simply declare that the presence of NA means an array is not monotonic. I think this is actually a reasonable interpretation (it should be documented), given that the presence of missing values really means the monotonicity is unknown. Users are also likely to assume that monotonicity implies binary search is possible. |
that seems reasonable - and easy enough to detect in is_monotonic_* routines |
Okay, the latest commit should ensure that the presence of NaN or NaT means the index is not monotonic. Include NaT in the same template ended up pretty awkward... perhaps it would be best to write another function specifically for time types? Or perhaps we don't even really need to check NaT, given that |
we had to do this here as well: https://github.com/pydata/pandas/blob/master/pandas/src/generate_code.py#L1194 I think you can just do:
possibly maybe kill perf though |
Turned up a bug for
Added a test and a note to what's new. |
Fixes pandas-dev#7640, pandas-dev#8625 This is a work in progress, but it's far enough along that I'd love to get some feedback. TODOs (more called out in the code): - [ ] documentation + docstrings - [ ] finish the index methods: - [ ] `get_loc` - [ ] `get_indexer` - [ ] `slice_locs` - [ ] comparison operations - [ ] fix `is_monotonic` (pending pandas-dev#8680) - [ ] ensure sorting works - [ ] arithmetic operations (not essential for MVP) - [ ] cythonize the bottlenecks: - [ ] `from_breaks` - [ ] `_data` - [ ] `Interval`? - [ ] `MultiIndex` - [ ] `Categorical`/`cut` - [ ] serialization - [ ] lots more tests CC @jreback @cpcloud @immerrr
@@ -85,7 +85,7 @@ def test_asobject_tolist(self): | |||
def test_minmax(self): | |||
for tz in self.tz: | |||
# monotonic | |||
idx1 = pd.DatetimeIndex([pd.NaT, '2011-01-01', '2011-01-02', | |||
idx1 = pd.DatetimeIndex(['2011-01-01', '2011-01-02', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a test here (and below), with the nat and assertsFalse for monotonic_incr/decr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see test_is_monotonic_na
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool
ok, if you'd rebase/squash. (and maybe do a perf sanity check) I think ok to merge. |
Index.is_monotonic will have a performance degradation (still O(n) time) in cases where the Index is decreasing monotonic. If necessary, we could work around this, but I think we can probably get away with this because the fall- back options are much slower and in many cases (e.g., for slice indexing) the next thing we'll want to know is if it's decreasing monotonic, anyways. see GH7860
also: NaN and NaT imply not monotonic
d41610f
to
db6f8fd
Compare
OK, rebased and squashed (I put the first three commits together, which had all passed earlier on Travis pre-squashing). Ran another performance test and everything looks OK! (I found the So I think we are good to merge |
ENH: slicing with decreasing monotonic indexes
thank you sir! (may I have another?) |
UPD: disregard this message, I've misread the code that does searchsorted on monotonically decreasing index. |
@immerrr Nothing is too late if we are convinced things should change! This is on master:
So you argue it should be the other way around? |
Oh, disregard that message, I've misread the code that does searchsorted on monotonically decreasing index. I agree with your example. |
Fixes #7860
The first commit adds
Index.is_monotonic_decreasing
andIndex.is_monotonic_increasing
(alias foris_monotonic
).is_monotonic
will have a performance degradation (still O(n) time) in cases where the Index is decreasing monotonic. If necessary, we could work around this, but I think we can probably get away with this because the fall-back options are much slower and in many cases (e.g., for slice indexing) the next thing we'll want to know is if it's decreasing monotonic, anyways.Next up will be handling indexing monotonic decreasing indexes with reversed slices, e.g.,
s.loc[30:10]
.CC @jreback @hugadams @immerrr (thank you, specialities wiki!)