Confusing (possibly buggy) IntervalIndex behavior #16316

alexlenail · 2017-05-10T02:18:20Z

In the above, I have a region that I'm querying for with a partially overlapping interval. The query succeeds when the interval is partially overlapping until it doesn't, throwing the key error:

KeyError                                  Traceback (most recent call last)
/Users/alex/Documents/GarNet/venv/lib/python3.6/site-packages/pandas/core/indexing.py in _has_valid_type(self, key, axis)
   1433                 if not ax.contains(key):
-> 1434                     error()
   1435             except TypeError as e:

/Users/alex/Documents/GarNet/venv/lib/python3.6/site-packages/pandas/core/indexing.py in error()
   1428                 raise KeyError("the label [%s] is not in the [%s]" %
-> 1429                                (key, self.obj._get_axis_name(axis)))
   1430 

KeyError: 'the label [(5409951, 5409965]] is not in the [index]'

I think this is particularly confusing because there doesn't seem to be any prominent difference between the locs that succeed and the loc that fails as far as I can tell. I know we had discussed loc's behavior in this context but I'm not sure we came to a conclusion.

By the way, my larger question is about how to find intersections between two IntervalIndex. It seems like the find_intersections function didn't make it into this release @jreback ? Let me know! =]

jreback · 2017-05-10T11:47:28Z

can't u change the post to show code that constructs a minimal example
pictures are not very useful

alexlenail · 2017-05-10T16:52:54Z

@jreback

Make your dataframe

x = pd.DataFrame(['a'], columns=['col'], index=pd.IntervalIndex.from_tuples([(1, 5)], closed='both'))

Test all these:

x.loc[pd.Interval(1,5)]
x.loc[pd.Interval(-10,10)]
x.loc[pd.Interval(3,5)]
x.loc[pd.Interval(3,4)]
x.loc[pd.Interval(2,4)]
x.loc[pd.Interval(1,4)]

I think that documentation as to what the behavior should be might be helpful. I can't make heads nor tails of what loc thinks it should do in this case, so I can't even tell if this is a bug.

P.S. Still curious as to how you would address my meta-question about find_intersections...

jreback · 2017-05-11T23:24:47Z

well docs are here: http://pandas.pydata.org/pandas-docs/stable/advanced.html#intervalindex

Well we treat indexing with an Interval as an exact match: if it's there exactly it matches, otherwise you get a KeyError. If it's a point, it can be contained in an interval. Some of this is probably buggy because we didn't have any real examples.

what would be helpful is using this example:

In [26]: df = pd.DataFrame(['a', 'b'], columns=['col'], index=pd.IntervalIndex.from_tuples([(1, 5), (7, 8)]))

In [27]: df
Out[27]: 
       col
(1, 5]   a
(7, 8]   b

and enumerating what the cases above should do semantically, along with selections like df.loc[3], and further still what df.loc[pd.Interval(4, 7.5)] would do.

That would be really helpful.

jreback · 2017-05-11T23:25:54Z

further I still don't understand what you need from find_intersection; a complete example would be helpful. IOW, you have 2 frames as input (show the code) and what you think the output should be.

alexlenail · 2017-05-11T23:39:15Z

@jreback

Sorry, let me include the output from those .locs and explain why I think they might be buggy, to help clarify.

x = pd.DataFrame(['a'], columns=['col'], index=pd.IntervalIndex.from_tuples([(1, 5)], closed='both'))

Intuitively, I want to say that 'a' is from 1 to 5, closed interval. All good here.

x.loc[pd.Interval(1,5)]

"Do I have any intervals from 1 to 5?" Yes. No surprise

x.loc[pd.Interval(-10,10)]

Do I have any intervals from -10 to 10? Yes. Oh, okay, so it's not exact matches. Okay.

x.loc[pd.Interval(3,5)]

Do I have any intervals from 3 to 5? Yes. Okay, that makes sense, it's a partial overlap with [1,5].

x.loc[pd.Interval(3,4)]

Do I have any intervals from 3 to 4? KeyError Wait what?

x.loc[pd.Interval(2,4)]   # KeyError
x.loc[pd.Interval(1,4)]   # KeyError

I think this qualifies as "Confusing (possibly buggy) IntervalIndex behavior" but it might not -- I might just be thinking about this incorrectly. I'm happy to supplement the docs to clarify for people like me if that's the case.

Thanks!

jreback · 2017-05-12T00:10:04Z

was looking for more discussion of whether an Interval needs to be an exact match, or matches if other intervals are fully contained

what happens if it partially overlaps?

alexlenail · 2017-05-12T00:19:09Z

@jreback I think I might see the footprint of the bug. New dataframe: (same as before, just from 10 to 15)

x = pd.DataFrame(['a'], columns=['col'], index=pd.IntervalIndex.from_tuples([(10, 15)], closed='both'))

x.loc[pd.Interval(9,15)]  # works. 

x.loc[pd.Interval(10,15)]  # works. 

x.loc[pd.Interval(11,15)]  # works. 

x.loc[pd.Interval(9,16)]  # works. 

x.loc[pd.Interval(10,16)]  # works. 

x.loc[pd.Interval(11,16)]  # works.

x.loc[pd.Interval(9,14)]  # fails. 

x.loc[pd.Interval(10,14)]  # fails. 

x.loc[pd.Interval(11,14)]  # fails.

alexlenail · 2017-05-12T00:21:53Z

So:


x.loc[pd.Interval(9,14)] # fails.	x.loc[pd.Interval(10,14)] # fails.	x.loc[pd.Interval(11,14)] # fails.
x.loc[pd.Interval(9,15)] # works.	x.loc[pd.Interval(10,15)] # true interval.	x.loc[pd.Interval(11,15)] # works.
x.loc[pd.Interval(9,16)] # works.	x.loc[pd.Interval(10,16)] # works.	x.loc[pd.Interval(11,16)] # works.

jreback · 2017-05-12T10:19:26Z

@zfrenchee I have enough examples, what I want to know is why you think this should work at all.

In [27]: df = pd.DataFrame(['a'], columns=['col'], 
          index=pd.IntervalIndex.from_tuples([(10, 15)], closed='both'))

In [28]: df
Out[28]: 
         col
[10, 15]   a

IOW, take cases and comment on if they should work or raise (KeyError or other)

df.loc[pd.Interval(10,15, closed='both')]
df.loc[pd.Interval(10, 15, closed='right')]
df.loc[pd.Interval(11, 14)]
df.loc[pd.Interval(11, 16)]
df.loc[pd.Interval(11, 13):pd.Interval(14, 15)]
df.loc[[pd.Interval(11, 13), pd.Interval(14, 15)]]
df.loc[12]
df.loc[9]
df.loc[9, 12]
df.loc[11:13]

alexlenail · 2017-05-12T17:27:25Z

I think there are two behaviors for loc which I think would make sense: exact match, or any overlap.

So reasonable behavior 1 is:

df = pd.DataFrame(['a'], columns=['col'], 
          index=pd.IntervalIndex.from_tuples([(10, 15)], closed='both'))

df.loc[pd.Interval(10,15, closed='both')]      # the only one that succeeds
df.loc[pd.Interval(10, 15, closed='right')]     # KeyError
df.loc[pd.Interval(11, 14)]     # KeyError
df.loc[pd.Interval(11, 16)]     # KeyError
df.loc[pd.Interval(11, 13):pd.Interval(14, 15)]     # KeyError
df.loc[[pd.Interval(11, 13), pd.Interval(14, 15)]]     # KeyError
df.loc[12]     # KeyError
df.loc[9]     # KeyError
df.loc[9, 12]     # KeyError
df.loc[11:13]     # KeyError

As a non-pandas expert, I believe this is most in keeping with what loc currently does on other index types, but I think this also obliterates the reason for having intervals, since you've essentially reduced the interval down to a token which you're selecting for.

The other possible behavior is that loc returns all overlaps:

df = pd.DataFrame(['a'], columns=['col'], 
          index=pd.IntervalIndex.from_tuples([(10, 15)], closed='both'))

df.loc[pd.Interval(10,15, closed='both')]      # a
df.loc[pd.Interval(10, 15, closed='right')]      # a
df.loc[pd.Interval(11, 14)]      # a
df.loc[pd.Interval(11, 16)]      # a
df.loc[pd.Interval(11, 13):pd.Interval(14, 15)]      # a, though I'm not totally sure what the semantics of this query are.
df.loc[[pd.Interval(11, 13), pd.Interval(14, 15)]]      # a
df.loc[12]      # a
df.loc[9]     # None, because if this returned a KeyError the next case would be really hard.
df.loc[9, 12]      # a
df.loc[11:13]      # a
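To make the two candidate behaviors concrete, here is a minimal pure-Python sketch. It is not pandas code: `Iv`, `loc_exact` and `loc_overlap` are hypothetical names invented for illustration, and the sketch assumes every interval has left < right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Iv:
    """Hypothetical stand-in for pd.Interval (assumes left < right)."""
    left: float
    right: float
    closed: str = "right"  # 'left', 'right', 'both' or 'neither'

    def overlaps(self, other: "Iv") -> bool:
        # Disjoint if one interval ends before the other starts.
        if self.right < other.left or other.right < self.left:
            return False
        # Touching endpoints only count if both sides include that point.
        if self.right == other.left:
            return self.closed in ("right", "both") and other.closed in ("left", "both")
        if other.right == self.left:
            return other.closed in ("right", "both") and self.closed in ("left", "both")
        return True


def loc_exact(index, key):
    """Behavior 1: positions whose interval equals `key`, else KeyError."""
    hits = [i for i, iv in enumerate(index) if iv == key]
    if not hits:
        raise KeyError(key)
    return hits


def loc_overlap(index, key):
    """Behavior 2: positions whose interval shares any point with `key`."""
    return [i for i, iv in enumerate(index) if iv.overlaps(key)]


index = [Iv(10, 15, "both")]
print(loc_exact(index, Iv(10, 15, "both")))  # [0]
print(loc_overlap(index, Iv(11, 16)))        # [0]
print(loc_overlap(index, Iv(15, 20)))        # [] since (15, 20] excludes 15
```

Under behavior 1, `loc_exact(index, Iv(10, 15, "right"))` raises KeyError because the closedness differs, which mirrors the second case in the first listing.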

I think what makes most sense is to use loc for the first of these, and define a new special overlap function to implement the second of these, which would have a signature like so:

pd.IntervalIndex().overlaps(other_intervalindex_or_dataframe_with_intervalIndex)

or more likely:

dataframe_with_intervalindex.index.overlaps(other_intervalindex_or_dataframe_with_intervalIndex)

This would return something sensible, like the indices from dataframe_with_intervalindex which overlap intervals in other_intervalindex_or_dataframe_with_intervalIndex. Actually returning the values from both should be left to merge(left_index=True, right_index=True)

Maybe it would be easier to define overlaps like so:

pd.overlaps(intervalIndex1, intervalindex2)

In that case it's a little harder to decide what to return, since you would want it to return the same thing as:

pd.overlaps(intervalIndex2, intervalindex1)

What do you think? @jreback

TomAugspurger · 2017-05-12T18:31:16Z

As a non-pandas expert, I believe this is most in keeping with what loc currently does on other index types

The closest analogy here is probably partial string indexing into Datetimes. We accept .loc['2017'] rather than .loc[pd.Timestamp(2017, ...)]. Stretching the analogy a bit further then, .loc should be "exact" when passed Intervals, and "non-exact" when passed the elements making up the Intervals (ints, strs, whatever). So these both return the same (which is what happens currently)

In [83]: df = pd.DataFrame(['a', 'b'], columns=['col'], index=pd.IntervalIndex.from_tuples([(1, 5), (7, 8)]))
    ...: df
    ...:
Out[83]:
       col
(1, 5]   a
(7, 8]   b

In [61]: df.loc[pd.Interval(1, 5)]
Out[61]:
col    a
Name: (1, 5], dtype: object

In [62]: df.loc[3]
Out[62]:
col    a
Name: (1, 5], dtype: object

but, then

In [50]: df.loc[pd.Interval(1, 4)]

would raise a KeyError, since it doesn't match exactly.
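A rough sketch of that dispatch rule, assuming right-closed intervals stored as plain (left, right) tuples. `ToyIntervalFrame` is a hypothetical toy class written for this illustration, not anything in pandas.

```python
class ToyIntervalFrame:
    """Hypothetical toy model: right-closed (left, right] intervals -> values."""

    def __init__(self, intervals, values):
        self.intervals = list(intervals)  # list of (left, right) tuples
        self.values = list(values)

    def loc(self, key):
        pairs = zip(self.intervals, self.values)
        if isinstance(key, tuple):
            # Interval key: only an exact match counts.
            matches = [v for iv, v in pairs if iv == key]
        else:
            # Point key: any interval containing the point counts.
            matches = [v for (left, right), v in pairs if left < key <= right]
        if not matches:
            raise KeyError(key)
        return matches


df = ToyIntervalFrame([(1, 5), (7, 8)], ["a", "b"])
print(df.loc((1, 5)))  # ['a']
print(df.loc(3))       # ['a']
try:
    df.loc((1, 4))
except KeyError as err:
    print("KeyError:", err)  # (1, 4) is not an exact match
```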

Users wishing to do indexing by passing an IntervalIndex should use boolean indexing, and use some methods on IntervalIndex to assist

For more flexible indexing with iterables of Intervals, I propose we enhance IntervalIndex.contains (or maybe a new method) to accept an Iterable other and return an Iterable[bool] of the same length.

def IntervalIndex.covers(self, other: Iterable) -> Array[bool]:
    """Boolean mask for whether items of `other` overlap with anything in `self`.
    Output is the same shape as `other`"""
    # maybe enhance `.contains` to do this?

def IntervalIndex.covered_by(self, other: Iterable) -> Array[bool]:
    """Boolean mask for whether items in `self` overlap with anything in `other`.
    Output is the same shape as `self`"""
    # maybe modify `.isin` to do this?
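For illustration only, the two proposed masks could behave roughly like this. The sketch follows the docstrings above literally (overlap-based matching) and assumes right-closed intervals stored as (left, right) tuples. The free functions `covers` and `covered_by` are hypothetical stand-ins for the proposed methods.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # hypothetical right-closed (left, right] interval


def _overlap(a: Interval, b: Interval) -> bool:
    # (a0, a1] and (b0, b1] share a point iff each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def covers(self_index: Sequence[Interval], other: Sequence[Interval]) -> List[bool]:
    """Mask over `other`: does each item overlap anything in `self_index`?"""
    return [any(_overlap(o, s) for s in self_index) for o in other]


def covered_by(self_index: Sequence[Interval], other: Sequence[Interval]) -> List[bool]:
    """Mask over `self_index`: does each item overlap anything in `other`?"""
    return [any(_overlap(s, o) for o in other) for s in self_index]


index = [(1, 5), (7, 8)]
other = [(4, 6), (5, 7), (0, 10)]
print(covers(index, other))      # [True, False, True]
print(covered_by(index, other))  # [True, True]
```

Note that (5, 7] matches nothing here: it starts just after 5 and ends at 7, which (7, 8] excludes.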

So to summarize:

df.loc with scalars or lists of non-Intervals matches wherever the value is covered by df.index
df.loc with scalar Intervals, or a list / IntervalIndex of Intervals, will match only exactly
Users should use df[df.index.covers(values)] for non-strict matching with iterables of Intervals

shoyer · 2017-05-12T18:47:46Z

So to summarize:
df.loc with scalars or lists of non-Intervals matches wherever the value is covered by df.index
df.loc with scalar Intervals, or a list / IntervalIndex of Intervals, will match only exactly
Users should use df[df.index.covers(values)] for non-strict matching with iterables of Intervals

This sounds good to me!

I think the original intent was to match fully contained intervals only, but clearly that logic is not working right. In any case it's certainly better (simpler / more explicit) to switch to requiring specific methods for this functionality. It seems like we may need at least three methods for handling IntervalIndex/IntervalIndex matches, e.g., IntervalIndex.covers, IntervalIndex.covered_by and IntervalIndex.overlaps. Possibly worth adding these methods to Interval, too (at least Interval.overlaps()).

alexlenail · 2017-05-13T14:40:17Z

@TomAugspurger

I think your ideas represent a good compromise. A couple concerns though:

(which is what happens currently)

The current behavior is definitely wrong with respect to what you suggest it should be, so this isn't quite right. Do you agree?
I'm not sure whether we need both covers and covered_by given that they seem to be perfect opposites? a.covers(b) == b.covered_by(a) right? In which case you can just flip the expression and don't need both. Let me know if I'm misreading this.

TomAugspurger · 2017-05-13T15:13:22Z

I could have been wrong about the current behavior. I didn't test that much. For the second point, we might want both methods as a convenience for if you have a list of Intervals, rather than an IntervalIndex. I think you're right that they are perfect opposites (and we could implement it that way)

shoyer · 2017-05-13T16:45:58Z

I'm not sure whether we need both covers and covered_by given that they seem to be perfect opposites? a.covers(b) == b.covered_by(a) right? In which case you can just flip the expression and don't need both. Let me know if I'm misreading this.

This depends on how we represent the results of covers. If we represent it as a list of tuples (equivalently a MultiIndex with two levels) giving indices of "all matches" of intervals, we can return all matching intervals in a symmetric way, e.g.,

[[0, 2), [2, 3)].covers([[0, 1), [1, 2), [2, 3)])
-> [(0, 0), (0, 1), (1, 2)]

In this model, len(a.covers(b)) == sum(ai.covers(bi) for ai in a for bi in b) <= len(a) * len(b).

I think the full "all matches" methods could be useful, but for large lists of intervals these will get very long, so there is also something to be said for "any match" methods. These would return at most one result for each element in the first IntervalIndex, i.e.,

[[0, 2), [2, 3)].covers([[0, 1), [1, 2), [2, 3)])
-> [(0, 0), (1, 2)]

Or possibly just 1D integer arrays of indices or a boolean array (giving matches in the first argument), but either way len(a.covers(b)) <= len(a). In this model, we need separate methods.

So I guess this leaves us with up to 5 potential interval indexing methods, which we might call:

covers_matches() and overlap_matches(): these return a pair of integer arrays.
covers_any(), covered_by_any() and overlaps_any(): these return 1D boolean arrays.

If we do indeed add all of the second type of methods, we might do covered_by_matches() for symmetry, even though it's the same as covers_matches() with the arguments flipped.
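As a sketch of how these return formats differ, here is the example above worked through in plain Python with half-open [left, right) tuples. The function names are hypothetical and only loosely follow the names floated in this comment. `covers_first_match` models the "at most one result per element" variant.

```python
def covers(a, b):
    """True if half-open [a0, a1) contains every point of [b0, b1)."""
    return a[0] <= b[0] and b[1] <= a[1]


def covers_matches(a_index, b_index):
    """All (i, j) pairs where a_index[i] covers b_index[j]."""
    return [(i, j)
            for i, a in enumerate(a_index)
            for j, b in enumerate(b_index)
            if covers(a, b)]


def covers_first_match(a_index, b_index):
    """At most one (i, j) pair per element of a_index: its first match."""
    out = []
    for i, a in enumerate(a_index):
        for j, b in enumerate(b_index):
            if covers(a, b):
                out.append((i, j))
                break
    return out


def covers_any(a_index, b_index):
    """1D boolean mask over a_index."""
    return [any(covers(a, b) for b in b_index) for a in a_index]


a = [(0, 2), (2, 3)]
b = [(0, 1), (1, 2), (2, 3)]
pairs = covers_matches(a, b)
print(pairs)                     # [(0, 0), (0, 1), (1, 2)]
print(covers_first_match(a, b))  # [(0, 0), (1, 2)]
print(covers_any(a, b))          # [True, True]

# The same matches as a pair of integer sequences instead of a list of tuples.
rows, cols = zip(*pairs)
print(rows, cols)                # (0, 0, 1) (0, 1, 2)
```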

alexlenail · 2017-05-13T18:14:35Z

@shoyer I have a hard time seeing why you would design the covers function in your first example the way you do -- when would someone not want the data at those indices (i.e. the overlapping data)? I think users will almost invariably want to "dereference" those indices right away, which seems like it might be a little tricky given the return format of tuples. I can't think of a use case, but if you can, I'm open to this idea. If such a use case exists where you would want the indices but not the data at those indices, perhaps it makes sense to include another function which does the "dereferencing" which depends on this function you've proposed (or at least post a recipe in the docs of this function)...

I'm also not sure who would need "any match" who wouldn't want "all match" but I'm more open to this idea. I can't think of utility to it, but it seems very plausible there's some use case I'm not thinking of.

shoyer · 2017-05-13T19:28:07Z

I have a hard time seeing why you would design the covers function in your first example the way you do -- when would someone not want the data at those indices (i.e. the overlapping data)?

This needs to be an IntervalIndex method, so it doesn't have access to the data. That said, I agree that returning a pair of integer arrays giving indices is a better return format than a list of tuples (which is why I suggested it above for covers_matches() and covered_by_matches()).

shoyer · 2017-05-13T19:51:49Z

I'm also not sure who would need "any match" who wouldn't want "all match" but I'm more open to this idea.

I'm also not sure if this would actually be useful to anyone -- I was merely restating @TomAugspurger's proposal. If we can't think of a real use-case, then we certainly shouldn't bother.

It sounds like we want two methods covers and overlaps, for both Interval and IntervalIndex. We could potentially update IntervalIndex.contains as well, but frankly I'm not sure why that method even exists given that it's currently an exact alias of IntervalIndex.__contains__.

These would have type signatures (slightly abusing standard typing notation):

class Interval:
    def covers(self, other: Interval) -> bool
    def covers(self, other: IntervalIndex) -> IntegerArray1D
    def overlaps(self, other: Interval) -> bool
    def overlaps(self, other: IntervalIndex) -> IntegerArray1D

class IntervalIndex:
    def covers(self, other: Interval) -> IntegerArray1D
    def covers(self, other: IntervalIndex) -> Tuple[IntegerArray1D, IntegerArray1D]
    def overlaps(self, other: Interval) -> IntegerArray1D
    def overlaps(self, other: IntervalIndex) -> Tuple[IntegerArray1D, IntegerArray1D]

jreback · 2017-05-13T21:06:59Z

@shoyer

We could potentially update IntervalIndex.contains as well, but frankly I'm not sure why that method even exists given that it's currently an exact alias of IntervalIndex.__contains__.

IIRC you suggested this actually. And they are not duplicates of each other. .contains() allows integers, while __contains__ is an exact method (meant to implement pd.Interval(...) in interval_index).

.contains() is called in .loc indexing, so it's pretty convenient.

alexlenail · 2017-05-14T17:02:44Z

@shoyer sorry I might have lost track -- could you re-disambiguate what covers and overlaps each do in the proposed function signatures you posted above? overlaps is "all overlaps" and covers is "any overlaps"?

shoyer · 2017-05-15T00:48:38Z

could you re-disambiguate what covers and overlaps each do in the proposed function signatures you posted above? overlaps is "all overlaps" and covers is "any overlaps"?

An interval covers another interval if all points in the second interval are found in the first interval.

An interval overlaps another interval if there exist any points found in both intervals.
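These two definitions can be sketched in a few lines once closedness is accounted for. The version below is an illustration, not pandas code: `Iv`, `covers` and `overlaps` are hypothetical names, and the trick of nudging open endpoints inward with a second tuple component is one possible way to encode open versus closed bounds.

```python
from typing import NamedTuple


class Iv(NamedTuple):
    """Hypothetical stand-in for pd.Interval."""
    left: float
    right: float
    closed: str = "right"  # 'left', 'right', 'both' or 'neither'


def _bounds(iv):
    # Encode each endpoint as (value, nudge). An open endpoint is nudged
    # inward by an "infinitesimal" so tuple comparison does the right thing.
    lo = (iv.left, 0 if iv.closed in ("left", "both") else 1)
    hi = (iv.right, 0 if iv.closed in ("right", "both") else -1)
    return lo, hi


def covers(a, b):
    """All points of b are found in a."""
    a_lo, a_hi = _bounds(a)
    b_lo, b_hi = _bounds(b)
    return a_lo <= b_lo and b_hi <= a_hi


def overlaps(a, b):
    """There exists at least one point found in both a and b."""
    a_lo, a_hi = _bounds(a)
    b_lo, b_hi = _bounds(b)
    return max(a_lo, b_lo) <= min(a_hi, b_hi)


both = Iv(10, 15, "both")
right = Iv(10, 15, "right")
print(covers(both, right))           # True:  [10, 15] contains (10, 15]
print(covers(right, both))           # False: (10, 15] is missing the point 10
print(overlaps(both, Iv(11, 16)))    # True
print(covers(both, Iv(11, 16)))      # False: 16 lies outside [10, 15]
print(overlaps(both, Iv(15, 20)))    # False: (15, 20] excludes 15
```

So covers implies overlaps for non-empty intervals, but not the other way around.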

alexlenail · 2017-05-18T03:27:16Z

@jreback @shoyer What would be the timeline for implementing something like this? How would this come about?

jreback · 2017-05-18T10:17:14Z

@zfrenchee the way to make this happen MUCH faster would be to write tests that xfail, meaning that we expect them to work when the feature is fixed. Many many tests would help here, covering the bases, and edge cases. This would cover the user facing results, e.g. what you expect for a certain operation. Much of the raw input is in this issue. This would also help refine the exact API.

diff --git a/pandas/tests/indexing/test_interval.py b/pandas/tests/indexing/test_interval.py
index 2552fc0..5c0922c 100644
--- a/pandas/tests/indexing/test_interval.py
+++ b/pandas/tests/indexing/test_interval.py
@@ -11,6 +11,18 @@ class TestIntervalIndex(object):
     def setup_method(self, method):
         self.s = Series(np.arange(5), IntervalIndex.from_breaks(np.arange(6)))
 
+    @pytest.mark.xfail(reason="new indexing tests for issue 16316")
+    def test_add_cool_new_things(self):
+
+        s = Series(np.arange(2), IntervalIndex.from_tuples([(0, 1), (2, 3)]))
+        result = s.loc[0.5]
+        expected = 0
+        assert result == expected
+
+        result = s.loc[2.5]
+        expected = 1
+        assert result == expected
+
     def test_loc_with_scalar(self):
 
         s = self.s

alexlenail · 2017-05-18T17:10:43Z

@jreback Sounds good, I'm happy to write the tests. One question though: We've discussed both enhancements on this issue, as well as ways the current implementation is "broken". I'll add xfail tests for the enhancements, but should I alter current tests to bring them into line with what we've discussed here? If so, should I also mark them as xfail?

jreback · 2017-05-18T17:12:38Z

yes
best to move 'wrong' test to another function and xfail them

alexlenail · 2017-05-18T17:22:25Z

@jreback would you mind chatting about this over email? I have a couple more questions which don't necessarily make sense to put on this issue. If that's alright, shoot me an email at [email protected]. =)

shoyer · 2017-05-18T18:19:52Z

If you have developer workflow questions, GitHub is still a good place for them -- we can answer questions or point you to existing primers. It's okay if issues go slightly off topic.

alexlenail · 2017-05-18T19:49:19Z

Alright.

In pandas/pandas/tests/indexing/test_interval.py -- will we need to change test_contains ?
There are still a bunch of references to Panel in http://pandas.pydata.org/pandas-docs/stable/indexing.html. Should those be removed?
New Interval / IntervalIndex behavior spec #16386

jreback · 2017-05-18T20:19:08Z

In pandas/pandas/tests/indexing/test_interval.py -- will we need to change test_contains ?

I don't think so, you can certainly put what you think if it's different

There are still a bunch of references to Panel in http://pandas.pydata.org/pandas-docs/stable/indexing.html. Should those be removed?

no, things stay in the docs until things are actually removed for Panel, though would prob take a PR to remove them for 0.21.0

buyology · 2017-05-18T23:10:02Z

chiming in on this, as we are heavy users of postgres range types and range operators as a powerful abstraction for time series data

as has already been mentioned, the key verbs are contains and overlaps both on element and range level and in both directions:

examples from the postgres docs:

Operator	Description	Example	Result
@>	contains range	`int4range(2,4) @> int4range(2,3)`	t
@>	contains element	`'[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp`	t
<@	range is contained by	`int4range(2,4) <@ int4range(1,7)`	t
<@	element is contained by	`42 <@ int4range(1,7)`	f
&&	overlap (have points in common)	`int8range(3,7) && int8range(4,12)`	t
<<	strictly left of	`int8range(1,10) << int8range(100,110)`	t
-|-	is adjacent to	`numrange(1.1,2.2) -|- numrange(2.2,3.3)`	t
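For readers who don't use Postgres, the operators in the table above can be modeled roughly as follows. This is a sketch assuming half-open [lower, upper) ranges stored as plain tuples (the default bounds for the Postgres range constructors). The function names are made up for this example.

```python
from datetime import datetime


def contains_range(a, b):      # a @> b
    return a[0] <= b[0] and b[1] <= a[1]


def contains_element(a, x):    # a @> x
    return a[0] <= x < a[1]


def overlaps(a, b):            # a && b
    return a[0] < b[1] and b[0] < a[1]


def strictly_left_of(a, b):    # a << b
    return a[1] <= b[0]


def adjacent(a, b):            # a -|- b
    return a[1] == b[0] or b[1] == a[0]


print(contains_range((2, 4), (2, 3)))               # True
print(contains_element(
    (datetime(2011, 1, 1), datetime(2011, 3, 1)),
    datetime(2011, 1, 10)))                         # True
print(contains_range((1, 7), (2, 4)))               # True  (the <@ row, flipped)
print(contains_element((1, 7), 42))                 # False
print(overlaps((3, 7), (4, 12)))                    # True
print(strictly_left_of((1, 10), (100, 110)))        # True
print(adjacent((1.1, 2.2), (2.2, 3.3)))             # True
```

The "is contained by" operator `<@` is just `@>` with the arguments swapped, so it needs no function of its own.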

now that we have Intervals in pandas (very grateful for bringing that feature @jreback!) I have already tinkered around with some mappers for going between Postgres and pandas — maybe that is too db-specific but def have a great interest in seeing more Interval type functionality in Pandas and helping out with this

shoyer · 2017-05-18T23:25:06Z

Thanks @buyology, the Postgres references are very helpful. Honestly, we probably should have consulted them in the original design process.

Looking at this list, it looks like we do indeed need the contains and overlaps functionality (though I'm not sure we can call our methods contains, given that it differs significantly from the current method). The rest is provided by current functionality, once you pull out lower and upper bounds.

alexlenail · 2017-12-30T22:59:41Z

@shoyer returning to

class Interval:
    def covers(self, other: Interval) -> bool
    def covers(self, other: IntervalIndex) -> IntegerArray1D
    def overlaps(self, other: Interval) -> bool
    def overlaps(self, other: IntervalIndex) -> IntegerArray1D

class IntervalIndex:
    def covers(self, other: Interval) -> IntegerArray1D
    def covers(self, other: IntervalIndex) -> Tuple[IntegerArray1D, IntegerArray1D]
    def overlaps(self, other: Interval) -> IntegerArray1D
    def overlaps(self, other: IntervalIndex) -> Tuple[IntegerArray1D, IntegerArray1D]

What were you thinking w.r.t. the relationship between IntervalIndex.overlaps(interval) and interval.overlaps(IntervalIndex) (and same question for .covers()) ?

shoyer · 2017-12-30T23:12:31Z

What were you thinking w.r.t. the relationship between IntervalIndex.overlaps(interval) and interval.overlaps(IntervalIndex) (and same question for .covers()) ?

It's been a while since I thought about it, but my initial thought would be that interval.overlaps(interval_index) and interval_index.overlaps(interval) would return the same thing.

alexlenail · 2017-12-30T23:36:03Z

@shoyer okay, that's what I did, currently in #18975. Take a look if you get a chance =)

Note @jreback that means the functions

test_interval_covers_intervalIndex

isn't any different from the function

test_intervalIndex_covers_interval

and

test_interval_overlaps_intervalIndex

is the same as

test_intervalIndex_overlaps_interval

in that the should_overlap and should_cover objects are identical.

jreback · 2018-04-11T01:33:12Z

came across this library: https://github.com/AlexandreDecan/python-intervals

looks to have some interesting interval semantics

cc @jschendel

jorisvandenbossche · 2019-06-20T14:59:21Z

@jschendel this might be a possible topic to work on during the sprint (if you are interested of course)?

It would be good to have this indexing behaviour clean-up in 0.25 before 1.0 (as they will break some behaviour I think, not sure we can do with deprecations).

jschendel · 2019-06-20T19:51:53Z

Yeah, seems like a good topic to work on during the sprint. I've done a bit of work on this already but have been a bit lazy on finishing it up.

sinhrks added the Interval and Indexing labels May 11, 2017

alexlenail mentioned this issue May 18, 2017

New Interval / IntervalIndex behavior spec #16386

Merged

3 tasks

jreback mentioned this issue May 20, 2017

ENH: .interval accessor #16401

Open

alexlenail mentioned this issue May 21, 2017

DOC: add section on Interval #16400

Open

jorisvandenbossche added this to the 0.21.0 milestone Jun 19, 2017

jorisvandenbossche added the Prio-high label Jun 19, 2017

shoyer mentioned this issue Aug 30, 2017

qcut: Using cut with IntervalIndex provided by qcut producing wrong NaN values #17284

Closed

alexlenail mentioned this issue Sep 25, 2017

Reboot of PR 16386 #17668

Closed

1 task

jreback modified the milestones: 0.21.0, Next Major Release Oct 2, 2017

jorisvandenbossche mentioned this issue Oct 2, 2017

API: change IntervalIndex.contains to work elementwise #17753

Merged

alexlenail mentioned this issue Dec 28, 2017

Add spec for new Interval / IntervalIndex methods: .overlaps(), .covers() #18975

Closed

3 tasks

alexlenail mentioned this issue Apr 2, 2018

[WIP] Implement new Interval / IntervalIndex behavior, and update tests #20587

Closed

3 tasks

This was referenced Feb 2, 2019

BUG: IntervalIndex.get_loc/get_indexer wrong return value / error #25090

Closed

vectorized operations with pd.Series of pd.Interval data #25177

Closed

jschendel mentioned this issue Mar 25, 2019

[BUG] IntervalIndex.get_loc error for decreasing index #25860

Closed

jschendel mentioned this issue Jun 28, 2019

API: Implement new indexing behavior for intervals #27100

Merged

4 tasks

jreback modified the milestones: Contributions Welcome, 0.25.0 Jun 28, 2019

jreback closed this as completed in #27100 Jul 2, 2019
