ENH: .interval accessor #16401

Open
jreback opened this issue May 20, 2017 · 3 comments
Labels
Enhancement Interval Interval data type

Comments

jreback commented May 20, 2017

Similar to the existing .dt, .str, and .cat accessors, it might be nice to expose an .interval accessor; in particular, this could allow concise indexing expressions on interval data. xref #16316

http://stackoverflow.com/questions/44088460/interval-datatype-in-pandas-find-midpoint-left-center-etc/44088970#44088970

In [13]: df = pd.DataFrame({'month': [1, 1, 2, 2], 'distances': range(4), 'value': range(4)})

In [14]: df
Out[14]: 
   distances  month  value
0          0      1      0
1          1      1      1
2          2      2      2
3          3      2      3

In [15]: result = df.groupby(['month', pd.cut(df.distances, 2)]).value.mean()

In [16]: result
Out[16]: 
month  distances    
1      (-0.003, 1.5]    0.5
2      (1.5, 3.0]       2.5
Name: value, dtype: float64

In [17]: pd.IntervalIndex(result.index.get_level_values('distances')).left
Out[17]: Float64Index([-0.003, 1.5], dtype='float64')

In [18]: pd.IntervalIndex(result.index.get_level_values('distances')).right
Out[18]: Float64Index([1.5, 3.0], dtype='float64')

In [19]: pd.IntervalIndex(result.index.get_level_values('distances')).mid
Out[19]: Float64Index([0.7485, 2.25], dtype='float64')
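
For illustration only, here is a minimal sketch of what such an accessor might look like, built on `pd.api.extensions.register_series_accessor`. The accessor name `interval`, the class `IntervalAccessor`, and its three properties are assumptions made for this example; nothing here is part of pandas itself, and a real implementation could differ considerably.

```python
import pandas as pd


# Hypothetical accessor: the name "interval" and this class are illustrative only.
@pd.api.extensions.register_series_accessor("interval")
class IntervalAccessor:
    def __init__(self, series):
        self._series = series
        # Coerce to an IntervalIndex so that categorical-of-interval data
        # (such as the output of pd.cut) can be handled as well.
        self._intervals = pd.IntervalIndex(series.array)

    def _wrap(self, values):
        # Re-attach the original index so results align with the Series.
        return pd.Series(
            values.to_numpy(), index=self._series.index, name=self._series.name
        )

    @property
    def left(self):
        return self._wrap(self._intervals.left)

    @property
    def right(self):
        return self._wrap(self._intervals.right)

    @property
    def mid(self):
        return self._wrap(self._intervals.mid)


# Two right-closed intervals: (0.0, 1.5] and (1.5, 3.0]
s = pd.Series(pd.arrays.IntervalArray.from_breaks([0.0, 1.5, 3.0]))
lefts = s.interval.left
mids = s.interval.mid
```

With something like this registered, the `pd.IntervalIndex(...)` wrapping in the session above would no longer need to be spelled out at each call site.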
jreback added this to the Next Major Release milestone May 20, 2017

jreback commented May 20, 2017

cc @shoyer @zfrenchee @buyology @TomAugspurger

jreback commented May 20, 2017

e.g. this might make a reasonable syntax for indexing

df.loc[df.my_interval_column.interval.overlaps(.....)]
df.loc[df.my_interval_column.interval.contains(....)]

This mirrors what we already do with .dt, for example:

In [20]: df = pd.DataFrame({'A': pd.date_range('20170101', periods=10), 'value': range(10)})

In [21]: df.loc[df.A.dt.weekday]
Out[21]: 
           A  value
6 2017-01-07      6
0 2017-01-01      0
1 2017-01-02      1
2 2017-01-03      2
3 2017-01-04      3
4 2017-01-05      4
5 2017-01-06      5
6 2017-01-07      6
0 2017-01-01      0
1 2017-01-02      1

In [22]: df.loc[df.A.dt.weekday==2]
Out[22]: 
           A  value
3 2017-01-04      3

In [23]: df.loc[df.A.dt.weekday==1]
Out[23]: 
           A  value
2 2017-01-03      2
9 2017-01-10      9
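
As a rough point of comparison, the proposed `.interval.overlaps(...)` and `.interval.contains(...)` expressions can be approximated without an accessor by going through the column's backing array. This sketch assumes a pandas version in which `IntervalArray.overlaps` and `IntervalArray.contains` are available; the column name `my_interval_column` is taken from the proposal above, and the data is made up for the example.

```python
import pandas as pd

# Three right-closed intervals: (0, 1], (1, 2], (2, 3]
df = pd.DataFrame({
    "my_interval_column": pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3]),
    "value": range(3),
})

# The IntervalArray backing the column exposes the interval-specific methods.
intervals = df["my_interval_column"].array

# Rows whose interval shares at least one point with (0.5, 1.5]
overlapping = df.loc[intervals.overlaps(pd.Interval(0.5, 1.5))]

# Rows whose interval contains the scalar 1.5
containing = df.loc[intervals.contains(1.5)]
```

Both methods return boolean masks, so they slot into `df.loc[...]` the same way the `.dt.weekday == 2` comparison does; an accessor would mainly save the `.array` step.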

jreback modified the milestones: Interesting Issues, Next Major Release May 20, 2017

jreback commented Nov 26, 2017

cc @jschendel

jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

3 participants