Non-inclusive partial string indexing on DatetimeIndex #16571

linar-jether · 2017-06-01T12:39:21Z

Currently, the only way to query a time series object in the form start_time < t < end_time is with a boolean mask array, because loc[slice] includes both the start and the end of the range.

Querying with a boolean array is very slow compared to a slice on large DataFrames, and it pretty much makes no sense: the mask is always the size of the DataFrame, even when my query only looks at a fraction of the data (which is the whole reason for indexing).
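For reference, the boolean-mask approach described above might look like the following. This is a minimal sketch: the sample series, `start`, and `end` are made up for illustration and are not from the original report.

```python
import pandas as pd

# Hypothetical sample data: ten business days starting Monday 2000-01-03.
idx = pd.date_range("2000-01-03", periods=10, freq="B")
ts = pd.Series(range(10), index=idx)

start = pd.Timestamp("2000-01-04")
end = pd.Timestamp("2000-01-10")

# Strict comparisons give the open interval start < t < end.
# The mask holds one boolean per row, no matter how few rows match.
mask = (ts.index > start) & (ts.index < end)
open_interval = ts[mask]

print(len(mask), len(ts))
print(open_interval)
```

The mask is as long as the whole series, which is the cost being objected to here.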

How does df.query perform range queries?
Is there another method for closed/open ranges?

I'm looking for something similar to df.between_time that lets you optionally include the start/end.

[from @TomAugspurger]
I think we're reluctant to add more complexity / options to indexing with .loc and friends, but this would be a good doc example of how to achieve it using lower-level methods:

In [24]: import pandas.util.testing as tm

In [25]: ts = tm.makeTimeSeries()

In [26]: ts
Out[26]:
2000-01-03    0.804101
2000-01-04    0.042160
2000-01-05   -0.580078
2000-01-06    0.757864
2000-01-07   -0.349766
2000-01-10   -0.058222
2000-01-11   -0.274172
2000-01-12   -1.539538
2000-01-13    0.505398
2000-01-14    0.665445
2000-01-17    0.998438
...
Freq: B, dtype: float64

Say you want to slice [2000-01-04, 2000-01-10) (so excluding the right endpoint):

In [27]: lo = ts.index.get_slice_bound("2000-01-04", "left", "loc")

In [28]: hi = ts.index.get_slice_bound("2000-01-10", "left", "loc")

In [29]: ts.iloc[lo:hi]
Out[29]:
2000-01-04    0.042160
2000-01-05   -0.580078
2000-01-06    0.757864
2000-01-07   -0.349766
Freq: B, dtype: float64


jreback · 2017-06-01T13:25:30Z

pls read the docs: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#indexing

These are the recommended ways to query. By definition, they support slicing when used via .loc.

df.query is in fact a convenience method, which only supports boolean indexing, powered by numexpr.

linar-jether · 2017-06-01T14:36:47Z

Yes, I've read the docs...
Note that the issue is about slicing with open/closed ranges, similar to df.between_time

Currently the .loc slicing is always inclusive at both ends of the range
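A small sketch of the behaviour being described, using a made-up series (the data is an assumption, not taken from the thread):

```python
import pandas as pd

# Hypothetical sample data: ten business days starting Monday 2000-01-03.
idx = pd.date_range("2000-01-03", periods=10, freq="B")
ts = pd.Series(range(10), index=idx)

# Label-based slicing with .loc keeps both endpoints when they are present.
closed = ts.loc["2000-01-04":"2000-01-10"]

print(closed.index[0], closed.index[-1])
print(len(closed))
```

Both 2000-01-04 and 2000-01-10 appear in the result, so there is no way to drop one endpoint from the slice syntax alone.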

TomAugspurger · 2017-06-01T19:40:34Z

I think we're reluctant to add more complexity / options to indexing with .loc and friends.

Here's an example of how you can achieve it, using lower-level methods:

In [24]: import pandas.util.testing as tm

In [25]: ts = tm.makeTimeSeries()

In [26]: ts
Out[26]:
2000-01-03    0.804101
2000-01-04    0.042160
2000-01-05   -0.580078
2000-01-06    0.757864
2000-01-07   -0.349766
2000-01-10   -0.058222
2000-01-11   -0.274172
2000-01-12   -1.539538
2000-01-13    0.505398
2000-01-14    0.665445
2000-01-17    0.998438
...
Freq: B, dtype: float64

Say you want to slice [2000-01-04, 2000-01-10) (so excluding the right endpoint):

In [27]: lo = ts.index.get_slice_bound("2000-01-04", "left", "loc")

In [28]: hi = ts.index.get_slice_bound("2000-01-10", "left", "loc")

In [29]: ts.iloc[lo:hi]
Out[29]:
2000-01-04    0.042160
2000-01-05   -0.580078
2000-01-06    0.757864
2000-01-07   -0.349766
Freq: B, dtype: float64
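The session above reflects the pandas API at the time of the comment. As far as I know, newer pandas releases no longer ship `pandas.util.testing` and no longer accept the third `kind` argument to `get_slice_bound`, so the sketch below uses `Index.searchsorted` instead. The helper name `slice_between` and its `inclusive` parameter are hypothetical, not pandas API. It assumes a sorted, timezone-naive DatetimeIndex and treats the bounds as exact timestamps rather than partial strings.

```python
import pandas as pd


def slice_between(obj, start, end, inclusive="left"):
    """Hypothetical helper (not part of pandas): positional slice of a
    Series/DataFrame with a sorted DatetimeIndex, with open or closed ends.

    inclusive: "both", "neither", "left" (keep start), or "right" (keep end).
    """
    if inclusive not in {"both", "neither", "left", "right"}:
        raise ValueError(f"unknown value for inclusive: {inclusive!r}")

    start = pd.Timestamp(start)
    end = pd.Timestamp(end)

    # side="left" finds the first position with index >= value;
    # side="right" finds the first position with index > value.
    lo_side = "left" if inclusive in ("both", "left") else "right"
    hi_side = "right" if inclusive in ("both", "right") else "left"

    lo = obj.index.searchsorted(start, side=lo_side)
    hi = obj.index.searchsorted(end, side=hi_side)
    return obj.iloc[lo:hi]


# Hypothetical sample data: ten business days starting Monday 2000-01-03.
idx = pd.date_range("2000-01-03", periods=10, freq="B")
ts = pd.Series(range(10), index=idx)

half_open = slice_between(ts, "2000-01-04", "2000-01-10")            # [start, end)
fully_open = slice_between(ts, "2000-01-04", "2000-01-10", "neither")  # (start, end)
print(half_open)
print(fully_open)
```

Because the bounds come from binary searches on the sorted index, no full-length boolean mask is built, which is the point of the lower-level approach.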

jorisvandenbossche · 2017-06-01T20:25:16Z

I think that would be a very nice "advanced indexing trick" for somewhere in the docs.

TomAugspurger · 2017-06-01T20:34:33Z

Added the example to the original.

I think a cookbook example is probably most appropriate? @linar-jether would you mind submitting a PR?

linar-jether · 2017-06-04T08:42:22Z

@TomAugspurger Yea sure, I'll add a cookbook example

But since time series data is a major use case for pandas, I think something as basic as a range query should be implemented in the API. Maybe between_time could be extended to support timestamp objects instead of just times?

And thanks for your response!

jreback closed this as completed Jun 1, 2017

jreback added the Datetime, Usage Question, and Indexing labels Jun 1, 2017

jreback added this to the No action milestone Jun 1, 2017

jorisvandenbossche removed the Usage Question label Jun 1, 2017

TomAugspurger changed the title ~~Timeseries range query~~ Non-inclusive partial datetime indexing Jun 1, 2017

TomAugspurger changed the title ~~Non-inclusive partial datetime indexing~~ Non-inclusive partial string indexing on DatetimeIndex Jun 1, 2017

TomAugspurger added Docs Difficulty Novice labels Jun 1, 2017

TomAugspurger reopened this Jun 1, 2017

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

ghost mentioned this issue Jun 26, 2019

Discussion: Why is loc label-based slicing right-inclusive? #27059

Closed

jbrockmendel removed the Effort Low label Oct 21, 2019

mroeschke removed this from the No action milestone Oct 13, 2022
