-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Non-inclusive partial string indexing on DatetimeIndex #16571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
pls read the docs: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#indexing These are the recommended way to query. These by-definition support slicing when used via you are in fact using a convenience method, which only supports boolean indexing, powered by |
Yes, I've read the docs... Currently the .loc slicing is always inclusive at both ends of the range |
I think we're reluctant to add more complexity / options to indexing with Here's an example of how you can achieve it, using lower-level methods: In [24]: import pandas.util.testing as tm
In [25]: ts = tm.makeTimeSeries()
In [26]: ts
Out[26]:
2000-01-03 0.804101
2000-01-04 0.042160
2000-01-05 -0.580078
2000-01-06 0.757864
2000-01-07 -0.349766
2000-01-10 -0.058222
2000-01-11 -0.274172
2000-01-12 -1.539538
2000-01-13 0.505398
2000-01-14 0.665445
2000-01-17 0.998438
...
Freq: B, dtype: float64 Say you want to slice In [27]: lo = ts.index.get_slice_bound("2000-01-04", "left", "loc")
In [28]: hi = ts.index.get_slice_bound("2000-01-10", "left", "loc")
In [29]: ts.iloc[lo:hi]
Out[29]:
2000-01-04 0.042160
2000-01-05 -0.580078
2000-01-06 0.757864
2000-01-07 -0.349766
Freq: B, dtype: float64 |
I think that would be a very nice "advanced indexing trick" for somewhere in the docs. |
Added the example to the original. I think a cookbook example is probably most appropriate? @linar-jether would you mind submitting a PR? |
@TomAugspurger Yea sure, I'll add a cookbook example But as time series data is a major use case for pandas I think something as basic as a range query should be implemented in the API, maybe extend between_time to support timestamp objects instead of just time? And thanks for your response! |
Currently, the only way to query a time series object in the form of
start_time < t < end_time
, is by using a boolean mask array, as usingloc[slice]
includes both the start and end of the range.Querying using a boolean array is very slow compared to a slice on large dataframes, and pretty much makes no sense, as it always returns an array the size of the dataframe when my query only looks at a fraction of the data (which is the reason for indexing).
How does
df.query
perform range queries?Is there another method for closed/open ranges?
I'm looking for something similar to df.between_time that allows to optionally include the start/end
[from @TomAugspurger]
I think we're reluctant to add more complexity / options to indexing with
.loc
and friends, but this would be a good doc example of how to, achieve it using lower-level methods:Say you want to slice
[2000-01-04, 2000-01-10)
(so excluding the right endpoint)The text was updated successfully, but these errors were encountered: