Inconsistent behavior of partial string indexing vs straight indexing for timeseries data #9732

Closed
washougal opened this issue Mar 25, 2015 · 3 comments
Labels: Datetime (Datetime data dtype), Enhancement, Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@washougal

I recently ran into an interesting issue with how pandas handles the "<=" comparison operator when comparing dates.

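import pandas as pd

# 72 hourly timestamps: 2011-01-01 00:00 through 2011-01-03 23:00
rng = pd.date_range('1/1/2011', periods=72, freq='H')
df = pd.DataFrame(rng)
df.columns = ['date']

# Intended to flag every row on 2011-01-01 and 2011-01-02
crit = (df['date'] >= '2011-01-01') & (df['date'] <= '2011-01-02')
df.loc[crit, 'New_Col'] = 'First two days?'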

Anything after 2011-01-02 00:00:00 is excluded by the <= '2011-01-02' comparison. One would expect "less than or equal to 2011-01-02" to include that entire day. Is this by design, or is this a bug?
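Until the two behaviors agree, here is a minimal sketch of two workarounds that select the whole of both days, assuming a reasonably recent pandas (the second relies on Series.dt.normalize). The sketch uses freq='60min' rather than 'H' only to avoid the hourly-alias deprecation in newer pandas releases.

```python
import pandas as pd

# Same data as above: 72 hourly timestamps starting 2011-01-01 00:00.
rng = pd.date_range('2011-01-01', periods=72, freq='60min')
df = pd.DataFrame({'date': rng})

# Option 1: use a strict upper bound at midnight of the following day.
crit_next_day = (df['date'] >= '2011-01-01') & (df['date'] < '2011-01-03')

# Option 2: drop the time-of-day component before comparing, so every
# timestamp on 2011-01-02 becomes 2011-01-02 00:00 and passes "<=".
day = df['date'].dt.normalize()
crit_normalized = (day >= '2011-01-01') & (day <= '2011-01-02')

df.loc[crit_next_day, 'New_Col'] = 'First two days'
```

Both masks select the same 48 rows; the first avoids creating an intermediate column, while the second keeps the "<=" spelling from the original question.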

@jreback
Contributor

jreback commented Mar 26, 2015

The typical way to think about how slices are interpreted is shown in [49] below: slices use partial string indexing (see the pandas documentation on partial string indexing).

This expands a partial date to include all times on that date; e.g. as an upper bound, '2011-01-02' is implicitly '2011-01-02 23:59:59.999999999'.

However for straight indexing, these values are simply converted as is, e.g. '2011-01-02' -> '2011-01-02 00:00:00.000000000'.
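The two interpretations can be checked side by side. This is a small sketch on a Series shaped like the frame in [48]; the explicit end-of-day bound is built by hand here for comparison and is not how pandas implements partial string slicing internally.

```python
import pandas as pd

# Hourly timestamps covering 2011-01-01 through 2011-01-03 (72 rows).
rng = pd.date_range('2011-01-01', periods=72, freq='60min')
s = pd.Series(range(len(rng)), index=rng)

# Straight comparison: the date-only string is converted as is,
# i.e. to midnight at the start of 2011-01-02.
as_is = pd.Timestamp('2011-01-02')
mask = s.index <= '2011-01-02'          # same result as s.index <= as_is

# Partial string slicing: '2011-01-02' as the upper bound covers the
# whole day, so it behaves like the last nanosecond of that day.
end_of_day = as_is + pd.Timedelta(days=1) - pd.Timedelta(1, unit='ns')
partial = s.loc['2011-01-01':'2011-01-02']
explicit = s.loc[pd.Timestamp('2011-01-01'):end_of_day]

print(mask.sum(), len(partial))          # 25 48
```

The 25 vs. 48 row counts match [50] and [49] below.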

I think these should be consistent, so we'll call this a bug. This may be a bit tricky to fix, though (see core/ops._comp_method_SERIES), and partial indexing should probably only be used for ge, gt, le, and lt.
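One way to picture "partial indexing for ge, gt, le, lt" is to treat the string as the whole period it names and pick the period's start or end depending on the operator. The helper below, partial_compare, is hypothetical and not part of pandas; it also assumes the caller supplies the resolution via freq instead of inferring it from the string, and it only sketches the intended semantics rather than a fix to core/ops.

```python
import operator
import pandas as pd

def partial_compare(values, op, date_str, freq='D'):
    """Hypothetical helper: compare datetimes against a partial date string,
    treating the string as the whole period it names at resolution `freq`."""
    period = pd.Period(date_str, freq=freq)
    start, end = period.start_time, period.end_time  # e.g. 00:00 and 23:59:59.999999999
    if op is operator.ge:   # on or after the period's first instant
        return values >= start
    if op is operator.gt:   # strictly after the whole period
        return values > end
    if op is operator.le:   # on or before the period's last instant
        return values <= end
    if op is operator.lt:   # strictly before the whole period
        return values < start
    raise ValueError('only ge, gt, le and lt are supported')

# Usage: select the first two days of the 72 hourly timestamps.
dates = pd.Series(pd.date_range('2011-01-01', periods=72, freq='60min'))
first_two_days = (partial_compare(dates, operator.ge, '2011-01-01')
                  & partial_compare(dates, operator.le, '2011-01-02'))
```

Mapping gt to the period's end and lt to its start keeps the four operators consistent with slicing: "> '2011-01-02'" means after that entire day, and "< '2011-01-02'" means before it begins.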

In [46]: df = pd.DataFrame({'date' : rng, 'value' : range(len(rng))})

In [47]: df2 = df.set_index('date')

In [48]: df2     
Out[48]: 
                     value
date                      
2011-01-01 00:00:00      0
2011-01-01 01:00:00      1
2011-01-01 02:00:00      2
2011-01-01 03:00:00      3
2011-01-01 04:00:00      4
...                    ...
2011-01-03 19:00:00     67
2011-01-03 20:00:00     68
2011-01-03 21:00:00     69
2011-01-03 22:00:00     70
2011-01-03 23:00:00     71

[72 rows x 1 columns]

In [49]: df2['2011-01-01':'2011-01-02']
Out[49]: 
                     value
date                      
2011-01-01 00:00:00      0
2011-01-01 01:00:00      1
2011-01-01 02:00:00      2
2011-01-01 03:00:00      3
2011-01-01 04:00:00      4
...                    ...
2011-01-02 19:00:00     43
2011-01-02 20:00:00     44
2011-01-02 21:00:00     45
2011-01-02 22:00:00     46
2011-01-02 23:00:00     47

[48 rows x 1 columns]

In [50]: df2[(df2.index >= '2011-01-01') & (df2.index <= '2011-01-02')]
Out[50]: 
                     value
date                      
2011-01-01 00:00:00      0
2011-01-01 01:00:00      1
2011-01-01 02:00:00      2
2011-01-01 03:00:00      3
2011-01-01 04:00:00      4
...                    ...
2011-01-01 20:00:00     20
2011-01-01 21:00:00     21
2011-01-01 22:00:00     22
2011-01-01 23:00:00     23
2011-01-02 00:00:00     24

[25 rows x 1 columns]

jreback added the Datetime and API Design labels Mar 26, 2015
jreback added this to the Next Major Release milestone Mar 26, 2015
washougal changed the title from "Possible bug in timeseries string indexing" to "Inconsistent behavior partial string indexing vs straight indexing for timeseries data" Mar 26, 2015
washougal changed the title from "Inconsistent behavior partial string indexing vs straight indexing for timeseries data" to "Inconsistent behavior of partial string indexing vs straight indexing for timeseries data" Mar 26, 2015
@shoyer
Member

shoyer commented Mar 26, 2015

I'm actually a little surprised this works -- you can't compare either Timestamp or np.datetime64 arrays to strings.

But, I agree with @jreback that this is a nice feature and it would be good to fix it.

jreback added the Indexing, Enhancement, and Difficulty Intermediate labels Jun 11, 2015
jbrockmendel added the Numeric Operations label Jun 26, 2021
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
jbrockmendel removed the Numeric Operations label Mar 30, 2023
@mroeschke
Member

Seems like there hasn't been much interest in this feature over the years, so closing.

5 participants