Inconsistent behavior of partial string indexing vs straight indexing for timeseries data #9732

washougal · 2015-03-25T23:19:53Z

I recently ran into an issue where pandas handles the "<=" comparison operator in an unexpected way when a datetime column is compared against a date string.

import pandas as pd
rng = pd.date_range('1/1/2011', periods=72, freq='H')
df = pd.DataFrame(rng)
df.columns=['date']

crit = (df['date'] >= '2011-01-01') & (df['date'] <= '2011-01-02')
df.loc[crit, 'New_Col'] = 'First two days?'

No timestamp after 2011-01-02 00:00:00 is matched by "less than or equal to 2011-01-02". One would expect "less than or equal to" to cover that entire day. Is this by design, or is it a bug?


jreback · 2015-03-26T00:09:05Z

So the typical way to think about how slices are interpreted is [49] below, where the slice endpoints use partial string indexing (see the partial string indexing section of the docs).

This basically expands a partial date to include every time on that date, e.g. as the end of a slice '2011-01-02' is implicitly '2011-01-02 23:59:59.999999999'.

However, for straight indexing (boolean comparisons), these values are simply converted as is, e.g. '2011-01-02' -> '2011-01-02 00:00:00.000000000'.
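To make the "converted as is" point concrete, here is a minimal sketch (not from the thread) that compares a datetime Series against the string and against the equivalent `pd.Timestamp`. The variable names are illustrative, and the frequency is passed as a `pd.Timedelta` only to avoid depending on a particular frequency alias spelling.

```python
import pandas as pd

# Hourly timestamps covering three days, as in the original report.
rng = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
s = pd.Series(rng)

# In a comparison, the date string is parsed to a single instant: midnight.
as_timestamp = pd.Timestamp("2011-01-02")
mask_from_string = s <= "2011-01-02"
mask_from_timestamp = s <= as_timestamp

print(as_timestamp)                                   # 2011-01-02 00:00:00
print(mask_from_string.equals(mask_from_timestamp))   # True
print(int(mask_from_string.sum()))                    # 25
```

The 25 matching rows are the 24 hours of 2011-01-01 plus midnight of 2011-01-02, which is the same count as Out[50] in the session below.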

I think these should be consistent, so we'll call this a bug, though it may be a bit tricky to fix; see core/ops._comp_method_SERIES. Partial string semantics should probably only be used for ge, gt, le, and lt.

In [46]: df = pd.DataFrame({'date' : rng, 'value' : range(len(rng))})

In [47]: df2 = df.set_index('date')

In [48]: df2     
Out[48]: 
                     value
date                      
2011-01-01 00:00:00      0
2011-01-01 01:00:00      1
2011-01-01 02:00:00      2
2011-01-01 03:00:00      3
2011-01-01 04:00:00      4
...                    ...
2011-01-03 19:00:00     67
2011-01-03 20:00:00     68
2011-01-03 21:00:00     69
2011-01-03 22:00:00     70
2011-01-03 23:00:00     71

[72 rows x 1 columns]

In [49]: df2['2011-01-01':'2011-01-02']
Out[49]: 
                     value
date                      
2011-01-01 00:00:00      0
2011-01-01 01:00:00      1
2011-01-01 02:00:00      2
2011-01-01 03:00:00      3
2011-01-01 04:00:00      4
...                    ...
2011-01-02 19:00:00     43
2011-01-02 20:00:00     44
2011-01-02 21:00:00     45
2011-01-02 22:00:00     46
2011-01-02 23:00:00     47

[48 rows x 1 columns]

In [50]: df2[(df2.index >= '2011-01-01') & (df2.index <= '2011-01-02')]
Out[50]: 
                     value
date                      
2011-01-01 00:00:00      0
2011-01-01 01:00:00      1
2011-01-01 02:00:00      2
2011-01-01 03:00:00      3
2011-01-01 04:00:00      4
...                    ...
2011-01-01 20:00:00     20
2011-01-01 21:00:00     21
2011-01-01 22:00:00     22
2011-01-01 23:00:00     23
2011-01-02 00:00:00     24

[25 rows x 1 columns]
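Until the two code paths agree, one possible workaround (a sketch, not something proposed in this thread) is to write the boolean mask as a half-open interval, comparing strictly against the start of the following day. The names `start`, `end`, `by_slice`, and `by_mask` are illustrative.

```python
import pandas as pd

# Same hourly data as the session above.
rng = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
df2 = pd.DataFrame({"value": range(len(rng))}, index=rng)

# Partial string slicing: each endpoint covers its whole day.
by_slice = df2.loc["2011-01-01":"2011-01-02"]

# Boolean mask over a half-open interval: from the start of the first day
# up to, but not including, the start of the day after the last one.
start = pd.Timestamp("2011-01-01")
end = pd.Timestamp("2011-01-02") + pd.Timedelta(days=1)
by_mask = df2[(df2.index >= start) & (df2.index < end)]

print(len(by_slice), len(by_mask))  # 48 48
```

Using `<` against the next day avoids spelling out '23:59:59.999999999' by hand and does not depend on the resolution of the stored timestamps.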

shoyer · 2015-03-26T00:30:08Z

I'm actually a little surprised this works -- you can't compare either Timestamp or np.datetime64 arrays to strings.

But, I agree with @jreback that this is a nice feature and it would be good to fix it.

mroeschke · 2024-05-31T22:24:49Z

Seems like there hasn't been much interest in this feature over the years, so closing.

jreback added Datetime Datetime data dtype API Design labels Mar 26, 2015

jreback added this to the Next Major Release milestone Mar 26, 2015

washougal changed the title ~~Possible bug in timeseries string indexing~~ Inconsistent behavior partial string indexing vs straight indexing for timeseries data Mar 26, 2015

washougal changed the title ~~Inconsistent behavior partial string indexing vs straight indexing for timeseries data~~ Inconsistent behavior of partial string indexing vs straight indexing for timeseries data Mar 26, 2015

jreback mentioned this issue Jun 11, 2015

Partial string matching for timestamps with multiindex #10331

Closed

jreback added Indexing Related to indexing on series/frames, not to indexes themselves Enhancement Difficulty Intermediate labels Jun 11, 2015

thrasibule mentioned this issue Oct 29, 2015

datetime slices with multiindex unexplained behaviour #11474

Closed

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

mroeschke removed the API Design label Apr 18, 2021

jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jun 26, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

jbrockmendel removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Mar 30, 2023

mroeschke closed this as completed May 31, 2024
