Slicing with reversed datetime index #14316

boncelet · 2016-09-28T21:32:17Z

I'm working with a time series data file from EIA.gov. It's a CSV file with the dates in reverse order (most recent first). Slicing on the dates wasn't working right. I believe these snippets illustrate the problem:

Create dataframe and date reversed dataframe.

>>> import pandas as pd
>>> dates = pd.date_range('1/1/2016',periods = 5)
>>> df = pd.DataFrame([1,2,3,4,5], index=dates, columns=['Col'])
>>> rdf = df.sort_index(axis=0, ascending=False)

First the normal order: Slice on dates, getting what we expect:

>>> print(df['2016-01-01':'2016-01-03'])
            Col
2016-01-01    1
2016-01-02    2
2016-01-03    3

Now the reverse order: I expect to get exactly the same answer (same slice on the same dataset), perhaps with rows reversed, but don't:

>>> print(rdf['2016-01-01':'2016-01-03'])
Empty DataFrame
Columns: [Col]
Index: []

Perhaps if I reverse the slice. Note only two rows are returned:

>>> print(rdf['2016-01-03':'2016-01-01'])
            Col
2016-01-03    3
2016-01-02    2

This makes sense, but it's still confusing:

>>> print(df[1:3]), print(rdf[1:3])
            Col
2016-01-02    2
2016-01-03    3
            Col
2016-01-04    4
2016-01-03    3

I could sort (or reverse) my index, but I had no reason to believe I needed to. Furthermore, other operations, such as plotting, work as expected (the time axis is correct).

Output of `pd.show_versions()`

## INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 23 Stepping 6, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

boncelet · 2016-09-28T21:40:19Z

Sorry to comment on my own issue, but here are two more examples. I get that the rows are different orders, but shouldn't both return the same number of rows?

print(rdf['2016-01-01':'2016-01-03':-1])
            Col
2016-01-02    2
2016-01-03    3

print(df['2016-01-03':'2016-01-01':-1])
            Col
2016-01-03    3
2016-01-02    2
2016-01-01    1

shoyer · 2016-09-28T22:10:23Z

Now the reverse order: I expect to get exactly the same answer (same slice on the same dataset), perhaps with rows reversed, but don't:
>> print(rdf['2016-01-01':'2016-01-03'])
Empty DataFrame
Columns: [Col]
Index: []

This part at least is working as intended. Slicing a descending order index requires a descending slice. This mirrors how slicing an ascending order index requires an ascending slice.

Datetime indexing with strings involves some implicit rounding to match all appropriate times. It's possible that the logic is not quite right for reversed indexes.

boncelet · 2016-09-28T22:40:21Z

The reason I said " I expect to get exactly the same answer (same slice on the same dataset)" is that I expected a slice to behave like a query on a database: give me all rows whose dates are in the range (inclusive). Clearly, the implementation will be much faster if the index is ordered (internally), but the query (slice) should still work as "expected" even if not.

shoyer · 2016-09-28T22:44:37Z

I expected a slice to behave like a query on a database: give me all rows whose dates are in the range (inclusive).

This is a reasonable intuition, but it's the wrong mental model for slicing in pandas. A better model is: give me the continuous slice of the data enclosed by these bounds. Slicing in pandas doesn't work on indexes that are not either monotonic increasing or monotonic decreasing.

chris-b1 · 2016-09-28T23:24:52Z

Should label slicing with a step maybe just raise? I might be missing a valid usecase, but I'm not even sure if these are bugs, because I don't really know what to expect.

In [8]: rdf['2016-01-01':'2016-01-03':-1]
Out[8]: 
            Col
2016-01-02    2
2016-01-03    3

In [9]: df['2016-01-03':'2016-01-01':-1]
Out[9]: 
            Col
2016-01-03    3
2016-01-02    2
2016-01-01    1

chris-b1 · 2016-09-28T23:31:59Z

I guess there are tests here covering this behavior, so it is well defined, if a little strange.

https://github.com/pydata/pandas/blob/master/pandas/tests/indexing/test_indexing.py#L5140

chris-b1 · 2016-09-28T23:38:02Z

@boncelet - of these cases, the only one that looks buggy to me is this one - it should include the ending label. The others seem to just be symptoms of slicing semantics like @shoyer mentioned. Thanks for the report.

In [22]: rdf.loc['2016-01-03':'2016-01-01', :]
Out[22]: 
            Col
2016-01-03    3
2016-01-02    2

shoyer · 2016-09-28T23:50:54Z

I think the issue is in _maybe_cast_slice_bound. It needs to flip side for monotonic decreasing, like _searchsorted_monotonic

boncelet · 2016-09-29T00:00:41Z

Thanks. Part of my concern is it wasn't obvious to me I needed to worry about the order of the data in the original CSV file. Same data, whether it's sorted first to last or last to first.

It turns out I was wrong about plots. At least some plots give the x-axis (time) reversed. Normally this is not what we want.

Unfortunately, this slicing (and plotting) behavior means user code has to check the order of the data and sort it if necessary. For modest datasets (like those I'm looking at) no problem. But I can imagine huge datasets where sorting will be a burden.

It might be nice to have a user settable flag to reverse the index without actually reversing the index, analogously to a numpy stride (which can be negative). That way we can assume the index is increasing even if internally it is decreasing. It'd be even nicer if the various dataframe creators set the flag automatically, thereby relieving the user of the necessity to check the order.

Sincerely, thanks to all you developers who've worked so hard to create useful tools like pandas.

shoyer added Bug Datetime Datetime data dtype labels Sep 28, 2016

chris-b1 mentioned this issue Sep 29, 2016

BUG: date slicing with reverse sorted index #14317

Closed

4 tasks

jreback added this to the 0.19.0 milestone Sep 29, 2016

jreback closed this as completed in ad92aee Sep 30, 2016

rgommers mentioned this issue Sep 16, 2018

Should sel with slice objects care about underlying coordinate order? pydata/xarray#1613

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slicing with reversed datetime index #14316

Slicing with reversed datetime index #14316

boncelet commented Sep 28, 2016

boncelet commented Sep 28, 2016

shoyer commented Sep 28, 2016

boncelet commented Sep 28, 2016

shoyer commented Sep 28, 2016

chris-b1 commented Sep 28, 2016

chris-b1 commented Sep 28, 2016

chris-b1 commented Sep 28, 2016

shoyer commented Sep 28, 2016

boncelet commented Sep 29, 2016

Slicing with reversed datetime index #14316

Slicing with reversed datetime index #14316

Comments

boncelet commented Sep 28, 2016

Output of pd.show_versions()

boncelet commented Sep 28, 2016

shoyer commented Sep 28, 2016

boncelet commented Sep 28, 2016

shoyer commented Sep 28, 2016

chris-b1 commented Sep 28, 2016

chris-b1 commented Sep 28, 2016

chris-b1 commented Sep 28, 2016

shoyer commented Sep 28, 2016

boncelet commented Sep 29, 2016

Output of `pd.show_versions()`