Not possible to find first element in non-monotonic DateTimeIndex if string is used #13572

Closed
tjader opened this issue Jul 6, 2016 · 7 comments
Labels: Datetime (Datetime data dtype), good first issue, Index (Related to the Index class or subclasses)

Comments

@tjader

tjader commented Jul 6, 2016

There is an inconsistency when you look up dates in a DatetimeIndex using strings and the index is non-monotonic. The first element of the index cannot be found this way, but the lookup succeeds if you manually convert the string to a Timestamp object, or if you look for any element other than the first.

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> idx = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-04'])
>>> idx
Out[38]: DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-04'], dtype='datetime64[ns]', freq=None)
>>> idx.is_monotonic
Out[29]: False
>>> '2015-01-01' in idx  # This should work
Out[31]: False
>>> pd.Timestamp('2015-01-01') in idx
Out[32]: True
>>> '2015-01-02' in idx
Out[33]: True
>>> '2015-01-04' in idx
Out[34]: True

Expected Output

>>> '2015-01-01' in idx
Out[31]: True

output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.7.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7.commod
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.4
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
@TomAugspurger TomAugspurger added Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves labels Jul 6, 2016
@TomAugspurger
Contributor

TomAugspurger commented Jul 6, 2016

Thanks for the report.

The cause seems to be somewhere around here, where DatetimeIndex._get_string_slice('2015-01-01') returns np.array([0]). We return that array, and np.any is then called on it, which gives False because bool(0) is False.

So the monotonicity is a bit of a red herring; the real problem is that the matching key is in the 0th position.
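To illustrate the truthiness pitfall described above, here is a minimal sketch using plain NumPy arrays of match positions. The array names are made up for illustration, and this is not the pandas code path itself; it only shows why a match at position 0 looks like "no match" to np.any.

```python
import numpy as np

# Arrays of matching positions (illustrative names, not pandas internals).
match_at_start = np.array([0])           # key found at position 0
match_elsewhere = np.array([1])          # key found at position 1
no_match = np.array([], dtype=int)       # key not found

cases = (match_at_start, match_elsewhere, no_match)

# np.any tests the *values* for truthiness, so position 0 counts as False.
any_results = [bool(np.any(a)) for a in cases]

# np.size counts the matches, regardless of which positions they are.
size_results = [np.size(a) > 0 for a in cases]

print(any_results)   # [False, True, False]
print(size_results)  # [True, True, False]
```

Only the first case differs between the two checks, which matches the observation that only the first element of the index fails the lookup.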

Interested in submitting a fix? Shouldn't be too bad.

@TomAugspurger TomAugspurger added this to the 0.19.0 milestone Jul 6, 2016
@jreback
Contributor

jreback commented Jul 6, 2016

@TomAugspurger
Contributor

TomAugspurger commented Jul 6, 2016

@jreback This seems strange though, right? (This is on the original idx.)

In [88]: '2015-01-02' in idx
Out[88]: True

In [89]: '2015-01-01' in idx
Out[89]: False

The only difference is that '2015-01-01' happens to be the first element. I think that we can fix this at least for __contains__ in the non-monotonic case.

@tjader
Author

tjader commented Jul 6, 2016

@TomAugspurger The code takes quite a different path if the index is monotonic, starting when a KeyError is raised inside DatetimeIndex._partial_date_slice, so I wouldn't say it's a complete red herring.
Anyway, I think it looks reasonable to swap np.any for np.size in __contains__ instead.
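A rough sketch of what swapping np.any for np.size could look like, assuming the lookup yields an array of matching positions. The helpers find_positions, contains_with_any and contains_with_size are hypothetical and exist only for this illustration; they are not pandas functions and do not reproduce the actual __contains__ implementation.

```python
import numpy as np


def find_positions(values, key):
    """Hypothetical stand-in for the string lookup: positions where values == key."""
    return np.flatnonzero(np.asarray(values) == key)


def contains_with_any(values, key):
    # Truthiness of the positions themselves: a lone match at position 0 is lost.
    return bool(np.any(find_positions(values, key)))


def contains_with_size(values, key):
    # Number of matches: a match counts no matter where it sits.
    return bool(np.size(find_positions(values, key)))


dates = ['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-04']

print(contains_with_any(dates, '2015-01-01'))   # False
print(contains_with_size(dates, '2015-01-01'))  # True
print(contains_with_size(dates, '2015-01-03'))  # False
```

The size-based check still returns False for a key that is absent, so it only changes the outcome for a match at position 0.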

@TomAugspurger
Contributor

Right, red herring was probably the wrong phrase to use since non-monotonicity is necessary for the bug to occur 😄

@jreback jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017
@harshit-ag

can i work on this issue?

@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 28, 2019
@toobaz
Member

toobaz commented Jun 28, 2019

This was apparently fixed:

In [1]: import pandas as pd

In [2]: idx = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-04'])

In [3]: '2015-01-01' in idx
Out[3]: True

Please reopen if it still can be reproduced.

@toobaz toobaz closed this as completed Jun 28, 2019
5 participants