
DatetimeIndex, df.ix[date] and df.ix[[date]] failure #1821

Closed
lodagro opened this issue Aug 28, 2012 · 1 comment
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)

Comments

@lodagro
Contributor

lodagro commented Aug 28, 2012

From Stack Overflow.

The original frame cannot be shared, but the following code reproduces the issue.

import pandas
import numpy as np
import datetime

# create a large list of irregularly spaced (non-periodic) datetimes
dates = []
sec = datetime.timedelta(seconds=1)
half_sec = sec / 2
d = datetime.datetime(2011, 12, 5, 20, 30)
n = 100000
for i in range(n):
    dates.append(d)
    dates.append(d + sec)
    dates.append(d + sec + half_sec)
    dates.append(d + sec + sec + half_sec)
    d += 3 * sec

# duplicate some values in the list (the order stays non-decreasing)
duplicate_positions = np.random.randint(0, len(dates) - 1, 20)
for p in duplicate_positions:
    dates[p + 1] = dates[p]
print("duplicated values at %s" % str(duplicate_positions))

df = pandas.DataFrame(np.random.randn(len(dates), 4),
                      index=dates,
                      columns=list('ABCD'))

# df.ix was pandas' mixed label/position indexer at the time
# (deprecated in 0.20 and removed in 1.0)
pos = n * 3
timestamp = df.index[pos]
print("timestamp = df.index[%d] = %s" % (pos, str(timestamp)))
print("timestamp in index? %s" % str(timestamp in df.index))
print("df.ix[timestamp] = \n%s" % str(df.ix[timestamp]))
print("df.ix[[timestamp]] = \n%s" % str(df.ix[[timestamp]]))

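Running this script with n = 100000 outputs the following. Note that df.ix[timestamp] returns the row, while df.ix[[timestamp]] returns an empty DataFrame: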

duplicated values at [143779 365504 338898 192427  63827 333772 366805 378282 375136  77760
  59843 215934 173395 185449 310734 184004  48594 221298 348967  86615]
timestamp = df.index[300000] = 2011-12-08 11:00:00
timestamp in index? True
df.ix[timestamp] = 
A    0.959639
B    0.651874
C    1.059355
D   -1.296393
Name: 2011-12-08 11:00:00
df.ix[[timestamp]] = 
Empty DataFrame
Columns: array([A, B, C, D], dtype=object)
Index: <class 'pandas.tseries.index.DatetimeIndex'>
Length: 0, Freq: None, Timezone: None
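On current pandas, `.ix` no longer exists and `.loc` is the label-based indexer. A minimal sketch of the report's two lookups with `.loc` follows; it assumes a smaller `n` (1000) and NumPy's `default_rng` in place of the report's `np.random` calls, both choices of this sketch rather than part of the report. On a pandas version without this bug, the timestamp is found in the index and the list lookup returns a non-empty DataFrame rather than the empty one shown above.

```python
import datetime

import numpy as np
import pandas as pd

# Irregularly spaced timestamps (assumption: n = 1000 instead of 100000)
n = 1000
sec = datetime.timedelta(seconds=1)
half_sec = sec / 2
d = datetime.datetime(2011, 12, 5, 20, 30)
dates = []
for _ in range(n):
    dates.extend([d, d + sec, d + sec + half_sec, d + 2 * sec + half_sec])
    d += 3 * sec

# Introduce a few duplicates; the list stays sorted but becomes non-unique
rng = np.random.default_rng(0)
for p in rng.integers(0, len(dates) - 1, 20):
    dates[p + 1] = dates[p]

df = pd.DataFrame(rng.standard_normal((len(dates), 4)),
                  index=dates, columns=list("ABCD"))

pos = n * 3
ts = df.index[pos]

# Scalar label lookup (the report's df.ix[timestamp])
by_scalar = df.loc[ts]
# List-of-labels lookup (the report's df.ix[[timestamp]]);
# returns a DataFrame holding every row labeled ts
by_list = df.loc[[ts]]
print(ts in df.index, len(by_list))
```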

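For n = 1000000 it gets worse: a timestamp read directly from the index is reported as not being in the index, and df.ix[timestamp] raises a TypeError: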

duplicated values at [ 686452 3862593 3747433   63099 2422495 2191536  486238 1632442 2373460
 3266942 3127937 2538658 1405505  739509 1519644 2817907 1005119  755410
 1784244   86211]
timestamp = df.index[3000000] = 2011-12-31 21:30:00
timestamp in index? False
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: Cannot compare Timestamp with 1326116999500000000
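The duplication step (`dates[p + 1] = dates[p]`) keeps the list in non-decreasing order, so the report's index is sorted but non-unique. For such an index, whether a label is present, and which positions hold it, can be answered with a binary search. The NumPy sketch below illustrates that idea only; `locate_sorted` is a hypothetical helper, not pandas' internal lookup code.

```python
import numpy as np


def locate_sorted(values: np.ndarray, key) -> slice:
    """Hypothetical helper: return the slice of positions holding `key`
    in a sorted (non-decreasing) array; the slice is empty if `key` is absent."""
    left = int(np.searchsorted(values, key, side="left"))
    right = int(np.searchsorted(values, key, side="right"))
    return slice(left, right)


# Sorted, non-unique datetime64 values, like the report's index after
# dates[p + 1] = dates[p] duplicated one entry
values = np.array(
    ["2011-12-05T20:30:00", "2011-12-05T20:30:01", "2011-12-05T20:30:01",
     "2011-12-05T20:30:01.500", "2011-12-05T20:30:02.500"],
    dtype="datetime64[ns]",
)
key = values[2]                  # a label taken straight from the index
loc = locate_sorted(values, key)
print(loc)                       # slice(1, 3, None)

# The invariant the n = 1000000 run violated: a label read from the
# index must be found in it
assert loc.stop > loc.start
```

Here `key` occurs twice, so the helper returns both positions; a key that is not in the array yields an empty slice (`start == stop`).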
ghost assigned wesm on Sep 18, 2012
wesm closed this as completed in 684e9dd on Sep 18, 2012
@wesm
Member

wesm commented Sep 18, 2012

Fixed this one
