Indexing into Series of tz-aware datetime64s fails using __getitem__ #12089
Comments
I can confirm this bug, also with current master. |
just need a @JackKelly want to do a PR? |
tests can go in the same place as in #12054 |
sure, I'll give it a go now... |
OK, I've attempted the fix. Here's the relevant commit on my fork of Pandas. However, this hasn't fixed the issue and I'm not sure what's best to do. My 'fix' has revealed a new issue. The problem appears to be that, now, when we do
In [5]: dates = pd.date_range("2011-01-01", periods=3, tz='utc')
In [6]: dates
Out[6]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns, UTC]', freq='D')
In [7]: series = pd.Series(dates, index=['a', 'b', 'c'])
# Note the lack of timezone:
In [8]: series['a']
Out[8]: Timestamp('2011-01-01 00:00:00')
# But using `loc` we do get the timezone:
In [9]: series.loc['a']
Out[9]: Timestamp('2011-01-01 00:00:00+0000', tz='UTC')
In [10]: series
Out[10]:
a 2011-01-01 00:00:00+00:00
b 2011-01-02 00:00:00+00:00
c 2011-01-03 00:00:00+00:00
dtype: datetime64[ns, UTC]
In [11]: series['a'] == series.loc['a']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-0de902e8919c> in <module>()
----> 1 series['a'] == series.loc['a']
/home/jack/workspace/python/pandas/pandas/tslib.pyx in pandas.tslib._Timestamp.__richcmp__ (pandas/tslib.c:19258)()
971 (type(self).__name__, type(other).__name__))
972
--> 973 self._assert_tzawareness_compat(other)
974 return _cmp_scalar(self.value, ots.value, op)
975
/home/jack/workspace/python/pandas/pandas/tslib.pyx in pandas.tslib._Timestamp._assert_tzawareness_compat (pandas/tslib.c:19638)()
1000 if self.tzinfo is None:
1001 if other.tzinfo is not None:
-> 1002 raise TypeError('Cannot compare tz-naive and tz-aware '
1003 'timestamps')
1004 elif other.tzinfo is None:
TypeError: Cannot compare tz-naive and tz-aware timestamps |
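The traceback above comes from the check that refuses to compare a tz-naive `Timestamp` with a tz-aware one. Here is a minimal sketch of that behaviour in isolation, assuming a reasonably recent pandas release; the variable names are illustrative and not from the thread. It uses an ordering comparison because the handling of `==` between naive and aware timestamps has differed between pandas versions.

```python
import pandas as pd

# One timestamp without a zone, one with UTC attached.
naive = pd.Timestamp("2011-01-01")
aware = pd.Timestamp("2011-01-01", tz="UTC")

# Ordering a tz-naive timestamp against a tz-aware one is rejected.
try:
    naive < aware
    raised = False
except TypeError:
    raised = True

# Attaching the missing zone makes the two values comparable again.
localized = naive.tz_localize("UTC")
same_instant = localized == aware

print(raised, same_instant)
```

So if `series['a']` comes back without its zone while `series.loc['a']` keeps it, comparing the two runs into exactly this check.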
yeh, prob some issues down the path. lmk if you get stuck. |
Hmm, I think this is way over my head to be honest. I'm really not very familiar with Pandas' internals. I have had a quick shot at getting to the bottom of it. Not sure if I've found any bugs or not. Here are my notes:
Set up a debugging session in IPython like this:
dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
s_with_tz = pd.Series(dates_with_tz, index=['a', 'b', 'c'])
%debug s_with_tz['a']
we find that: In
['2011-01-01T05:00:00.000000000+0000' '2011-01-02T05:00:00.000000000+0000'
'2011-01-03T05:00:00.000000000+0000']
i.e. timezone is switched from "US/Eastern" to UTC. I've tried stepping into
In [32]: s_with_tz._values._values
Out[32]:
array(['2011-01-01T05:00:00.000000000+0000',
'2011-01-02T05:00:00.000000000+0000',
'2011-01-03T05:00:00.000000000+0000'], dtype='datetime64[ns]')
but I'm really not sure! Is
In [33]: s_with_tz._values
Out[33]:
DatetimeIndex(['2011-01-01 00:00:00-05:00', '2011-01-02 00:00:00-05:00',
'2011-01-03 00:00:00-05:00'],
dtype='datetime64[ns, US/Eastern]', freq='D')
In [34]: s_with_tz
Out[34]:
a 2011-01-01 00:00:00-05:00
b 2011-01-02 00:00:00-05:00
c 2011-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern] |
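The UTC values seen in the notes above are consistent with how pandas stores tz-aware data: the underlying array holds UTC instants and the zone is carried as dtype metadata. Here is a small sketch that shows the same thing through public accessors instead of the private `_values` attribute, assuming a recent pandas release; `utc_naive`, `first_utc` and `round_trip` are illustrative names.

```python
import pandas as pd

dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
s_with_tz = pd.Series(dates_with_tz, index=["a", "b", "c"])

# tz_convert(None) drops the zone and exposes the UTC wall-clock values,
# i.e. midnight US/Eastern shows up as 05:00.
utc_naive = s_with_tz.dt.tz_convert(None)
first_utc = utc_naive.iloc[0]

# Re-attaching UTC and converting back recovers the original instants,
# so nothing is lost as long as the zone metadata travels with the data.
round_trip = utc_naive.dt.tz_localize("UTC").dt.tz_convert("US/Eastern")

print(first_utc)
print(round_trip)
```

Seeing UTC in the raw array is therefore not a bug by itself; the problem only arises if a code path returns the raw value without re-applying the zone.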
thank you @jreback :) |
I'm a huge fan of Pandas. Thanks for all the hard work!
I believe I have stumbled across a small bug in Pandas 0.17.1 which was not present in 0.16.2. Indexing into Series of timezone-aware datetime64s fails using __getitem__ but indexing succeeds if the datetime64s are timezone-naive. Here is a minimal code example and the exception produced by Pandas 0.17.1:
If the dates are timezone-aware then we can access them using loc but, as far as I'm aware, we should be able to use __getitem__ in this situation too:
However, if the dates are timezone-naive then indexing using __getitem__ works as expected:
So indexing into a Series using __getitem__ works if the data is a list of timezone-naive datetime64s but indexing fails if the datetime64s are timezone-aware.
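The code examples from the original report did not survive in this copy. As a rough reconstruction of the scenario described (not the reporter's exact code), the sketch below builds one tz-naive and one tz-aware Series with a string index and reads a value through both `__getitem__` and `loc`. It assumes a pandas version in which this issue is fixed; on 0.17.1 the tz-aware `__getitem__` line is the one reported to fail.

```python
import pandas as pd

index = ["a", "b", "c"]

# Same dates, once without a zone and once with UTC attached.
naive_series = pd.Series(pd.date_range("2011-01-01", periods=3), index=index)
aware_series = pd.Series(
    pd.date_range("2011-01-01", periods=3, tz="UTC"), index=index
)

# Label lookup via __getitem__ on the tz-naive data.
naive_item = naive_series["a"]

# Label lookup on the tz-aware data, via __getitem__ and via loc.
aware_item = aware_series["a"]
aware_loc = aware_series.loc["a"]

print(naive_item, aware_item, aware_loc)
```

On a fixed version both access paths should return the same tz-aware `Timestamp`, which makes this usable as a quick regression check.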