-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API/BUG: DatetimeIndex correctly localizes integer data #21216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API/BUG: DatetimeIndex correctly localizes integer data #21216
Conversation
Codecov Report
@@ Coverage Diff @@
## master #21216 +/- ##
==========================================
- Coverage 91.9% 91.89% -0.01%
==========================================
Files 153 153
Lines 49606 49597 -9
==========================================
- Hits 45589 45579 -10
- Misses 4017 4018 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok looks pretty good.
- can you add a whatsnew note (0.24.0)
- can you test with .astype as well
- can you search thru some issue for timezone / DTI. IIRC I think this might solve some (just a guess)
tm.assert_index_equal(i, i2) | ||
assert i.tz.zone == 'US/Eastern' | ||
|
||
i2 = DatetimeIndex(i.tz_localize(None).asi8, dtype=i.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i actually like this construction, can you add a test that does both? (e.g. both uses and does not localize), maybe split it out and parameterize
@pytest.mark.parametrize('klass', [Index, DatetimeIndex]) | ||
@pytest.mark.parametrize('box', [np.array, list]) | ||
def test_constructor_with_int_tz(self, klass, box): | ||
ts = Timestamp('2018-01-01', tz='US/Pacific') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the issue number
@pytest.mark.parametrize("klass", [pd.Index, pd.DatetimeIndex]) | ||
def test_constructor_dtypes_datetime(self, tz_naive_fixture, values, | ||
def test_constructor_dtypes_datetime(self, tz_naive_fixture, attr, utc, | ||
klass): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a bit of commentary on what is tested here
Thanks @jreback. Moved the whatsnew to v0.24, added some additional logic to get |
from pandas.io.formats.format import _is_dates_only | ||
return _is_dates_only(self.values) | ||
return _is_dates_only(self.values) and self.tz is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did this need changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this refactor, I was getting errors where the repr didn't look correct but the i8 data was correct in the case when a tz was passed in the DatetimeIndex
constructor.
_is_dates_only
tries to:
return a boolean if we are only dates (and don't have a timezone)
but checks for timezones by re-passing self.values
into DatetimeIndex
. I am not sure the historical reason why this was the case, but checking self.tz is None
fixed the repr issue (and seems to achieve what that function was trying to do in the first place)
pandas/core/indexes/datetimes.py
Outdated
else: | ||
# must be integer dtype otherwise | ||
if isinstance(data, Int64Index): | ||
raise TypeError('cannot convert Int64Index->DatetimeIndex') | ||
# assume this data are epoch timestamps | ||
if data.dtype != _INT64_DTYPE: | ||
data = data.astype(np.int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could
data = data.astype(np.int64, copy=False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have any tests with a non-int64 dtype, can you add something
@@ -225,6 +225,13 @@ def _check_rng(rng): | |||
_check_rng(rng_eastern) | |||
_check_rng(rng_utc) | |||
|
|||
def test_integer_index_astype_datetimetz_dtype(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use tz_naive_fixture here
@@ -26,25 +27,28 @@ def test_construction_caching(self): | |||
freq='ns')}) | |||
assert df.dttz.dtype.tz.zone == 'US/Eastern' | |||
|
|||
def test_construction_with_alt(self): | |||
|
|||
@pytest.mark.parametrize('kwargs', [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize on tz_aware_fixture (doesn't include None), then parameterize as you are doing
tm.assert_index_equal(i, result) | ||
assert result.tz.zone == 'US/Eastern' | ||
|
||
@pytest.mark.parametrize('kwargs', [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@pytest.mark.parametrize('kwargs', [ | ||
{'tz': 'dtype.tz'}, | ||
{'dtype': 'dtype'}, | ||
{'dtype': 'dtype', 'tz': 'dtype.tz'}]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you explain how this test is different than the one above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am preserving the construction of tz_localize(None)
that was here before but split it out in its own test as discussed here #21216 (comment)
@@ -469,6 +473,15 @@ def test_constructor_with_non_normalized_pytz(self, tz): | |||
result = DatetimeIndex(['2010'], tz=non_norm_tz) | |||
assert pytz.timezone(tz) is result.tz | |||
|
|||
@pytest.mark.parametrize('klass', [Index, DatetimeIndex]) | |||
@pytest.mark.parametrize('box', [np.array, list]) | |||
def test_constructor_with_int_tz(self, klass, box): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameterize on tz_naive_fixture
looks good. @jschendel @WillAyd if you can have a look if any comments. |
thanks @mroeschke |
@@ -1175,6 +1175,10 @@ def astype(self, dtype, copy=True): | |||
return CategoricalIndex(self.values, name=self.name, dtype=dtype, | |||
copy=copy) | |||
try: | |||
if is_datetime64tz_dtype(dtype): | |||
from pandas.core.indexes.datetimes import DatetimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side issue (another PR). The imports inside functions should be as simple as possible, e.g.
from pandas import DatetimeIndex (several occurrences of this)
closes #20997
closes #20964
git diff upstream/master -u -- "*.py" | flake8 --diff
I did a little refactoring of
Datetime.__new__
that converts passed data but made it easier to fix this issue. I had to modify some methods (join, normalize, delete, insert) and tests that weren't assuming integer data weren't epoch timestamps.