Skip to content

BUG: Do not round DatetimeIndex nanosecond precision when iterating #19628

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 14, 2018

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Feb 10, 2018

This issue was actually very specific to localizing a DatetimeIndex with datetime.timezone.utc

# On master
In [1]: import pandas as pd

In [2]: datetimeindex = pd.DatetimeIndex(["2018-02-08 15:00:00.168456358"], tz='UTC')

In [3]: list(datetimeindex)[0] == datetimeindex[0] # returned False with datetime.timezone.utc
Out[3]: True 

I realize our timezone support relies heavily on pytz and dateutil, but I am curious how much support we have for datetime.timezone objects.

So the actual issue was that datetime.timezone.utc is (rightly) considered a fixed offset, and the code path was calculating the new value with Python datetimes and timedeltas which doesn't yet support nanosecond resolution.

@codecov
Copy link

codecov bot commented Feb 10, 2018

Codecov Report

Merging #19628 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #19628   +/-   ##
=======================================
  Coverage   91.58%   91.58%           
=======================================
  Files         150      150           
  Lines       48806    48806           
=======================================
  Hits        44701    44701           
  Misses       4105     4105
Flag Coverage Δ
#multiple 89.96% <ø> (ø) ⬆️
#single 41.73% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 569bc7a...14b8120. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would need comprehensive support for datetime.timezone in order to accept this. IOW you need to add testing in LOTS of places where we test datetutil/pytz

@@ -30,6 +32,10 @@ cdef int64_t NPY_NAT = get_nat()
# ----------------------------------------------------------------------

cdef inline bint is_utc(object tz):
if PY3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a perf issue, you can almost certainly access this in stead at the cython level

from pandas import (DatetimeIndex, Index, date_range, DataFrame,
Timestamp, offsets)

from pandas.util.testing import assert_almost_equal

if PY3:
from datetime import timezone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not idiomatic, see comments above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than put this here, make a fixture in pandas/conftest.py

called datetime_tz_utc to return this and skip if its not PY3

@jreback jreback added the Timezones Timezone data dtype label Feb 10, 2018
@mroeschke mroeschke changed the title Bug: Do not round DatetimeIndex nanosecond precision when iterating BUG: Do not round DatetimeIndex nanosecond precision when iterating Feb 11, 2018
@mroeschke
Copy link
Member Author

I updated the fix and found the real issue (edited in my description above).

if box:
dt = Timestamp(dt)
result[i] = dt
# Python datetime objects do not support nanosecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use tz_convert_tzlocal_to_utc here from conversion.pyx

from pandas import (DatetimeIndex, Index, date_range, DataFrame,
Timestamp, offsets)

from pandas.util.testing import assert_almost_equal

if PY3:
from datetime import timezone
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than put this here, make a fixture in pandas/conftest.py

called datetime_tz_utc to return this and skip if its not PY3

@@ -258,6 +264,13 @@ def test_iteration_preserves_tz(self):
assert result._repr_base == expected._repr_base
assert result == expected

@pytest.mark.parametrize('tz', [None, 'UTC', "US/Central", dt_UTC,
dateutil.tz.tzoffset(None, -28800)])
def test_iteration_preserves_nanoseconds(self, tz):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a similar test for pandas/tests/scalar/test_timezones.py

@@ -258,6 +264,13 @@ def test_iteration_preserves_tz(self):
assert result._repr_base == expected._repr_base
assert result == expected

@pytest.mark.parametrize('tz', [None, 'UTC', "US/Central", dt_UTC,
dateutil.tz.tzoffset(None, -28800)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to test_timezones (in same subdir)

def test_iteration_preserves_nanoseconds(self, tz):
# GH 19603
index = pd.DatetimeIndex(["2018-02-08 15:00:00.168456358"], tz=tz)
assert list(index)[0] == index[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check both getitem and iteration

@mroeschke
Copy link
Member Author

Replaced fix with the use of tz_convert_utc_to_tzlocal, added datetime_tz_utc fixture, Also I am not sure what test I should have added to pandas/tests/scalar/test_timezones.py since the issue is with DatetimeIndex.__iter__

Fix the issue within fix_offsets

add addtional dateutil test

Address_edits
@mroeschke
Copy link
Member Author

All green

@jreback jreback added this to the 0.23.0 milestone Feb 14, 2018
@jreback jreback added the Bug label Feb 14, 2018
@jreback jreback merged commit b9bd0d7 into pandas-dev:master Feb 14, 2018
@jreback
Copy link
Contributor

jreback commented Feb 14, 2018

thanks @mroeschke

FYI looking to move tz to be fixtures (e.g. the strings 'UTC', 'Asia/Tokyo') etc....across the test codebase, IOW to have a 'tz' fixture (prob need several slightly different ones)

cc @jbrockmendel

@mroeschke mroeschke deleted the datetimeindex_iter branch February 14, 2018 18:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DateTimeIndex.__iter__().next() rounds time to microseconds, when timezone aware
2 participants