ENH: support zoneinfo tzinfos #46425

Merged: 21 commits, merged on Apr 18, 2022

Diff view: changes from 3 commits

1 change: 1 addition & 0 deletions ci/deps/actions-38-downstream_compat.yaml
@@ -19,6 +19,7 @@ dependencies:
   - python-dateutil
   - numpy
   - pytz
+  - backports.zoneinfo

   # optional dependencies
   - beautifulsoup4

1 change: 1 addition & 0 deletions ci/deps/actions-38-minimum_versions.yaml
@@ -20,6 +20,7 @@ dependencies:
   - python-dateutil=2.8.1
   - numpy=1.18.5
   - pytz=2020.1
+  - backports.zoneinfo

   # optional dependencies, markupsafe for jinja2
   - beautifulsoup4=4.8.2

1 change: 1 addition & 0 deletions ci/deps/actions-pypy-38.yaml
@@ -18,3 +18,4 @@ dependencies:
   - numpy
   - python-dateutil
   - pytz
+  - backports.zoneinfo

11 changes: 6 additions & 5 deletions pandas/_libs/tslibs/conversion.pyx
@@ -51,6 +51,7 @@ from pandas._libs.tslibs.timezones cimport (
     is_fixed_offset,
     is_tzlocal,
     is_utc,
+    is_zoneinfo,
     maybe_get_tz,
     tz_compare,
     utc_pytz as UTC,
@@ -71,7 +72,7 @@ from pandas._libs.tslibs.nattype cimport (
 )
 from pandas._libs.tslibs.tzconversion cimport (
     bisect_right_i8,
-    tz_convert_utc_to_tzlocal,
+    tz_convert_utc_to_tz,
     tz_localize_to_utc_single,
 )

@@ -555,8 +556,8 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
     # see PEP 495 https://www.python.org/dev/peps/pep-0495/#the-fold-attribute
     if is_utc(tz):
         pass
-    elif is_tzlocal(tz):
-        tz_convert_utc_to_tzlocal(obj.value, tz, &obj.fold)
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
+        tz_convert_utc_to_tz(obj.value, tz, &obj.fold)
     else:
         trans, deltas, typ = get_dst_info(tz)

@@ -724,8 +725,8 @@ cdef inline void _localize_tso(_TSObject obj, tzinfo tz):
         pass
     elif obj.value == NPY_NAT:
         pass
-    elif is_tzlocal(tz):
-        local_val = tz_convert_utc_to_tzlocal(obj.value, tz, &obj.fold)
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
+        local_val = tz_convert_utc_to_tz(obj.value, tz, &obj.fold)
         dt64_to_dtstruct(local_val, &obj.dts)
     else:
         # Adjust datetime64 timestamp, recompute datetimestruct

1 change: 1 addition & 0 deletions pandas/_libs/tslibs/timezones.pxd
@@ -9,6 +9,7 @@ cdef tzinfo utc_pytz

 cpdef bint is_utc(tzinfo tz)
 cdef bint is_tzlocal(tzinfo tz)
+cdef bint is_zoneinfo(tzinfo tz)

 cdef bint treat_tz_as_pytz(tzinfo tz)

17 changes: 14 additions & 3 deletions pandas/_libs/tslibs/timezones.pyx
@@ -2,6 +2,7 @@ from datetime import (
     timedelta,
     timezone,
 )
+from zoneinfo import ZoneInfo

Contributor

This is an unconditional import — do you not require support for Python < 3.8?

If so, you should presumably have some more complicated logic here and in the is_zoneinfo logic.

Something like this works well for "is this an instance of X" without forcing the user to import the module that contains X:

https://github.com/pganssle/pytz-deprecation-shim/blob/bf0174adfce698e280c623d17fe8e82961de0c48/src/pytz_deprecation_shim/helpers.py#L11-L30

https://github.com/pganssle/pytz-deprecation-shim/blob/bf0174adfce698e280c623d17fe8e82961de0c48/src/pytz_deprecation_shim/_common.py#L6-L13

Since obviously some object isn't going to be an instance of zoneinfo.ZoneInfo if the interpreter has never imported the zoneinfo module.

Member Author

> This is an unconditional import — do you not require support for Python < 3.8?

I think our minimum version is 3.8, but I'm on 3.9 locally and forgot when writing this branch. Will change to try/except.

Contributor

No, leave this.

we support >= 3.8

Member Author

zoneinfo isn't in the stdlib until 3.9, so importing it unconditionally would require making backports.zoneinfo a hard dependency (and having that in the CI dep file is causing problems AFAICT).

Contributor (pganssle, Mar 20, 2022)

Even with backports.zoneinfo it needs to be conditional since they're in different namespaces, plus I really like the approach I took in the pytz deprecation shims for checking if a time zone is a pytz zone, where my checks never actually import pytz unless it's already been imported (since obviously if it's never been imported then I know the object isn't a pytz type). That may be overkill, but it's also not that hard to implement, so ⚖️ 🤷
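As the comment notes, the stdlib module and the backport live in different namespaces (`zoneinfo` vs. `backports.zoneinfo`), so a conditional import has to try both. Below is a minimal Python sketch that assumes zoneinfo support should remain optional. The names `HAS_ZONEINFO` and this `is_zoneinfo` body are hypothetical and not taken from the PR.

```python
from datetime import tzinfo

try:
    # Python >= 3.9: zoneinfo is part of the standard library.
    from zoneinfo import ZoneInfo
except ImportError:
    try:
        # Python 3.8: the backport installs under a different top-level name.
        from backports.zoneinfo import ZoneInfo
    except ImportError:
        # Neither is available, so zoneinfo support is simply disabled.
        ZoneInfo = None

HAS_ZONEINFO = ZoneInfo is not None


def is_zoneinfo(tz: tzinfo) -> bool:
    """Return True if tz is a ZoneInfo instance (always False without zoneinfo)."""
    return HAS_ZONEINFO and isinstance(tz, ZoneInfo)
```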

Member Author

I'll need to take a closer look at the pytz deprecation shim. We're getting rid of a lot of pytz usage in 2.0 (#34916) and these days I keep finding ways we could simplify the code even more if we dropped pytz support altogether.


 from cpython.datetime cimport (
     datetime,
@@ -41,7 +42,7 @@ cdef int64_t NPY_NAT = get_nat()
 cdef tzinfo utc_stdlib = timezone.utc
 cdef tzinfo utc_pytz = UTC
 cdef tzinfo utc_dateutil_str = dateutil_gettz("UTC")  # NB: *not* the same as tzutc()
-
+cdef tzinfo utc_zoneinfo = ZoneInfo("UTC")

Contributor

You probably don't want this to be called unconditionally. Practically speaking, this will always exist, but it's not guaranteed by the ZoneInfo API. It should probably be possible to import this file without this succeeding.

Member Author

OK. Is the failure mode something like "UTC" not being present in /usr/share/zoneinfo/?

The user-facing downside of not doing this here is things going through slightly slower code paths. Not the end of the world, but worth avoiding if it doesn't require too much gymnastics.

Contributor

Yes, or if the user is on Windows and doesn't have tzdata installed, which is actually probably a reasonably common failure mode — user doesn't care about tzdata because they're not using zoneinfo, then they import pandas and this constructor fails when trying to import the timezones module.

I imagine it's pretty easy to make the impact of this minimal. One way to do it:

from cpython.datetime cimport tzinfo
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

cdef tzinfo utc_zoneinfo = None

cdef bint is_utc_zoneinfo(tzinfo tz):
    global utc_zoneinfo
    if utc_zoneinfo is None:
        try:
            utc_zoneinfo = ZoneInfo("UTC")
        except ZoneInfoNotFoundError:
            return False

    return tz is utc_zoneinfo

Presumably this function will get inlined wherever you call it, and in the "common case" where ZoneInfo("UTC") is easily imported it's going to be two identity checks instead of 1.

If you are very concerned with performance I'd probably be trying to lazy-import zoneinfo in general anyway, in which case you have more "off-roads" to improve performance for people who don't use zoneinfo, though hopefully in the not-too-distant future pandas will switch to using zoneinfo or pytz-deprecation-shim anyway, at which point you'll be fine with eagerly importing.


 # ----------------------------------------------------------------------

@@ -51,9 +52,15 @@ cpdef inline bint is_utc(tzinfo tz):
         or tz is utc_stdlib
         or isinstance(tz, _dateutil_tzutc)
         or tz is utc_dateutil_str
+        # NB: we are assuming the user does not clear zoneinfo cache
+        or tz is utc_zoneinfo
     )


+cdef inline bint is_zoneinfo(tzinfo tz):
+    return isinstance(tz, ZoneInfo)
+
+
 cdef inline bint is_tzlocal(tzinfo tz):
     return isinstance(tz, _dateutil_tzlocal)
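The `NB` comment in `is_utc` relies on the constructor cache of `ZoneInfo`. Calling `ZoneInfo(key)` again returns the same object, which is what lets an identity check against a module-level `utc_zoneinfo` work. The small Python demonstration below assumes the system or the `tzdata` package provides the "UTC" key. It shows both the cache and the effect of `ZoneInfo.clear_cache()` on it.

```python
from zoneinfo import ZoneInfo

utc_a = ZoneInfo("UTC")
utc_b = ZoneInfo("UTC")
# The primary constructor caches instances per key, so this is the same object.
print(utc_a is utc_b)  # True

# Invalidate the cache: the next ZoneInfo("UTC") call builds a new instance.
ZoneInfo.clear_cache()
utc_c = ZoneInfo("UTC")
# An `is` check against the object created before clear_cache() now misses.
print(utc_c is utc_a)  # False
```

This is the scenario the comment rules out. After `clear_cache()`, the `is_utc` in this diff would no longer recognize a fresh `ZoneInfo("UTC")`, and such an object would fall through to the slower `is_zoneinfo` path instead.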

@@ -210,6 +217,8 @@ cdef inline bint is_fixed_offset(tzinfo tz):
             return 1
         else:
             return 0
+    elif is_zoneinfo(tz):
+        return False

Contributor

Why False when the other return values are all 0 or 1?

Also, this isn't quite right. You can tell if a ZoneInfo has a fixed offset by passing None to utcoffset; if it returns a value, that's the fixed offset.

This also works for pytz, dateutil.tz.tzoffset and datetime.timezone. It won't work for dateutil.tz.gettz("UTC") right now, but once the zoneinfo backport is merged, it will.

This whole function can probably be simplified to: return bool(tz.utcoffset(None) is not None), or (to handle the fixed offset tzfile cases for dateutil <= 2.8.2):

cdef inline bint is_fixed_offset(tzinfo tz):
    if treat_tz_as_dateutil(tz):
        if len(tz._trans_idx) == 0 and len(tz._trans_list) == 0:
            return 1
        else:
            return 0

    return 0 if tz.utcoffset(None) is None else 1

Of course, even that will break when the next version of dateutil comes out, so you probably want some version pinning in place (like for python-dateutil < 3.0).
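The Python sketch below covers only the `utcoffset(None)` rule from the simplification suggested above, and deliberately leaves out the dateutil `tzfile` special case. It returns 0/1 to match the other branches of `is_fixed_offset`. The standalone function is illustrative, not the pandas implementation.

```python
from datetime import timedelta, timezone, tzinfo


def is_fixed_offset(tz: tzinfo) -> int:
    """Return 1 if tz has a single UTC offset, else 0.

    Relies on the tzinfo convention that utcoffset(None) returns the offset
    for fixed-offset zones and None when the offset depends on the datetime.
    """
    return 0 if tz.utcoffset(None) is None else 1


if __name__ == "__main__":
    print(is_fixed_offset(timezone.utc))                  # 1
    print(is_fixed_offset(timezone(timedelta(hours=1))))  # 1
```

For a zone with DST transitions, such as `ZoneInfo("America/New_York")`, `utcoffset(None)` returns None, so the function returns 0.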

Member Author

> Why False when the other return values are all 0 or 1?

No good reason; I'll change it to match.

> This whole function can probably be simplified to [...]

Thanks. I'll probably make a dedicated branch to both Do This Right and audit our usages of is_fixed_offset, which I don't think we're very consistent about.

     # This also implicitly accepts datetime.timezone objects which are
     # considered fixed
     return 1
@@ -264,6 +273,8 @@ cdef object get_dst_info(tzinfo tz):
         # e.g. pytz.FixedOffset, matplotlib.dates._UTC,
         # psycopg2.tz.FixedOffsetTimezone
         num = int(get_utcoffset(tz, None).total_seconds()) * 1_000_000_000
+        # If we have e.g. ZoneInfo here, the get_utcoffset call will return None,
+        # so the total_seconds() call will raise AttributeError.
         return (np.array([NPY_NAT + 1], dtype=np.int64),
                 np.array([num], dtype=np.int64),
                 "unknown")
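The new comment explains why `get_dst_info` cannot be handed a `ZoneInfo` on this path. `utcoffset(None)` returns None, and `None.total_seconds()` raises AttributeError. Here is a hedged Python sketch of a guarded version of that nanosecond computation. `fixed_offset_ns` is a hypothetical helper, not part of pandas.

```python
from datetime import timedelta, timezone, tzinfo
from typing import Optional

NS_PER_SECOND = 1_000_000_000


def fixed_offset_ns(tz: tzinfo) -> Optional[int]:
    """Return the UTC offset of a fixed-offset tz in nanoseconds, or None.

    None signals that tz has no single offset (e.g. a ZoneInfo with DST
    transitions), instead of letting .total_seconds() fail on None.
    """
    offset = tz.utcoffset(None)
    if offset is None:
        return None
    # Same arithmetic as the diff: whole seconds, scaled to nanoseconds.
    return int(offset.total_seconds()) * NS_PER_SECOND


if __name__ == "__main__":
    print(fixed_offset_ns(timezone(timedelta(hours=1))))  # 3600000000000
```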
@@ -291,13 +302,13 @@ cdef object get_dst_info(tzinfo tz):
                 # deltas
                 deltas = np.array([v.offset for v in (
                     tz._ttinfo_before,) + tz._trans_idx], dtype='i8')
-                deltas *= 1000000000
+                deltas *= 1_000_000_000
                 typ = 'dateutil'

             elif is_fixed_offset(tz):
                 trans = np.array([NPY_NAT + 1], dtype=np.int64)
                 deltas = np.array([tz._ttinfo_std.offset],
-                                  dtype='i8') * 1000000000
+                                  dtype='i8') * 1_000_000_000
                 typ = 'fixed'
             else:
                 # 2018-07-12 this is not reached in the tests, and this case

2 changes: 1 addition & 1 deletion pandas/_libs/tslibs/tzconversion.pxd
@@ -2,7 +2,7 @@ from cpython.datetime cimport tzinfo
 from numpy cimport int64_t


-cdef int64_t tz_convert_utc_to_tzlocal(
+cdef int64_t tz_convert_utc_to_tz(
     int64_t utc_val, tzinfo tz, bint* fold=*
 ) except? -1
 cpdef int64_t tz_convert_from_utc_single(int64_t val, tzinfo tz)

34 changes: 18 additions & 16 deletions pandas/_libs/tslibs/tzconversion.pyx
@@ -43,6 +43,7 @@ from pandas._libs.tslibs.timezones cimport (
     is_fixed_offset,
     is_tzlocal,
     is_utc,
+    is_zoneinfo,
 )


@@ -60,8 +61,8 @@ cdef int64_t tz_localize_to_utc_single(
     elif is_utc(tz) or tz is None:
         return val

-    elif is_tzlocal(tz):
-        return _tz_convert_tzlocal_utc(val, tz, to_utc=True)
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
+        return _tz_localize_using_tzinfo_api(val, tz, to_utc=True)

Contributor

I think this should be the default behavior, rather than something triggered only for zoneinfo objects.

Member Author

I'm coming around to a similar opinion. Probably for a separate PR, since that would involve a lot of code shuffling (attempts to unify the many places we do these checks have been stymied; xref #46397, #46246).
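The "general case" that both comments refer to is what `_tz_localize_using_tzinfo_api` does: it converts between UTC and wall-clock int64 nanoseconds by asking the `tzinfo` object itself. Below is a simplified pure-Python sketch of that idea. It assumes epoch-nanosecond inputs within the `datetime` range and looks up the offset at microsecond resolution. The function names are hypothetical, and this is not pandas' implementation.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone, tzinfo

_EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)
_EPOCH_NAIVE = datetime(1970, 1, 1)
_NS_PER_US = 1_000


def _offset_ns(offset: timedelta) -> int:
    # timedelta // timedelta gives an exact integer number of microseconds.
    return (offset // timedelta(microseconds=1)) * _NS_PER_US


def utc_to_local_ns(utc_ns: int, tz: tzinfo) -> tuple[int, int]:
    """UTC epoch ns -> (wall-clock ns in tz, PEP 495 fold)."""
    # astimezone() calls tz.fromutc(), which also sets fold for repeated times.
    utc_dt = _EPOCH_UTC + timedelta(microseconds=utc_ns // _NS_PER_US)
    local = utc_dt.astimezone(tz)
    return utc_ns + _offset_ns(local.utcoffset()), local.fold


def local_to_utc_ns(local_ns: int, tz: tzinfo, fold: int = 0) -> int:
    """Wall-clock ns in tz -> UTC epoch ns; fold picks a side of an ambiguous time."""
    wall = _EPOCH_NAIVE + timedelta(microseconds=local_ns // _NS_PER_US)
    return local_ns - _offset_ns(wall.replace(tzinfo=tz, fold=fold).utcoffset())
```

With `timezone(timedelta(hours=-8))`, `utc_to_local_ns(0, tz)` gives `(-28_800_000_000_000, 0)`. With a `ZoneInfo` or dateutil `tzlocal` object, the same calls go through that object's own DST rules, which is why a single code path can serve both.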


     elif is_fixed_offset(tz):
         # TODO: in this case we should be able to use get_utcoffset,
@@ -136,13 +137,13 @@ timedelta-like}

     result = np.empty(n, dtype=np.int64)

-    if is_tzlocal(tz):
+    if is_tzlocal(tz) or is_zoneinfo(tz):
         for i in range(n):
             v = vals[i]
             if v == NPY_NAT:
                 result[i] = NPY_NAT
             else:
-                result[i] = _tz_convert_tzlocal_utc(v, tz, to_utc=True)
+                result[i] = _tz_localize_using_tzinfo_api(v, tz, to_utc=True)
         return result

     # silence false-positive compiler warning
@@ -402,7 +403,7 @@ cdef ndarray[int64_t] _get_dst_hours(
 # ----------------------------------------------------------------------
 # Timezone Conversion

-cdef int64_t tz_convert_utc_to_tzlocal(
+cdef int64_t tz_convert_utc_to_tz(
     int64_t utc_val, tzinfo tz, bint* fold=NULL
 ) except? -1:
     """
@@ -418,7 +419,7 @@ cdef int64_t tz_convert_utc_to_tzlocal(
     -------
     local_val : int64_t
     """
-    return _tz_convert_tzlocal_utc(utc_val, tz, to_utc=False, fold=fold)
+    return _tz_localize_using_tzinfo_api(utc_val, tz, to_utc=False, fold=fold)


 cpdef int64_t tz_convert_from_utc_single(int64_t val, tzinfo tz):
@@ -448,8 +449,8 @@ cpdef int64_t tz_convert_from_utc_single(int64_t val, tzinfo tz):

     if is_utc(tz):
         return val
-    elif is_tzlocal(tz):
-        return _tz_convert_tzlocal_utc(val, tz, to_utc=False)
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
+        return _tz_localize_using_tzinfo_api(val, tz, to_utc=False)
     elif is_fixed_offset(tz):
         _, deltas, _ = get_dst_info(tz)
         delta = deltas[0]
@@ -515,7 +516,7 @@ cdef const int64_t[:] _tz_convert_from_utc(const int64_t[:] vals, tzinfo tz):

     if is_utc(tz) or tz is None:
         use_utc = True
-    elif is_tzlocal(tz):
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
         use_tzlocal = True
     else:
         trans, deltas, typ = get_dst_info(tz)
@@ -539,7 +540,7 @@ cdef const int64_t[:] _tz_convert_from_utc(const int64_t[:] vals, tzinfo tz):
         # The pattern used in vectorized.pyx checks for use_utc here,
         # but we handle that case above.
         if use_tzlocal:
-            converted[i] = _tz_convert_tzlocal_utc(val, tz, to_utc=False)
+            converted[i] = _tz_localize_using_tzinfo_api(val, tz, to_utc=False)
         elif use_fixed:
             converted[i] = val + delta
         else:
@@ -551,11 +552,12 @@ cdef const int64_t[:] _tz_convert_from_utc(const int64_t[:] vals, tzinfo tz):

 # OSError may be thrown by tzlocal on windows at or close to 1970-01-01
 # see https://github.com/pandas-dev/pandas/pull/37591#issuecomment-720628241
-cdef int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz, bint to_utc=True,
-                                     bint* fold=NULL) except? -1:
+cdef int64_t _tz_localize_using_tzinfo_api(
+    int64_t val, tzinfo tz, bint to_utc=True, bint* fold=NULL
+) except? -1:
     """
-    Convert the i8 representation of a datetime from a tzlocal timezone to
-    UTC, or vice-versa.
+    Convert the i8 representation of a datetime from a general-case timezone to
+    UTC, or vice-versa using the datetime/tzinfo API.

     Private, not intended for use outside of tslibs.conversion

@@ -564,10 +566,10 @@ cdef int64_t _tz_convert_tzlocal_utc(int64_t val, tzinfo tz, bint to_utc=True,
     val : int64_t
     tz : tzinfo
     to_utc : bint
-        True if converting tzlocal _to_ UTC, False if going the other direction
+        True if converting _to_ UTC, False if going the other direction.
     fold : bint*, default NULL
         pointer to fold: whether datetime ends up in a fold or not
-        after adjustment
+        after adjustment.
         Only passed with to_utc=False.

     Returns

23 changes: 12 additions & 11 deletions pandas/_libs/tslibs/vectorized.pyx
@@ -37,10 +37,11 @@ from .timezones cimport (
     get_dst_info,
     is_tzlocal,
     is_utc,
+    is_zoneinfo,
 )
 from .tzconversion cimport (
     bisect_right_i8,
-    tz_convert_utc_to_tzlocal,
+    tz_convert_utc_to_tz,
 )

 # -------------------------------------------------------------------------
@@ -113,7 +114,7 @@ def ints_to_pydatetime(

     if is_utc(tz) or tz is None:
         use_utc = True
-    elif is_tzlocal(tz):
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
         use_tzlocal = True
     else:
         trans, deltas, typ = get_dst_info(tz)
@@ -137,7 +138,7 @@ def ints_to_pydatetime(
             if use_utc:
                 local_val = value
             elif use_tzlocal:
-                local_val = tz_convert_utc_to_tzlocal(value, tz)
+                local_val = tz_convert_utc_to_tz(value, tz)
             elif use_fixed:
                 local_val = value + delta
             else:
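For timezones that take neither the UTC branch nor the tzinfo-API branch, these loops use the transition table from `get_dst_info`. The final `else:` branch is cut off in this diff. Given the `bisect_right_i8` import, it presumably finds the last transition at or before each stamp and adds that period's offset. Below is a Python sketch of that lookup using the standard `bisect` module. It assumes `trans` is sorted and starts with a minimal sentinel, as `get_dst_info` does with `NPY_NAT + 1`. The function name is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def utc_to_local_via_transitions(
    value: int, trans: Sequence[int], deltas: Sequence[int]
) -> int:
    """Shift a UTC epoch-ns value by the offset in effect at that instant.

    trans  -- sorted UTC instants (ns) at which the offset changes
    deltas -- deltas[i] is the offset (ns) in effect from trans[i] onward
    """
    # Index of the last transition at or before `value`.
    pos = bisect_right(trans, value) - 1
    if pos < 0:
        # Without a sentinel first transition, earlier values have no known offset.
        raise ValueError("value precedes the first transition")
    return value + deltas[pos]
```

Bisection keeps each lookup at O(log n) in the number of transitions, which matters when the same table is reused across a large array of stamps.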
@@ -204,7 +205,7 @@ def get_resolution(const int64_t[:] stamps, tzinfo tz=None) -> Resolution:

     if is_utc(tz) or tz is None:
         use_utc = True
-    elif is_tzlocal(tz):
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
         use_tzlocal = True
     else:
         trans, deltas, typ = get_dst_info(tz)
@@ -223,7 +224,7 @@ def get_resolution(const int64_t[:] stamps, tzinfo tz=None) -> Resolution:
         if use_utc:
             local_val = stamps[i]
         elif use_tzlocal:
-            local_val = tz_convert_utc_to_tzlocal(stamps[i], tz)
+            local_val = tz_convert_utc_to_tz(stamps[i], tz)
         elif use_fixed:
             local_val = stamps[i] + delta
         else:
@@ -270,7 +271,7 @@ cpdef ndarray[int64_t] normalize_i8_timestamps(const int64_t[:] stamps, tzinfo t

     if is_utc(tz) or tz is None:
         use_utc = True
-    elif is_tzlocal(tz):
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
         use_tzlocal = True
     else:
         trans, deltas, typ = get_dst_info(tz)
@@ -290,7 +291,7 @@ cpdef ndarray[int64_t] normalize_i8_timestamps(const int64_t[:] stamps, tzinfo t
         if use_utc:
             local_val = stamps[i]
         elif use_tzlocal:
-            local_val = tz_convert_utc_to_tzlocal(stamps[i], tz)
+            local_val = tz_convert_utc_to_tz(stamps[i], tz)
         elif use_fixed:
             local_val = stamps[i] + delta
         else:
@@ -332,7 +333,7 @@ def is_date_array_normalized(const int64_t[:] stamps, tzinfo tz=None) -> bool:

     if is_utc(tz) or tz is None:
         use_utc = True
-    elif is_tzlocal(tz):
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
         use_tzlocal = True
     else:
         trans, deltas, typ = get_dst_info(tz)
@@ -348,7 +349,7 @@ def is_date_array_normalized(const int64_t[:] stamps, tzinfo tz=None) -> bool:
         if use_utc:
             local_val = stamps[i]
         elif use_tzlocal:
-            local_val = tz_convert_utc_to_tzlocal(stamps[i], tz)
+            local_val = tz_convert_utc_to_tz(stamps[i], tz)
         elif use_fixed:
             local_val = stamps[i] + delta
         else:
@@ -380,7 +381,7 @@ def dt64arr_to_periodarr(const int64_t[:] stamps, int freq, tzinfo tz):

     if is_utc(tz) or tz is None:
         use_utc = True
-    elif is_tzlocal(tz):
+    elif is_tzlocal(tz) or is_zoneinfo(tz):
         use_tzlocal = True
     else:
         trans, deltas, typ = get_dst_info(tz)
@@ -400,7 +401,7 @@ def dt64arr_to_periodarr(const int64_t[:] stamps, int freq, tzinfo tz):
         if use_utc:
             local_val = stamps[i]
         elif use_tzlocal:
-            local_val = tz_convert_utc_to_tzlocal(stamps[i], tz)
+            local_val = tz_convert_utc_to_tz(stamps[i], tz)
         elif use_fixed:
             local_val = stamps[i] + delta
         else:

7 changes: 6 additions & 1 deletion pandas/conftest.py
@@ -30,6 +30,7 @@
 from decimal import Decimal
 import operator
 import os
+import zoneinfo

 from dateutil.tz import (
     tzlocal,
@@ -1165,6 +1166,8 @@ def iris(datapath):
     timezone.utc,
     timezone(timedelta(hours=1)),
     timezone(timedelta(hours=-1), name="foo"),
+    zoneinfo.ZoneInfo("US/Pacific"),
+    zoneinfo.ZoneInfo("UTC"),
 ]
 TIMEZONE_IDS = [repr(i) for i in TIMEZONES]

@@ -1191,7 +1194,9 @@ def tz_aware_fixture(request):
 tz_aware_fixture2 = tz_aware_fixture


-@pytest.fixture(params=["utc", "dateutil/UTC", utc, tzutc(), timezone.utc])
+@pytest.fixture(
+    params=["utc", "dateutil/UTC", utc, tzutc(), timezone.utc, zoneinfo.ZoneInfo("UTC")]
+)
 def utc_fixture(request):
     """
     Fixture to provide variants of UTC timezone strings and tzinfo objects.