tz_compare fails to consider string "UTC" and pytz UTC object equal #23959

TomAugspurger · 2018-11-27T22:33:59Z

Nothing user-facing here, but this surprised me.

In [5]: pd._libs.tslibs.timezones.tz_compare("UTC", pytz.timezone("UTC"))
Out[5]: False

Other timezones behave as expected

In [6]: pd._libs.tslibs.timezones.tz_compare("US/Eastern", pytz.timezone("US/Eastern"))
Out[6]: True

Is this user error, or a bug?

mroeschke · 2018-11-27T23:46:25Z

This slipped through the cracks when I did some UTC refactoring. We don't rely on "UTC" as a string internally anymore, but this may as well work since "US/Eastern" does. I think the "UTC" string is not getting cast by pytz. I'll put up a PR tonight.

mroeschke · 2018-11-28T05:09:14Z

QQ: Would we expect tz_compare("UTC", dateutil.tz.UTC) to return True? How about tz_compare(pytz.timezone("UTC"), dateutil.tz.UTC)?

TomAugspurger · 2018-11-29T15:20:21Z

I'm not sure, but it probably depends on exactly how we're using tz_compare. I was thinking of it as "are these two timezones the same?" which is a fuzzy concept, but in this case I would say that they're the same.

The docs say we compare "string representations", and they're clearly different

In [10]: str(dateutil.tz.UTC)
Out[10]: 'tzutc()'

In [11]: str(pytz.timezone("UTC"))
Out[11]: 'UTC'

cc @pganssle if you have thoughts.

pganssle · 2018-12-03T16:37:19Z

Sorry, I have somewhat let this fall by the wayside.

Unfortunately, I don't know of any reasonably generic way to compare time zones, and the semantics of timezone-aware datetime comparison use "is" (identity) rather than __eq__.

tz_compare exists because pandas is working around pytz's workaround for a lack of PEP 495. pytz works by assigning fixed offsets when the tzinfo is attached, which means that you have this:

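from datetime import datetime

import pytz

NYC = pytz.timezone('America/New_York')
# localize() attaches a tzinfo fixed to the offset in effect on that date
dt1 = NYC.localize(datetime(2018, 1, 1))
dt2 = NYC.localize(datetime(2018, 6, 1))
print(dt1.tzinfo is dt2.tzinfo)
# False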

I am not aware of whether pytz defines __eq__ for its time zones, but I think there is some reason you cannot just check direct equality of pytz time zones. In any case, the reason why tz_compare exists is that str(dt.tzinfo) for a pytz zone will always return the IANA key for IANA zones, independent of which offset is applied, so this is a proxy for what you really want, which is "Does this offset represent the same time zone?".
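
To make the string-proxy idea above concrete, here is a minimal sketch that uses only the standard library. The two datetime.timezone objects are stand-ins (an assumption for illustration) for the separate per-offset tzinfo objects that pytz attaches for a single zone, and same_zone_by_name is a hypothetical helper, not the pandas implementation.

```python
from datetime import timedelta, timezone

# Stand-ins for the two per-offset tzinfo objects of one zone:
# same name, different fixed offsets (winter vs. summer).
est = timezone(timedelta(hours=-5), "America/New_York")
edt = timezone(timedelta(hours=-4), "America/New_York")


def same_zone_by_name(tz_a, tz_b):
    """Hypothetical helper: compare zones by string form, not identity."""
    return str(tz_a) == str(tz_b)


identity_match = est is edt                # distinct objects
equality_match = est == edt                # datetime.timezone compares offsets
name_match = same_zone_by_name(est, edt)   # both print as the same key
```

Identity and offset equality both treat the two objects as different, while the string form treats them as the same zone. That is the property the proxy relies on, and it also shows why two providers that print the same zone differently compare as unequal.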

In any case, you may be able to special-case some logic related to UTC, but unfortunately the semantics are already messed up and you may in fact be creating more weird edge cases and inconsistencies by trying to patch in compatibility between time zone providers, or to try to support string-to-timezone comparisons directly. Hopefully we will be making the str representations of dateutil time zones nicer at some time in the future, but I wouldn't count on it.

pganssle · 2018-12-03T16:41:21Z

By the way, my solution to this that should unravel most of the complicated stuff you are doing around time zones is to create your own IANA time zone implementation. The two representations you are relying on are not designed for your use case, and as a result you make ample use of private methods, which is a much higher maintenance burden than just creating your own time zone implementation.

I think you should somewhat reliably be able to create a mapping between input time zones and the IANA zones they map to, so the extent of your support for third party time zones would be to immediately convert them to an internal pandas TZ representation. This would allow you to drop pytz as a dependency as well.

jreback · 2018-12-03T18:02:12Z

@pganssle

to the extent of your support for third party time zones would be to immediately convert them to an internal pandas TZ representation. This would allow you to drop pytz as a dependency as well.

what would this look like?

pganssle · 2018-12-03T21:59:02Z

what would this look like?

Taking some time to sketch out some sort of prototype is on my list, but unfortunately it's not high on my list. I think the high level overview is:

1. I resolve dateutil/dateutil#701 ("Move zoneinfo data to a separate package"), so that the dateutil zoneinfo file can be a separate dependency of pandas (I think pendulum has a package that's just the tz data, too, but I don't know if it's suitable for general use).
2. pandas adds a parser for zoneinfo files (this is not terribly difficult to do, especially in Cython).
3. pandas adds a class equivalent to tzinfo (doesn't necessarily have to be tzinfo) that expects to operate on datetime array types, and a function mapping IANA keys to the relevant PandasTzInfo.

For strings, the support is built in already. For pytz, you can already get the IANA key from any pytz zone by calling str on it. For dateutil, it's not possible yet, but that will change soon (PR welcome).

There will be time zones that cannot be translated in this way, but I don't see that this is a big deal, because this is "fast path" time zone handling code. You can have two levels of support. UTC, fixed offset time zones and IANA zones represented as strings, pytz or dateutil types get translated automatically into the internal types when applied to a TZ-aware column. Everything else goes down the "slow path", which is the equivalent of having an object column - time zone handling code just calls the underlying datetime object's time zone functions in a for loop.

If done right, this doesn't actually require a dependency on pytz, because you can use heuristics to detect pytz zones (or other clever methods to avoid unnecessary imports of pytz).
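
The fast-path/slow-path split described above could be sketched roughly as follows. Everything here is hypothetical: to_internal_key, the tuple keys it returns, and the module-name check used to spot pytz objects without importing pytz are illustrative assumptions, not an existing pandas or pytz API. String inputs are not validated against the IANA database, and pytz fixed-offset objects are not handled.

```python
from datetime import timedelta, timezone, tzinfo


def to_internal_key(tz):
    """Hypothetical normalizer: return a fast-path key, or None for slow path."""
    if isinstance(tz, str):
        # Assumption: any non-UTC string is taken to be an IANA key as-is.
        return ("utc",) if tz.upper() == "UTC" else ("iana", tz)
    if isinstance(tz, timezone):
        # Standard-library fixed offsets, including timezone.utc.
        offset = tz.utcoffset(None)
        return ("utc",) if offset == timedelta(0) else ("fixed", offset)
    if isinstance(tz, tzinfo) and type(tz).__module__.split(".")[0] == "pytz":
        # Heuristic: detect pytz zones by module name, so pytz is never
        # imported here; str() on a pytz IANA zone gives its key.
        name = str(tz)
        return ("utc",) if name == "UTC" else ("iana", name)
    # Anything else falls through to per-object tzinfo calls in a loop.
    return None
```

A caller would translate a column's time zone once with a function like this and only fall back to calling the original tzinfo methods element by element when it returns None.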

Most of this does not have to be a breaking change, though dropping pytz will likely need to be a breaking change. That is not a huge part of it, and depending on how strongly you feel about a notice period, I have a few strategies you can employ that would maintain API compatibility but slowly transition over to dateutil.

jbrockmendel · 2020-07-06T19:47:50Z

This is partially fixed as tz_compare now requires tzinfo objects, so the string "UTC" is irrelevant. It's still questionable that tz_compare(pytz.UTC, timezone.utc) gives False.
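
One way to picture "treat different UTC objects as equal" is the sketch below. It is not the change that eventually closed this issue. is_utc_like and tz_equal are hypothetical helpers, the zero-offset-plus-name check is a heuristic I am assuming for illustration, and OtherUTC is a stand-in for a UTC object from another time zone provider.

```python
from datetime import timedelta, timezone, tzinfo


def is_utc_like(tz):
    """Heuristic (an assumption): a fixed zero offset and the name "UTC"."""
    try:
        return tz.utcoffset(None) == timedelta(0) and tz.tzname(None) == "UTC"
    except Exception:
        return False


def tz_equal(a, b):
    """Hypothetical comparison: special-case UTC, else compare string forms."""
    if is_utc_like(a) and is_utc_like(b):
        return True
    return str(a) == str(b)


class OtherUTC(tzinfo):
    """Stand-in for a UTC tzinfo supplied by a different provider."""

    def utcoffset(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return timedelta(0)


utc_match = tz_equal(timezone.utc, OtherUTC())
non_utc_match = tz_equal(timezone(timedelta(hours=-5)), timezone.utc)
```

The string forms of timezone.utc and OtherUTC() differ, so a pure string comparison would report them as different. The UTC special case is what makes them compare equal here.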

andrewcooke · 2020-08-07T21:25:53Z

questionable? it just wasted yet another day in my life.

TomAugspurger added the Timezones Timezone data dtype label Nov 27, 2018

mroeschke mentioned this issue Dec 7, 2018

WIP: multi-timezone handling for array_to_datetime #24006

Closed

acowlikeobject mentioned this issue Jan 31, 2019

"TypeError: data is already tz-aware UTC" in get_calendar() quantopian/trading_calendars#45

Closed

jbrockmendel mentioned this issue Jan 1, 2021

REGR: fillna on datetime64[ns, UTC] column hits RecursionError #38851

Closed

jbrockmendel mentioned this issue Jan 16, 2021

API/BUG: treat different UTC tzinfos as equal #39216

Merged

4 tasks

jreback added this to the 1.3 milestone Jan 19, 2021

jreback closed this as completed in #39216 Jan 19, 2021
