Skip to content

Incorrect tzinfo check in tslibs.parsing #22234

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Aug 7, 2018 · 6 comments · Fixed by #30394
Closed

Incorrect tzinfo check in tslibs.parsing #22234

jbrockmendel opened this issue Aug 7, 2018 · 6 comments · Fixed by #30394
Labels
Clean Timezones Timezone data dtype
Milestone

Comments

@jbrockmendel
Copy link
Member

https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/tslibs/parsing.pyx#L339

    if isinstance(tzdata, datetime.tzinfo):
                tzinfo = tzdata

If this were ever hit it would raise an error because datetime.tzinfo refers to the tzinfo attached to a datetime object, not the tzinfo class defined in the datetime module.

There's basically no path by which this would be reached because we never pass a tzdata to this function. So two things should happen: fix this to check for the correct object, and either remove unreachable code or make it reachable.

@mroeschke
Copy link
Member

The same check (bug?) exists in dateutil:

https://github.com/dateutil/dateutil/blob/99f5770e7c63aa049b28abe465d7f1cc25b63fd2/dateutil/parser/_parser.py#L1161

Also it looks like tzdata is defined right above this check in the if/else above.

However if there are no conditions where we ever hit this branch, I would vote nix it.

@jbrockmendel
Copy link
Member Author

The same check (bug?) exists in dateutil:

dateutil.parser._parser has import datetime, whereas tslibs.parsing has from cpython.datetime cimport datetime (roughly equivalent to from datetime import datetime)

@mroeschke
Copy link
Member

Ah good catch. Looks like a copy-paste bug then.

But I think this branch will never be hit:

So dateutil_parse where this code lies in only called in one place

parsed, reso = dateutil_parse(date_string, _DEFAULT_DATETIME,
dayfirst=dayfirst, yearfirst=yearfirst)

And a default kwarg tzinfos will therefore always be None and this branch where this code lies will always be False:

if callable(tzinfos) or tzinfos and res.tzname in tzinfos:

@gfyoung gfyoung added Timezones Timezone data dtype Clean labels Aug 8, 2018
@gfyoung
Copy link
Member

gfyoung commented Aug 8, 2018

I would just remove the logic and see if any tests break. That would help to confirm 🙂

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Aug 8, 2018

However if there are no conditions where we ever hit this branch, I would vote nix it.

We briefly discussed at the sprint the idea of correctly parsing some timezone strings where the correct answer is unambiguous (e.g. "CEST" should always be "Central European Summer Time"). If we go down that road, it will be by passing tzinfos as a dict.

i.e. we shouldn't nix all of that code, and as long as we have some of it, it's worth keeping it parallel to the dateutil version (since it is basically copy/pasted)

@mroeschke
Copy link
Member

I see. In that case, makes sense to keep it for now.

To pay it forward though we can at least fix the datetime.tzinfo check and add some comments why that block is not getting hit (and what it will be useful for in the future).

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Dec 21, 2019
@jreback jreback added this to the 1.0 milestone Dec 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Timezones Timezone data dtype
Projects
None yet
4 participants