
PERF: lazify type-check import #28342

Merged: 2 commits merged on Sep 9, 2019

jbrockmendel
Member

These imports in io.formats.format take about 1.6ms out of a total of about 470ms (7.8ms total for io.formats.format). So it's not massive, but it is easy to avoid, and we are running out of lower-hanging fruit.
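A minimal sketch of the pattern the PR title presumably refers to, assuming the usual `typing.TYPE_CHECKING` guard combined with string annotations. The helper `format_tz_name` is hypothetical and not the pandas code. The dateutil imports are only seen by mypy, so importing the module no longer pays for them.

```python
from typing import TYPE_CHECKING, Optional, Union

if TYPE_CHECKING:
    # Evaluated only by static type checkers such as mypy; this block is
    # skipped at runtime, so the dateutil imports cost nothing on import.
    from dateutil.tz.tz import tzutc
    from dateutil.zoneinfo import tzfile


def format_tz_name(tz: Optional[Union["tzfile", "tzutc"]] = None) -> str:
    # Hypothetical helper. The quoted names are forward references: they
    # are not looked up when the function is defined, so they need not
    # exist at runtime.
    return "naive" if tz is None else str(tz)
```

The quoted names are why the diff below writes `"tzfile"` and `"tzutc"` as strings. To see per-module import costs like the 1.6ms above, `python -X importtime -c "import pandas"` (Python 3.7+) prints a timing line for every imported module.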

```diff
@@ -1553,7 +1554,7 @@ def _is_dates_only(

 def _format_datetime64(
     x: Union[NaTType, Timestamp],
-    tz: Optional[Union[tzfile, tzutc]] = None,
+    tz: Optional[Union["tzfile", "tzutc"]] = None,
```
Member Author

It's not obvious to me why these only care about dateutil tzinfos and not, e.g., stdlib or pytz versions. @simonjayhawkins any idea?

Member

These were the types seen by MonkeyType. I think they are resolving to Any.

I've since abandoned using MonkeyType for a couple of reasons:

- MonkeyType uses nominal types, and in many cases these resolve to Any due to unfollowed imports.
- MonkeyType only adds nominal types, whereas we'd probably prefer structural types.
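To illustrate the distinction: a nominal annotation such as `tzinfo` only accepts that class and its subclasses, while a structural type (`typing.Protocol`, Python 3.8+) accepts anything with the right methods. This is a minimal sketch; `HasUtcOffset`, `FixedOffset` and `offset_hours` are hypothetical names, not pandas APIs.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Protocol


class HasUtcOffset(Protocol):
    """Structural type: any object with a compatible utcoffset() method."""

    def utcoffset(self, dt: Optional[datetime], /) -> Optional[timedelta]: ...


class FixedOffset:
    """Hypothetical tz-like class that does NOT inherit from datetime.tzinfo."""

    def __init__(self, hours: int) -> None:
        self._offset = timedelta(hours=hours)

    def utcoffset(self, dt: Optional[datetime]) -> Optional[timedelta]:
        return self._offset


def offset_hours(tz: HasUtcOffset) -> float:
    # A type checker accepts timezone.utc and FixedOffset alike, because
    # both provide a matching utcoffset() method.
    offset = tz.utcoffset(None)
    return 0.0 if offset is None else offset.total_seconds() / 3600


print(offset_hours(timezone.utc))  # 0.0
print(offset_hours(FixedOffset(2)))  # 2.0
```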

Feel free to change or remove them. The order of typing priority, from high to low, should probably match the order used by isort, so I consider these to be low-priority type hints.

Member Author

OK. I would be shocked if these weren't supposed to be tzinfo; I'll look more closely and update.

Member

Yes, but if tzinfo is an unfollowed import, it will just resolve to Any.

Member Author

What determines whether something is an unfollowed import? tzinfo is stdlib, so it seems like mypy should know what it is.

Member

simonjayhawkins, Sep 8, 2019

> These were the types seen by MonkeyType. I think they are resolving to Any.

The revealed type for tz is Union[Any, dateutil.tz.tz.tzutc, None],

so only dateutil.zoneinfo.tzfile is unfollowed.

> It's not obvious to me why these only care about dateutil tzinfos and not, e.g., stdlib or pytz versions.

pytz.tzinfo.DstTzInfo is also unknown to mypy.

> tzinfo is stdlib, so it seems like mypy should know what it is.

Since all three (dateutil.zoneinfo.tzfile, dateutil.tz.tz.tzutc and pytz.tzinfo.DstTzInfo) inherit from datetime.tzinfo, and datetime.tzinfo is known to mypy through the stdlib, it probably does make sense to use datetime.tzinfo here.
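A quick runtime check of that inheritance claim, as a sketch: it imports dateutil and pytz only if they are installed, and the helper name `known_tz_classes` is hypothetical.

```python
from datetime import timezone, tzinfo
from typing import List


def known_tz_classes() -> List[type]:
    """Collect tz classes from the stdlib and, if installed, dateutil and pytz."""
    classes: List[type] = [timezone]  # stdlib
    try:
        from dateutil.tz import tzutc
        from dateutil.zoneinfo import tzfile

        classes += [tzutc, tzfile]
    except ImportError:  # dateutil is not installed
        pass
    try:
        from pytz.tzinfo import DstTzInfo

        classes.append(DstTzInfo)
    except ImportError:  # pytz is not installed
        pass
    return classes


# Every collected class subclasses datetime.tzinfo, so a single
# Optional[tzinfo] annotation covers all of them.
for cls in known_tz_classes():
    assert issubclass(cls, tzinfo), cls
```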

However, within the function, tz is only used in Timestamp(x).tz_convert(tz) and Timestamp(x).tz_localize(tz), and Timestamp resolves to Any due to an unfollowed import (it needs a stub, #28195), so no actual type checking is being performed.

So it's probably best just to delete the type hint for now.

@jreback jreback added the Performance Memory or execution speed performance label Sep 8, 2019
@jreback jreback added this to the 1.0 milestone Sep 8, 2019
```diff
-    x: Union[NaTType, Timestamp],
-    tz: Optional[Union["tzfile", "tzutc"]] = None,
-    nat_rep: str = "NaT",
+    x: Union[NaTType, Timestamp], tz: Optional[tzinfo] = None, nat_rep: str = "NaT"
```
Member

OK, so you're determined not to remove it!

In timestamps.pyx:

tz_convert: tz : str, pytz.timezone, dateutil.tz.tzfile or None

tz_localize: tz : str, pytz.timezone, dateutil.tz.tzfile or None

In tz_convert, tz is only used in the Timestamp constructor:

return Timestamp(self.value, tz=tz, freq=self.freq)

Timestamp: tz : str, pytz.timezone, dateutil.tz.tzfile or None

In tz_localize, tz is used in the Timestamp constructor and also in:

maybe_get_tz(tz): (Maybe) Construct a timezone object from a string. If tz is a string, use it to construct a timezone object. Otherwise, just return tz.
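For readers unfamiliar with that helper, here is a simplified, hypothetical sketch of the behaviour the docstring describes. `maybe_get_tz_sketch` is not the pandas implementation, which handles more string forms. Note that its annotation has to accept `str` as well as `tzinfo`.

```python
from datetime import timezone, tzinfo
from typing import Optional, Union


def maybe_get_tz_sketch(tz: Union[str, tzinfo, None]) -> Optional[tzinfo]:
    """Hypothetical simplification of maybe_get_tz.

    If tz is a string, build a tzinfo object from it; otherwise return tz.
    """
    if isinstance(tz, str):
        if tz.upper() == "UTC":
            return timezone.utc
        # Assumption: resolve other names with the stdlib IANA database
        # (zoneinfo, Python 3.9+); pandas' own helper resolves names differently.
        from zoneinfo import ZoneInfo

        return ZoneInfo(tz)
    # Already a tzinfo object (or None): pass it through unchanged.
    return tz
```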

It'll be a lot easier once the libs are annotated; mypy will do the checks for you.

Member Author

I looked at removing it, but it seemed weird for the function to have everything but that one parameter annotated. I'll defer to you if you think removing it is better.

Member

Because MonkeyType only saw "tzfile" and "tzutc" in testing, tzinfo is probably fine. It depends on how the lib gets annotated, though: if those annotations don't match, mypy will raise errors here if/when we add the stub and the types.

Strictly speaking, this function could also take a string, but as used internally, MonkeyType didn't see it being called with that type.

I'm OK with tzinfo, as it helps document the function for now.

@jreback jreback merged commit 7d5425f into pandas-dev:master Sep 9, 2019
Contributor

jreback commented Sep 9, 2019

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the tzimport branch September 9, 2019 14:21
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019