Skip to content

Use from __future__ import annotations (almost?) everywhere? #40533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MarcoGorelli opened this issue Mar 20, 2021 · 14 comments · Fixed by #40785
Closed

Use from __future__ import annotations (almost?) everywhere? #40533

MarcoGorelli opened this issue Mar 20, 2021 · 14 comments · Fixed by #40785
Labels
Needs Discussion Requires discussion from core team before further action Typing type annotations, mypy/pyright type checking
Milestone

Comments

@MarcoGorelli
Copy link
Member

This means we can address

type hints are executed at module import time, which is not computationally free.

(from https://www.python.org/dev/peps/pep-0563/), as well as use the more modern syntax of Python


This is easy to automate, just checking there's no downsides to putting this everywhere. Should any files (probably __init__.py, any others?) be excluded? cc @simonjayhawkins

@MarcoGorelli MarcoGorelli added the Typing type annotations, mypy/pyright type checking label Mar 20, 2021
@simonjayhawkins
Copy link
Member

see #36091 (comment), although I didn't prove whether it does make a difference, just relied on the statement you quoted in the OP.

type hints are executed at module import time, which is not computationally free.

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Mar 20, 2021

Guess I need to try to prove it then - is this a fair test?

With future imports:

In [1]: %timeit import pandas
The slowest run took 20.00 times longer than the fastest. This could mean that an intermediate result is being cached.
530 ns ± 919 ns per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit import pandas
67.5 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Without:

In [1]: %timeit import pandas
The slowest run took 22.01 times longer than the fastest. This could mean that an intermediate result is being cached.
594 ns ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit import pandas
81.4 ns ± 9.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

@jreback
Copy link
Contributor

jreback commented Mar 20, 2021

no you can't time multiple times as these are cached

and should be started in a sub process

we have several closed issues about import time - have a look there

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Mar 20, 2021

OK thanks - how about

import subprocess
import re
import statistics

if __name__ == '__main__':
    times = []
    for i in range(1_000):
        output = subprocess.run(['python', '-X', 'importtime', '-c', '"import pandas"'], capture_output=True, text=True)
        time = int(re.findall(r'\d+', output.stderr.splitlines()[-1])[1])
        times.append(time)

    print(f'mean: {statistics.mean(times):.2f}')
    print(f'stderr: {statistics.stdev(times)/sqrt(len(times)):.2f}')

?

Timings are:

with future imports everywhere:

mean: 9807.14
stderr: 45.77

on master:

mean: 10408.37
stderr: 62.55

EDIT

edited to report on standard error rather than stddev

@MarcoGorelli MarcoGorelli added the Needs Discussion Requires discussion from core team before further action label Mar 20, 2021
@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Mar 21, 2021

If the above test is fair, then repeating this for 10_000 iterations makes using future annotations everywhere look slightly worse:

image

Not particularly noticeable, but still, probably OK to not add it everywhere and just add it where necessary (as suggested by @WillAyd in #36091)


also,

> np.mean((master > annotations))

0.4245

Here's the script to generate these arrays:

import subprocess
import re
import statistics
from math import sqrt


if __name__ == '__main__':
    times = []
    for i in range(10_000):
        output = subprocess.run(['python', '-X', 'importtime', '-c', '"import pandas"'], capture_output=True, text=True)
        time = re.findall(r'\d+', output.stderr.splitlines()[-1])[1]
        times.append(f'{time}\n')
        if i % 1000 == 0:
            print(i)
    branch = subprocess.run(['git', 'branch', '--show-current'], capture_output=True, text=True).stdout.strip()
    with open(f'timing_branch_{branch}', 'w') as fd:
        fd.writelines(times)

@MarcoGorelli
Copy link
Member Author

I got an answer on StackOverflow which made a point I hadn't considered:

Since postponed evaluation of annotations is mandatory since Python 3.10, any code intended to work with Python 3.10 or later should use from __future__ import annotations for consistency. Functionality that requires prompt evaluation of annotations is simply not valid in Python 3.10+ and must be removed.

@WillAyd @simonjayhawkins - since you'd previously commented on this, any further thoughts? Does the above (ensuring compatibility with 3.10) seem like enough of a reason to import annotations everywhere?

@MarcoGorelli MarcoGorelli reopened this Mar 22, 2021
@WillAyd
Copy link
Member

WillAyd commented Mar 22, 2021 via email

@MarcoGorelli
Copy link
Member Author

I'm not aware of any place in the codebase where they can't be postponed currently

pandas/core/generic.py (where there's the bool_t trick) already has the future annotations import

@WillAyd
Copy link
Member

WillAyd commented Mar 22, 2021 via email

@MarcoGorelli
Copy link
Member Author

No worries, I'll try to be clearer:

In Python3.10, postponed evaluation of annotations will become the default. If we use from __future__ import annotations, then we force postponed evaluation of annotations already. By using it everywhere, we make sure that we don't introduce anything which will break when postponed evaluation becomes the default in Python3.10.

e.g. if in some file we included something like

from typing import Type
from functools import singledispatch

def non_global_sdp(local_type: Type):
    @singledispatch
    def default(arg): print("default")

    @default.register
    def case_a(arg: local_type): print("case_a")

    return default

cases = non_global_sdp(int)

, which would break in Python3.10 (but not in Python3.7), then we'd have to refactor when we got to Python3.10.
On the other hand, if we had already included from __future__ import annotations, then we would know immediately that it would break and hence could fix it immediately before Python3.10 comes around.

@WillAyd
Copy link
Member

WillAyd commented Mar 22, 2021 via email

@pawelrubin
Copy link

If I understood correctly, to make sure that the codebase will not stop working correctly when Python3.10 comes around, one could add from __future__ import annotations everywhere.

How about running tests in two modes - the current one and with from __future__ import annotations everywhere - to ensure that postponed evaluation of annotations won't break anything.

@MarcoGorelli
Copy link
Member Author

having two copies of the source code wouldn't be feasible to maintain, simply adding the future annotations import to non-__init__.py files should be enough. do you want to submit a PR to do that?

@pawelrubin
Copy link

pawelrubin commented Mar 26, 2021

I was thinking about a script that would inject from __future__ import annotations before running tests - having two copies of the source code does not seem like a good idea. I think it might be something to consider, since using future annotations everywhere looks slightly worse.

However, simply adding those imports should do the job as well.

Unfortunately, I won't find the time for this at the moment. I'm just sharing thoughts on the issue.

@MarcoGorelli MarcoGorelli mentioned this issue Apr 5, 2021
2 tasks
@jreback jreback added this to the 1.3 milestone Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Discussion Requires discussion from core team before further action Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants