Use from future import annotations (almost?) everywhere? #40533

MarcoGorelli · 2021-03-20T12:45:37Z

This means we can address

type hints are executed at module import time, which is not computationally free.

(from https://www.python.org/dev/peps/pep-0563/), as well as use the more modern syntax of Python

This is easy to automate, just checking there's no downsides to putting this everywhere. Should any files (probably __init__.py, any others?) be excluded? cc @simonjayhawkins

The text was updated successfully, but these errors were encountered:

simonjayhawkins · 2021-03-20T12:56:48Z

see #36091 (comment), although I didn't prove whether it does make a difference, just relied on the statement you quoted in the OP.

type hints are executed at module import time, which is not computationally free.

MarcoGorelli · 2021-03-20T13:41:01Z

Guess I need to try to prove it then - is this a fair test?

With future imports:

In [1]: %timeit import pandas
The slowest run took 20.00 times longer than the fastest. This could mean that an intermediate result is being cached.
530 ns ± 919 ns per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit import pandas
67.5 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Without:

In [1]: %timeit import pandas
The slowest run took 22.01 times longer than the fastest. This could mean that an intermediate result is being cached.
594 ns ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [2]: %timeit import pandas
81.4 ns ± 9.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

jreback · 2021-03-20T13:44:37Z

no you can't time multiple times as these are cached

and should be started in a sub process

we have several closed issues about import time - have a look there

MarcoGorelli · 2021-03-20T14:03:12Z

OK thanks - how about

import subprocess
import re
import statistics

if __name__ == '__main__':
    times = []
    for i in range(1_000):
        output = subprocess.run(['python', '-X', 'importtime', '-c', '"import pandas"'], capture_output=True, text=True)
        time = int(re.findall(r'\d+', output.stderr.splitlines()[-1])[1])
        times.append(time)

    print(f'mean: {statistics.mean(times):.2f}')
    print(f'stderr: {statistics.stdev(times)/sqrt(len(times)):.2f}')

?

Timings are:

with future imports everywhere:

mean: 9807.14
stderr: 45.77

on master:

mean: 10408.37
stderr: 62.55

EDIT

edited to report on standard error rather than stddev

MarcoGorelli · 2021-03-21T16:55:34Z

If the above test is fair, then repeating this for 10_000 iterations makes using future annotations everywhere look slightly worse:

Not particularly noticeable, but still, probably OK to not add it everywhere and just add it where necessary (as suggested by @WillAyd in #36091)

also,

> np.mean((master > annotations))

0.4245

Here's the script to generate these arrays:

import subprocess
import re
import statistics
from math import sqrt


if __name__ == '__main__':
    times = []
    for i in range(10_000):
        output = subprocess.run(['python', '-X', 'importtime', '-c', '"import pandas"'], capture_output=True, text=True)
        time = re.findall(r'\d+', output.stderr.splitlines()[-1])[1]
        times.append(f'{time}\n')
        if i % 1000 == 0:
            print(i)
    branch = subprocess.run(['git', 'branch', '--show-current'], capture_output=True, text=True).stdout.strip()
    with open(f'timing_branch_{branch}', 'w') as fd:
        fd.writelines(times)

MarcoGorelli · 2021-03-22T19:25:00Z

I got an answer on StackOverflow which made a point I hadn't considered:

Since postponed evaluation of annotations is mandatory since Python 3.10, any code intended to work with Python 3.10 or later should use from __future__ import annotations for consistency. Functionality that requires prompt evaluation of annotations is simply not valid in Python 3.10+ and must be removed.

@WillAyd @simonjayhawkins - since you'd previously commented on this, any further thoughts? Does the above (ensuring compatibility with 3.10) seem like enough of a reason to import annotations everywhere?

WillAyd · 2021-03-22T21:09:31Z

Do we have a lot of cases in our codebase where the evaluations cannot be postponed currently? The only thing I know of off the top of my head that _may_ be problematic would be our aliasing of types that conflict with method names (ex: bool_t)

…

Sent from my iPhone

On Mar 22, 2021, at 12:25 PM, Marco Gorelli ***@***.***> wrote: I got an answer on StackOverflow which made a point I hadn't considered: Since postponed evaluation of annotations is mandatory since Python 3.10, any code intended to work with Python 3.10 or later should use from __future__ import annotations for consistency. Functionality that requires prompt evaluation of annotations is simply not valid in Python 3.10+ and must be removed. @WillAyd @simonjayhawkins - since you'd previously commented on this, any further thoughts? Does the above (ensuring compatibility with 3.10) seem like enough of a reason to import annotations everywhere? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

MarcoGorelli · 2021-03-22T21:17:15Z

I'm not aware of any place in the codebase where they can't be postponed currently

pandas/core/generic.py (where there's the bool_t trick) already has the future annotations import

WillAyd · 2021-03-22T21:20:01Z

Sorry I’m confused as to why this is required - can we not just leave them as is an let the Python version dictate the technical details of how they are handled? I think we should only add the annotation where it is required to prevent breakage. Sorry for any misunderstanding

…

Sent from my iPhone

On Mar 22, 2021, at 2:17 PM, Marco Gorelli ***@***.***> wrote: I'm not aware of any place in the codebase where they can't be postponed currently pandas/core/generic.py (where there's the bool_t trick) already has the future annotations import — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

MarcoGorelli · 2021-03-22T21:29:20Z

No worries, I'll try to be clearer:

In Python3.10, postponed evaluation of annotations will become the default. If we use from __future__ import annotations, then we force postponed evaluation of annotations already. By using it everywhere, we make sure that we don't introduce anything which will break when postponed evaluation becomes the default in Python3.10.

e.g. if in some file we included something like

from typing import Type
from functools import singledispatch

def non_global_sdp(local_type: Type):
    @singledispatch
    def default(arg): print("default")

    @default.register
    def case_a(arg: local_type): print("case_a")

    return default

cases = non_global_sdp(int)

, which would break in Python3.10 (but not in Python3.7), then we'd have to refactor when we got to Python3.10.
On the other hand, if we had already included from __future__ import annotations, then we would know immediately that it would break and hence could fix it immediately before Python3.10 comes around.

WillAyd · 2021-03-22T21:36:06Z

Oh ok I see your point then. If I am reading your graph correctly it’s not a huge impact to runtime import time, so I’d be fine with it

…

Sent from my iPhone

On Mar 22, 2021, at 2:29 PM, Marco Gorelli ***@***.***> wrote: No worries, I'll try to be clearer: In Python3.10, postponed evaluation of annotations will become the default. If we use from __future__ import annotations, then we force postponed evaluation of annotations already. By using it everywhere, we make sure that we don't introduce anything which will break when postponed evaluation becomes the default in Python3.10. e.g. if in some file we included something like from typing import Type from functools import singledispatch def non_global_sdp(local_type: Type): @singledispatch def default(arg): print("default") @default.register def case_a(arg: local_type): print("case_a") return default cases = non_global_sdp(int) , which would break in Python3.10 (but not in Python3.7), then we'd have to refactor when we got to Python3.10. On the other hand, if we had already included from __future__ import annotations, then we would know immediately that it would break and hence could fix it immediately before Python3.10 comes around. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

pawelrubin · 2021-03-26T18:20:26Z

If I understood correctly, to make sure that the codebase will not stop working correctly when Python3.10 comes around, one could add from __future__ import annotations everywhere.

How about running tests in two modes - the current one and with from __future__ import annotations everywhere - to ensure that postponed evaluation of annotations won't break anything.

MarcoGorelli · 2021-03-26T18:24:15Z

having two copies of the source code wouldn't be feasible to maintain, simply adding the future annotations import to non-__init__.py files should be enough. do you want to submit a PR to do that?

pawelrubin · 2021-03-26T18:44:31Z

I was thinking about a script that would inject from __future__ import annotations before running tests - having two copies of the source code does not seem like a good idea. I think it might be something to consider, since using future annotations everywhere looks slightly worse.

However, simply adding those imports should do the job as well.

Unfortunately, I won't find the time for this at the moment. I'm just sharing thoughts on the issue.

MarcoGorelli added the Typing type annotations, mypy/pyright type checking label Mar 20, 2021

MarcoGorelli added the Needs Discussion Requires discussion from core team before further action label Mar 20, 2021

MarcoGorelli closed this as completed Mar 21, 2021

MarcoGorelli reopened this Mar 22, 2021

MarcoGorelli mentioned this issue Apr 5, 2021

future annotations #40785

Merged

2 tasks

jreback added this to the 1.3 milestone Apr 13, 2021

jreback closed this as completed in #40785 Apr 13, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use from future import annotations (almost?) everywhere? #40533

Use from future import annotations (almost?) everywhere? #40533

MarcoGorelli commented Mar 20, 2021

simonjayhawkins commented Mar 20, 2021

MarcoGorelli commented Mar 20, 2021 •

edited

Loading

jreback commented Mar 20, 2021

MarcoGorelli commented Mar 20, 2021 •

edited

Loading

MarcoGorelli commented Mar 21, 2021 •

edited

Loading

MarcoGorelli commented Mar 22, 2021

WillAyd commented Mar 22, 2021 via email

MarcoGorelli commented Mar 22, 2021

WillAyd commented Mar 22, 2021 via email

MarcoGorelli commented Mar 22, 2021

WillAyd commented Mar 22, 2021 via email

pawelrubin commented Mar 26, 2021

MarcoGorelli commented Mar 26, 2021

pawelrubin commented Mar 26, 2021 •

edited

Loading

Use from __future__ import annotations (almost?) everywhere? #40533

Use from __future__ import annotations (almost?) everywhere? #40533

Comments

MarcoGorelli commented Mar 20, 2021

simonjayhawkins commented Mar 20, 2021

MarcoGorelli commented Mar 20, 2021 • edited Loading

jreback commented Mar 20, 2021

MarcoGorelli commented Mar 20, 2021 • edited Loading

EDIT

MarcoGorelli commented Mar 21, 2021 • edited Loading

MarcoGorelli commented Mar 22, 2021

WillAyd commented Mar 22, 2021 via email

MarcoGorelli commented Mar 22, 2021

WillAyd commented Mar 22, 2021 via email

MarcoGorelli commented Mar 22, 2021

WillAyd commented Mar 22, 2021 via email

pawelrubin commented Mar 26, 2021

MarcoGorelli commented Mar 26, 2021

pawelrubin commented Mar 26, 2021 • edited Loading

Use from future import annotations (almost?) everywhere? #40533

Use from future import annotations (almost?) everywhere? #40533

MarcoGorelli commented Mar 20, 2021 •

edited

Loading

MarcoGorelli commented Mar 20, 2021 •

edited

Loading

MarcoGorelli commented Mar 21, 2021 •

edited

Loading

pawelrubin commented Mar 26, 2021 •

edited

Loading