-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Use from __future__ import annotations (almost?) everywhere? #40533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
see #36091 (comment), although I didn't prove whether it does make a difference, just relied on the statement you quoted in the OP.
|
Guess I need to try to prove it then - is this a fair test? With future imports: In [1]: %timeit import pandas
The slowest run took 20.00 times longer than the fastest. This could mean that an intermediate result is being cached.
530 ns ± 919 ns per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit import pandas
67.5 ns ± 0.556 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) Without: In [1]: %timeit import pandas
The slowest run took 22.01 times longer than the fastest. This could mean that an intermediate result is being cached.
594 ns ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit import pandas
81.4 ns ± 9.13 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) |
no you can't time multiple times as these are cached and should be started in a sub process we have several closed issues about import time - have a look there |
OK thanks - how about import subprocess
import re
import statistics
if __name__ == '__main__':
times = []
for i in range(1_000):
output = subprocess.run(['python', '-X', 'importtime', '-c', '"import pandas"'], capture_output=True, text=True)
time = int(re.findall(r'\d+', output.stderr.splitlines()[-1])[1])
times.append(time)
print(f'mean: {statistics.mean(times):.2f}')
print(f'stderr: {statistics.stdev(times)/sqrt(len(times)):.2f}') ? Timings are: with future imports everywhere: mean: 9807.14
stderr: 45.77 on master: mean: 10408.37
stderr: 62.55 EDITedited to report on standard error rather than stddev |
If the above test is fair, then repeating this for 10_000 iterations makes using future annotations everywhere look slightly worse: Not particularly noticeable, but still, probably OK to not add it everywhere and just add it where necessary (as suggested by @WillAyd in #36091) also,
Here's the script to generate these arrays: import subprocess
import re
import statistics
from math import sqrt
if __name__ == '__main__':
times = []
for i in range(10_000):
output = subprocess.run(['python', '-X', 'importtime', '-c', '"import pandas"'], capture_output=True, text=True)
time = re.findall(r'\d+', output.stderr.splitlines()[-1])[1]
times.append(f'{time}\n')
if i % 1000 == 0:
print(i)
branch = subprocess.run(['git', 'branch', '--show-current'], capture_output=True, text=True).stdout.strip()
with open(f'timing_branch_{branch}', 'w') as fd:
fd.writelines(times) |
I got an answer on StackOverflow which made a point I hadn't considered:
@WillAyd @simonjayhawkins - since you'd previously commented on this, any further thoughts? Does the above (ensuring compatibility with 3.10) seem like enough of a reason to import annotations everywhere? |
Do we have a lot of cases in our codebase where the evaluations cannot be postponed currently? The only thing I know of off the top of my head that _may_ be problematic would be our aliasing of types that conflict with method names (ex: bool_t)
…Sent from my iPhone
On Mar 22, 2021, at 12:25 PM, Marco Gorelli ***@***.***> wrote:
I got an answer on StackOverflow which made a point I hadn't considered:
Since postponed evaluation of annotations is mandatory since Python 3.10, any code intended to work with Python 3.10 or later should use from __future__ import annotations for consistency. Functionality that requires prompt evaluation of annotations is simply not valid in Python 3.10+ and must be removed.
@WillAyd @simonjayhawkins - since you'd previously commented on this, any further thoughts? Does the above (ensuring compatibility with 3.10) seem like enough of a reason to import annotations everywhere?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
I'm not aware of any place in the codebase where they can't be postponed currently
|
Sorry I’m confused as to why this is required - can we not just leave them as is an let the Python version dictate the technical details of how they are handled? I think we should only add the annotation where it is required to prevent breakage.
Sorry for any misunderstanding
…Sent from my iPhone
On Mar 22, 2021, at 2:17 PM, Marco Gorelli ***@***.***> wrote:
I'm not aware of any place in the codebase where they can't be postponed currently
pandas/core/generic.py (where there's the bool_t trick) already has the future annotations import
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
No worries, I'll try to be clearer: In Python3.10, postponed evaluation of annotations will become the default. If we use e.g. if in some file we included something like from typing import Type
from functools import singledispatch
def non_global_sdp(local_type: Type):
@singledispatch
def default(arg): print("default")
@default.register
def case_a(arg: local_type): print("case_a")
return default
cases = non_global_sdp(int) , which would break in Python3.10 (but not in Python3.7), then we'd have to refactor when we got to Python3.10. |
Oh ok I see your point then. If I am reading your graph correctly it’s not a huge impact to runtime import time, so I’d be fine with it
…Sent from my iPhone
On Mar 22, 2021, at 2:29 PM, Marco Gorelli ***@***.***> wrote:
No worries, I'll try to be clearer:
In Python3.10, postponed evaluation of annotations will become the default. If we use from __future__ import annotations, then we force postponed evaluation of annotations already. By using it everywhere, we make sure that we don't introduce anything which will break when postponed evaluation becomes the default in Python3.10.
e.g. if in some file we included something like
from typing import Type
from functools import singledispatch
def non_global_sdp(local_type: Type):
@singledispatch
def default(arg): print("default")
@default.register
def case_a(arg: local_type): print("case_a")
return default
cases = non_global_sdp(int)
, which would break in Python3.10 (but not in Python3.7), then we'd have to refactor when we got to Python3.10.
On the other hand, if we had already included from __future__ import annotations, then we would know immediately that it would break and hence could fix it immediately before Python3.10 comes around.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
If I understood correctly, to make sure that the codebase will not stop working correctly when Python3.10 comes around, one could add How about running tests in two modes - the current one and with |
having two copies of the source code wouldn't be feasible to maintain, simply adding the future annotations import to non- |
I was thinking about a script that would inject However, simply adding those imports should do the job as well. Unfortunately, I won't find the time for this at the moment. I'm just sharing thoughts on the issue. |
This means we can address
(from https://www.python.org/dev/peps/pep-0563/), as well as use the more modern syntax of Python
This is easy to automate, just checking there's no downsides to putting this everywhere. Should any files (probably
__init__.py
, any others?) be excluded? cc @simonjayhawkinsThe text was updated successfully, but these errors were encountered: