CI: Use all available cores in the CI #24028
Comments
This sounds interesting. I can take this up next, but I'm not sure where to start exactly.
Afaik we're running 3 things in the CI that are worth parallelizing (and I don't think they were parallelized when I wrote this issue): the tests, the documentation build, and the compilation of pandas.

The parallelization of running the tests is actually very tricky; I've got #26949 to experiment with, but let's forget about it for now, focus on the others, and we can take care of it later. The docs are built in the last build in Azure if I'm not wrong, and the entry point is `./doc/make.py html`. For the compilation of pandas, every job is calling `python setup.py build_ext --inplace`. Probably the first step is to pick one of them, see how many jobs are expected to be used, change the value, and see whether the CI runs faster. I think you can set up Azure for your fork; feel free to experiment there, or directly in pandas, whatever you prefer.
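A first step for either task is detecting how many cores the job actually has. A minimal sketch using only the standard library (the helper name `available_cpus` is mine, not anything in the pandas CI scripts):

```python
import multiprocessing
import os


def available_cpus(default=2):
    """Return the number of CPUs this process may use, falling back to a default."""
    try:
        # On Linux this respects the CPU affinity mask, which CI providers may
        # restrict to fewer cores than the physical machine has.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity is not available on macOS/Windows.
        return multiprocessing.cpu_count() or default


print(available_cpus())
```

The affinity-based count matters on shared CI runners, where `cpu_count()` can report the host's cores rather than the job's allocation.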
@jreback and @tomascassidy #18052 has updated the documentation. I am unable to find the implementation of `-j` in the codebase. Can you help me with this? I would need to understand how it is implemented to work on improving the CI performance.
@datapythonista what's your take on adding caching to the build process? Have we tried anything along those lines before? matplotlib/matplotlib#1507
There was a bit of discussion about this before, and the preference was to keep things simple in the CI. Personally, I think it would make sense to build a Docker image or something every night with the latest version of the source code and the conda dependencies, so the builds for every PR just download what's needed on top of that. But I'm not an expert; I may be missing something. The best thing would be to create an issue, so everybody interested can give their opinion and share ideas and experience. Feel free to open it (probably better with a specific proposal, so the discussion is more focused).
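A nightly base image along these lines could look like the following. This is purely illustrative: the base image, file name, and layout are my assumptions, not anything pandas actually uses.

```dockerfile
# Hypothetical nightly image: conda dependencies pre-installed, so a PR
# build only needs to check out the diff and recompile changed extensions.
FROM continuumio/miniconda3
WORKDIR /pandas
COPY environment.yml .
RUN conda env create -f environment.yml && conda clean --all --yes
```

The trade-off raised above still applies: a cached image speeds up every PR build, but adds a nightly job and an image registry to maintain.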
@datapythonista `-j` is not part of setuptools; the only other place I see it on the whole internet is in the NumPy docs. Looking through their code, they have a custom implementation for parallel compilation. We would need to do the same for `-j` to work.
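The idea behind such a custom implementation is to compile independent translation units concurrently. A toy sketch of that pattern with `concurrent.futures`, where `compile_one` is a stand-in for a real compiler invocation (not NumPy's actual code):

```python
from concurrent.futures import ThreadPoolExecutor


def compile_one(source):
    # Stand-in for invoking the C compiler on a single translation unit;
    # a real build would run e.g. gcc and return the object file path.
    return source.replace(".c", ".o")


def parallel_compile(sources, jobs=2):
    """Compile independent sources concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_one, sources))


objects = parallel_compile(["a.c", "b.c", "c.c"], jobs=2)
print(objects)  # ['a.o', 'b.o', 'c.o']
```

Threads are fine here because the heavy work happens in subprocesses (the compiler), so the GIL is not a bottleneck.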
Here you've got where the `-j` option is handled. But if I'm not wrong, it may not be having any effect (see the update in the issue description).
I think this has already been implemented.
If I'm not wrong, both Travis and Azure provide two cores for the execution of jobs (I think in both cases the number does not necessarily need to be 2, and could be higher, or I guess lower, depending on their load).
So far we are just parallelizing the execution of tests, where we are using 2 processes in most cases (except for the tests marked as `single`).

We should be able to speed up the CI by detecting the number of cpus available for the job, and parallelizing at least (assuming 2 cpus):

- The compilation of pandas (adding `-j2` to `python setup.py build_ext --inplace`)
- The build of the documentation (adding `--num-jobs=2` to `./doc/make.py html`)
)Travis specs: https://docs.travis-ci.com/user/reference/overview/#virtualisation-environment-vs-operating-system
UPDATE: Not sure if the `-j` argument in `setup.py build_ext` has any effect. I'd say it's ignored, and the process is always single core.

Time to run the tests `not slow and not network` in the CI:

1st run:
2nd run: