Skip to content

Parallel build for CI #30452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 26, 2019
Merged

Parallel build for CI #30452

merged 7 commits into from
Dec 26, 2019

Conversation

WillAyd
Copy link
Member

@WillAyd WillAyd commented Dec 24, 2019

Letting this sit in draft state for a while to see if issues pop up

Note that this doesn't affect Windows (not sure yet if it can). Also previous attempts had -j4 but I believe we only get 2 virtual cores for the CI machines so that was probably overkill.

@jbrockmendel
Copy link
Member

lgtm pending green

@alimcmaster1
Copy link
Member

alimcmaster1 commented Dec 24, 2019

Nice! - I was just about to ask on #30356 why we don't do this in CI

@WillAyd
Copy link
Member Author

WillAyd commented Dec 24, 2019

@alimcmaster1 tried previously in #30214 but had to revert as the Py36 builds would intermittently fail thereafter.

Certainly open to any input you have on this

@WillAyd WillAyd added the CI Continuous Integration label Dec 24, 2019
@jreback jreback added this to the 1.0 milestone Dec 26, 2019
@jreback
Copy link
Contributor

jreback commented Dec 26, 2019

can you rebase again so we have a fresh comparison vs master. can you show the build or CI times (just see a representative sample)

@WillAyd WillAyd marked this pull request as ready for review December 26, 2019 02:03
@@ -121,7 +121,7 @@ conda list pandas
# Make sure any error below is reported as such

echo "[Build extensions]"
python setup.py build_ext -q -i
python setup.py build_ext -q -i -j2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for using 2 processes? I'd set -j0 and use as many processes are cores available.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documented somewhere? I tried locally and -j0 doesn't seem to parallelize anything

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, not sure why I was sure this could be automatically parallelized to the number of cores. I think it was intended to have this implemented, see https://github.com/python/cpython/blob/master/Lib/distutils/command/build_ext.py#L453 but in practice there is no way to make parallel equal to True.

Given that, I'd use:

-j `nproc`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the parallel extensions flag not work? is -j the new thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parallel and j are the same thing for distutils:

https://github.com/python/cpython/blob/03c8e5d46673e4704c8dc955d52735966ffdc2a4/Lib/distutils/command/build_ext.py#L87

And have the same requirements for specifying an integer (the assignment to self.parallel as shown in Marc's link I think can be misleading).

Here is the GNU make documentation on that:

https://www.gnu.org/software/make/manual/html_node/Parallel.html#Parallel

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista not sure that works for macOS unless you know of an alternate syntax?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about mac, but -j $(nproc) is equivalent, and works when the shell is not bash I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so I don't think nproc is available by default in macOS, so reverting to 2 for now (though definitely a nice idea)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the OSX equivalent is sysctl -n hw.ncpu, but that's one of the namespaces that apple is liable to change every release

@WillAyd
Copy link
Member Author

WillAyd commented Dec 26, 2019

Overall build time tends to fluctuate but this looks to chop off 2-5 minutes on average from non-Windows

@jreback
Copy link
Contributor

jreback commented Dec 26, 2019

lgtm. thanks @WillAyd

@jreback jreback merged commit 75c37fe into pandas-dev:master Dec 26, 2019
@WillAyd WillAyd deleted the parallel-build branch December 26, 2019 18:50
@datapythonista
Copy link
Member

Nice improvement, thanks @WillAyd

xref #24028

AlexKirko pushed a commit to AlexKirko/pandas that referenced this pull request Dec 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants