Commit 80130d5

amezin and nicoddemus authored
Restore old initial batch distribution logic in LoadScheduling (#812)
* Restore old initial batch distribution logic in LoadScheduling

pytest orders tests for optimal sequential execution - i.e. avoiding unnecessary setup and teardown of fixtures. So executing tests in consecutive chunks is important for optimal performance.

Commit 09d79ac optimized test distribution for the corner case, when the number of tests is less than 2 * number of nodes. At the same time, it made initial test distribution worse for all other cases.

If some tests use some fixture, and these tests fit into the initial batch, the fixture will be created min(n_tests, n_workers) times, no matter how many other tests there are.

With the old algorithm (before 09d79ac), if there are enough tests not using the fixture, the fixture was created only once.

So restore the old behavior for typical cases where the number of tests is much greater than the number of workers (or, strictly speaking, when there are at least 2 tests for every node).

In my test suite, where fixtures create Docker containers, this change reduces total run time by 10-15%.

This is a partial revert of commit 09d79ac

Co-authored-by: Bruno Oliveira <[email protected]>
1 parent a1785b5 commit 80130d5

3 files changed: +49 -13 lines changed

changelog/812.feature

+22
@@ -0,0 +1,22 @@
+Partially restore old initial batch distribution algorithm in ``LoadScheduling``.
+
+pytest orders tests for optimal sequential execution - i. e. avoiding
+unnecessary setup and teardown of fixtures. So executing tests in consecutive
+chunks is important for optimal performance.
+
+In v1.14, initial test distribution in ``LoadScheduling`` was changed to
+round-robin, optimized for the corner case, when the number of tests is less
+than ``2 * number of nodes``. At the same time, it became worse for all other
+cases.
+
+For example: if some tests use some "heavy" fixture, and these tests fit into
+the initial batch, with round-robin distribution the fixture will be created
+``min(n_tests, n_workers)`` times, no matter how many other tests there are.
+
+With the old algorithm (before v1.14), if there are enough tests not using
+the fixture, the fixture was created only once.
+
+So restore the old behavior for typical cases where the number of tests is
+much greater than the number of workers (or, strictly speaking, when there
+are at least 2 tests for every node).
+
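
The fixture-count argument in this changelog entry can be illustrated with a small standalone simulation. This is only a sketch, not xdist code: the function names are hypothetical, tests are modeled as integer indices in pytest's collection order, the "heavy" fixture is assumed to be created at most once per worker (for example a session-scoped fixture), and only the initial batch is considered.

```python
from itertools import cycle


def round_robin_initial(n_tests, n_workers):
    """Initial batch dealt out one test at a time (the v1.14 behavior).

    Hypothetical helper: the batch size follows the removed formula
    max(n_tests // 4, 2 * n_workers), clamped to the number of tests.
    """
    batch = min(max(n_tests // 4, 2 * n_workers), n_tests)
    assignment = [[] for _ in range(n_workers)]
    workers = cycle(range(n_workers))
    for test in range(batch):
        assignment[next(workers)].append(test)
    return assignment


def chunked_initial(n_tests, n_workers):
    """Initial batch as consecutive chunks (the restored behavior).

    Only meaningful when n_tests >= 2 * n_workers, matching the
    branch condition in the commit.
    """
    items_per_node = n_tests // n_workers
    chunk = max(items_per_node // 4, 2)
    return [list(range(w * chunk, (w + 1) * chunk)) for w in range(n_workers)]


def fixture_setups(assignment, heavy_tests):
    """Count workers that received at least one test using the fixture."""
    return sum(1 for tests in assignment if any(t in heavy_tests for t in tests))


# Assumed scenario: 100 tests, 4 workers, the first 4 tests share a heavy fixture.
heavy = set(range(4))
print(fixture_setups(round_robin_initial(100, 4), heavy))
print(fixture_setups(chunked_initial(100, 4), heavy))
```

Under these assumptions the round-robin variant spreads the four fixture-using tests over all four workers, while the chunked variant keeps them on the first worker, which matches the ``min(n_tests, n_workers)`` versus "only once" contrast described above.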

src/xdist/scheduler/load.py

+21 -7
@@ -248,13 +248,27 @@ def schedule(self):
         # Send a batch of tests to run. If we don't have at least two
         # tests per node, we have to send them all so that we can send
         # shutdown signals and get all nodes working.
-        initial_batch = max(len(self.pending) // 4, 2 * len(self.nodes))
-
-        # distribute tests round-robin up to the batch size
-        # (or until we run out)
-        nodes = cycle(self.nodes)
-        for i in range(initial_batch):
-            self._send_tests(next(nodes), 1)
+        if len(self.pending) < 2 * len(self.nodes):
+            # Distribute tests round-robin. Try to load all nodes if there are
+            # enough tests. The other branch tries sends at least 2 tests
+            # to each node - which is suboptimal when you have less than
+            # 2 * len(nodes) tests.
+            nodes = cycle(self.nodes)
+            for i in range(len(self.pending)):
+                self._send_tests(next(nodes), 1)
+        else:
+            # Send batches of consecutive tests. By default, pytest sorts tests
+            # in order for optimal single-threaded execution, minimizing the
+            # number of necessary fixture setup/teardown. Try to keep that
+            # optimal order for every worker.
+
+            # how many items per node do we have about?
+            items_per_node = len(self.collection) // len(self.node2pending)
+            # take a fraction of tests for initial distribution
+            node_chunksize = max(items_per_node // 4, 2)
+            # and initialize each node with a chunk of tests
+            for node in self.nodes:
+                self._send_tests(node, node_chunksize)

         if not self.pending:
             # initial distribution sent all tests, start node shutdown
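
To see the two branches of this hunk in isolation, here is a minimal sketch that mirrors the control flow outside of xdist. The function `initial_distribution` and its inner `send` helper are hypothetical stand-ins for `schedule` and `_send_tests`; it assumes the collection size equals the number of pending tests and that every node is already registered, which the real scheduler does not guarantee in general.

```python
from itertools import cycle


def initial_distribution(n_tests, n_nodes):
    """Return (tests sent per node, tests still pending) after the initial batch."""
    pending = list(range(n_tests))
    sent = [[] for _ in range(n_nodes)]

    def send(node, num):
        # Stand-in for _send_tests: hand the next `num` pending tests to `node`.
        sent[node].extend(pending[:num])
        del pending[:num]

    if len(pending) < 2 * n_nodes:
        # Too few tests for two per node: deal them out one at a time.
        nodes = cycle(range(n_nodes))
        for _ in range(len(pending)):
            send(next(nodes), 1)
    else:
        # Enough tests: give each node one chunk of consecutive tests.
        items_per_node = n_tests // n_nodes
        node_chunksize = max(items_per_node // 4, 2)
        for node in range(n_nodes):
            send(node, node_chunksize)
    return sent, pending


print(initial_distribution(6, 2))
print(initial_distribution(3, 2))
```

With six tests and two nodes the sketch takes the chunked branch, so each node starts with two consecutive tests and two remain pending. With three tests it falls back to round-robin and nothing is left pending.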

testing/test_dsession.py

+6 -6
@@ -115,18 +115,18 @@ def test_schedule_batch_size(self, pytester: pytest.Pytester) -> None:
         # assert not sched.tests_finished
         sent1 = node1.sent
         sent2 = node2.sent
-        assert sent1 == [0, 2]
-        assert sent2 == [1, 3]
+        assert sent1 == [0, 1]
+        assert sent2 == [2, 3]
         assert sched.pending == [4, 5]
         assert sched.node2pending[node1] == sent1
         assert sched.node2pending[node2] == sent2
         assert len(sched.pending) == 2
         sched.mark_test_complete(node1, 0)
-        assert node1.sent == [0, 2, 4]
+        assert node1.sent == [0, 1, 4]
         assert sched.pending == [5]
-        assert node2.sent == [1, 3]
-        sched.mark_test_complete(node1, 2)
-        assert node1.sent == [0, 2, 4, 5]
+        assert node2.sent == [2, 3]
+        sched.mark_test_complete(node1, 1)
+        assert node1.sent == [0, 1, 4, 5]
         assert not sched.pending

     def test_schedule_fewer_tests_than_nodes(self, pytester: pytest.Pytester) -> None:
