Commit 6b2c724

Reestablish watch and retry wait for some errors
Armada uses a Kubernetes watch to implement its chart wait logic. This can be a fairly long-lived connection to the Kubernetes API server, and is vulnerable to disruption (if, for example, the kubernetes apiserver chart is being upgraded).

This change allows Armada to retry the wait for some specific errors, including the establishment of a new watch, until the overall chart timeout is reached.

kubernetes-client/python#972

urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

Change-Id: I3e68a54becadd5b2a2343960a120bdc3de8e8515
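The commit message notes that a retry has to include establishing a new watch. A minimal sketch of why, using plain Python generators rather than the real Kubernetes client: once a stream has raised, it cannot be resumed, so only a freshly opened stream delivers further events. `ConnectionBroken`, `make_stream` and `consume` are hypothetical names invented for this illustration and are not part of Armada or the Kubernetes client.

```python
class ConnectionBroken(Exception):
    """Hypothetical stand-in for urllib3.exceptions.ProtocolError."""


def make_stream(events, break_after=None):
    """Hypothetical watch stream: yields events, optionally failing mid-way."""
    def stream():
        for index, event in enumerate(events):
            if break_after is not None and index == break_after:
                raise ConnectionBroken("Connection broken: IncompleteRead")
            yield event
    return stream()


def consume(stream):
    """Collect events until the stream ends or the connection breaks.

    Returns the events seen and a flag saying whether the stream failed.
    """
    seen = []
    try:
        for event in stream:
            seen.append(event)
    except ConnectionBroken:
        return seen, True
    return seen, False


EVENTS = ["ADDED", "MODIFIED", "READY"]

# The first stream breaks before delivering its second event.
broken = make_stream(EVENTS, break_after=1)
first_seen, first_failed = consume(broken)

# A generator that has raised is finished; iterating it again yields nothing.
leftover = list(broken)

# Only a newly established stream can deliver the remaining events.
second_seen, second_failed = consume(make_stream(EVENTS))
```

In this sketch, retrying means calling the function that opens the stream again, which mirrors retrying the entire wait operation rather than trying to resume the failed stream.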
1 parent ae1281d commit 6b2c724

2 files changed: +13 -0 lines changed

armada/handlers/wait.py

Lines changed: 12 additions & 0 deletions
@@ -21,6 +21,8 @@
 
 from kubernetes import watch
 from oslo_log import log as logging
+from retry import retry
+import urllib3.exceptions
 
 from armada import const
 from armada.exceptions import k8s_exceptions
@@ -318,6 +320,16 @@ def wait(self, timeout):
         else:
             self._wait(deadline)
 
+    # The Kubernetes Python Client does not always recover from broken
+    # connections to the k8s apiserver, and the resulting uncaught exceptions
+    # in the Watch.stream method cause the chart installation to fail. As long
+    # as the wait deadline has not passed, it is better to retry the entire
+    # wait operation.
+    @retry(
+        exceptions=(
+            urllib3.exceptions.ProtocolError,
+            urllib3.exceptions.MaxRetryError),
+        delay=1)
     def _wait(self, deadline):
         '''
         Waits for resources to become ready.
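The decorator above names only two exception types and sets no limit on the number of attempts, so what ends the retries is any exception outside that tuple. The sketch below illustrates that behaviour with a hand-written stand-in for the `retry` package, so that it runs without third-party dependencies. `retry_on`, `ProtocolError`, `WaitDeadlineExceeded` and `Waiter` are hypothetical names, and the assumption that the wrapped method raises a different exception type once the deadline has passed is ours and is not shown in this diff.

```python
import functools
import time


def retry_on(exceptions, delay=0.0):
    """Hypothetical stand-in for retry(exceptions=..., delay=...).

    Retries without an attempt limit, but only for the listed exception
    types. Anything else propagates to the caller immediately.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    time.sleep(delay)
        return wrapper
    return decorator


class ProtocolError(Exception):
    """Stand-in for urllib3.exceptions.ProtocolError."""


class WaitDeadlineExceeded(Exception):
    """Assumed timeout error; deliberately not in the retried tuple."""


class Waiter:
    def __init__(self, failures):
        self.failures = failures  # broken connections to simulate
        self.attempts = 0

    @retry_on((ProtocolError,), delay=0.0)
    def _wait(self, deadline):
        self.attempts += 1
        # The deadline is absolute, so each retry sees less remaining time.
        if time.time() >= deadline:
            raise WaitDeadlineExceeded("wait deadline has passed")
        if self.attempts <= self.failures:
            raise ProtocolError("Connection broken: IncompleteRead")
        return "ready"
```

Because `deadline` is an absolute timestamp passed into every attempt, a retried call does not restart the clock. Under the stated assumption, the first attempt after the deadline raises the timeout error and the retry loop ends.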

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ PasteDeploy>=1.5.2
 protobuf>=3.4.0
 PyYAML==3.12
 requests
+retry
 prometheus_client==0.7.0
 
 # API
