Skip to content

Commit eac1e20

Browse files
committed
Try to understand and fix test_shutdown_wait_last_export
TL;DR The time assert in the test was wrong. Looking at the original commit (c84ba9) of this test, the delay of the mocked backoff was 4 seconds fixed with 6 iterations. So the time until shutdown returned was around 24 seconds. The default timeout of the lock wait in shutdown did not hit before the retry of the exporter was over. The line `self.assertGreaterEqual(now, (start_time + 30 / 1000))` seemed to be wrong from the beginning. It just ensured that the shutdown call took at least 30ms. I assume it was supposed to do something else. But to me it's not clear what exactly it should do. One explantion would be that it was intended to do `self.assertGreaterEqual(now, (start_time + 30 * 1000))` instead. But that would mean: Shutdown should need at least 30 seconds to pass, which does not make sense imho. Another option would be that it was intended to do `self.assertLessEqual(now, (start_time + 30 * 1000))` instead. That would mean: Shutdown should need at maximum 30 seconds to pass, which actually could make more sense. It would ensure that the call of `self.exporter.shutdown` returns after 30 seconds. My biggest issue here is, that it tests the happy path where the retry is over before the timeout while there is no mechanism to ensure that the exporter thread is stopped in a sensible time in case it's current exponential backoff is in a longer sleep already while the shutdown flag is set. If I understand it correctly, it can hit the beginning of a 32 second sleep, which could mean that a process needs a) 30 seconds to signal the shutdown and additional 32 seconds to finish the exporter thread. But that's something for another change. After all these words, looking at the current HEAD, the situation of the test case is basically the same as originally. The retry interval has been changed recently to be way shorter. The default lock timeout stayed the same at 30 seconds, but that does not matter too much since the retry mechanism is finished long before that. And the aforementioned assert is still wrong, with some additional problem from open-telemetry#4014.
1 parent 6350763 commit eac1e20

File tree

3 files changed

+51
-15
lines changed

3 files changed

+51
-15
lines changed

exporter/opentelemetry-exporter-otlp-proto-grpc/tests/test_otlp_exporter_mixin.py

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
# limitations under the License.
1414

1515
import threading
16+
import time
1617
from logging import WARNING
1718
from time import time_ns
1819
from types import MethodType
@@ -156,7 +157,9 @@ def _exporting(self) -> str:
156157
"Exporter already shutdown, ignoring batch",
157158
)
158159

159-
def test_shutdown_wait_last_export(self):
160+
def test_shutdown_wait_for_last_export_finishing_within_shutdown_timeout(
161+
self,
162+
):
160163
result_mock = Mock()
161164
rpc_error = RpcError()
162165

@@ -200,15 +203,24 @@ def _exporting(self) -> str:
200203
export_thread.start()
201204
try:
202205
# pylint: disable=protected-access
206+
207+
# Wait for the export thread to hold the lock. Since the main thread is not synchronized
208+
# with the export thread, the thread may not be ready yet.
209+
for _ in range(5):
210+
if otlp_mock_exporter._exporter._export_lock.locked():
211+
break
212+
time.sleep(0.01)
203213
self.assertTrue(otlp_mock_exporter._exporter._export_lock.locked())
204-
# delay is 1 second while the default shutdown timeout is 30_000 milliseconds
214+
215+
# 6 retries with a fixed retry delay of 10ms (see TraceServiceServicerUNAVAILABLEDelay)
216+
# The default shutdown timeout is 30 seconds
217+
# This means the retries will finish long before the shutdown flag will be set
205218
start_time = time_ns()
206219
otlp_mock_exporter.shutdown()
207220
now = time_ns()
208-
self.assertGreaterEqual(now, (start_time + 30 / 1000))
209-
# pylint: disable=protected-access
221+
# Verify that the shutdown method finished within the shutdown timeout
222+
self.assertLessEqual(now, (start_time + 3e10))
210223
self.assertTrue(otlp_mock_exporter._shutdown)
211-
# pylint: disable=protected-access
212224
self.assertFalse(
213225
otlp_mock_exporter._exporter._export_lock.locked()
214226
)

exporter/opentelemetry-exporter-otlp-proto-grpc/tests/test_otlp_metrics_exporter.py

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
# pylint: disable=too-many-lines
1616

1717
import threading
18+
import time
1819
from concurrent.futures import ThreadPoolExecutor
1920

2021
# pylint: disable=too-many-lines
@@ -852,7 +853,9 @@ def test_shutdown(self):
852853
)
853854
self.exporter = OTLPMetricExporter()
854855

855-
def test_shutdown_wait_last_export(self):
856+
def test_shutdown_wait_for_last_export_finishing_within_shutdown_timeout(
857+
self,
858+
):
856859
add_MetricsServiceServicer_to_server(
857860
MetricsServiceServicerUNAVAILABLEDelay(), self.server
858861
)
@@ -863,15 +866,24 @@ def test_shutdown_wait_last_export(self):
863866
export_thread.start()
864867
try:
865868
# pylint: disable=protected-access
869+
870+
# Wait for the export thread to hold the lock. Since the main thread is not synchronized
871+
# with the export thread, the thread may not be ready yet.
872+
for _ in range(5):
873+
if self.exporter._exporter._export_lock.locked():
874+
break
875+
time.sleep(0.01)
866876
self.assertTrue(self.exporter._exporter._export_lock.locked())
867-
# delay is 4 seconds while the default shutdown timeout is 30_000 milliseconds
877+
878+
# 6 retries with a fixed retry delay of 10ms (see TraceServiceServicerUNAVAILABLEDelay)
879+
# The default shutdown timeout is 30 seconds
880+
# This means the retries will finish long before the shutdown flag will be set
868881
start_time = time_ns()
869882
self.exporter.shutdown()
870883
now = time_ns()
871-
self.assertGreaterEqual(now, (start_time + 30 / 1000))
872-
# pylint: disable=protected-access
884+
# Verify that the shutdown method finished within the shutdown timeout
885+
self.assertLessEqual(now, (start_time + 3e10))
873886
self.assertTrue(self.exporter._shutdown)
874-
# pylint: disable=protected-access
875887
self.assertFalse(self.exporter._exporter._export_lock.locked())
876888
finally:
877889
export_thread.join()

exporter/opentelemetry-exporter-otlp-proto-grpc/tests/test_otlp_trace_exporter.py

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616

1717
import os
1818
import threading
19+
import time
1920
from concurrent.futures import ThreadPoolExecutor
2021
from logging import WARNING
2122
from time import time_ns
@@ -988,7 +989,9 @@ def test_shutdown(self):
988989
"Exporter already shutdown, ignoring batch",
989990
)
990991

991-
def test_shutdown_wait_last_export(self):
992+
def test_shutdown_wait_for_last_export_finishing_within_shutdown_timeout(
993+
self,
994+
):
992995
add_TraceServiceServicer_to_server(
993996
TraceServiceServicerUNAVAILABLEDelay(), self.server
994997
)
@@ -999,15 +1002,24 @@ def test_shutdown_wait_last_export(self):
9991002
export_thread.start()
10001003
try:
10011004
# pylint: disable=protected-access
1005+
1006+
# Wait for the export thread to hold the lock. Since the main thread is not synchronized
1007+
# with the export thread, the thread may not be ready yet.
1008+
for _ in range(5):
1009+
if self.exporter._exporter._export_lock.locked():
1010+
break
1011+
time.sleep(0.01)
10021012
self.assertTrue(self.exporter._exporter._export_lock.locked())
1003-
# delay is 4 seconds while the default shutdown timeout is 30_000 milliseconds
1013+
1014+
# 6 retries with a fixed retry delay of 10ms (see TraceServiceServicerUNAVAILABLEDelay)
1015+
# The default shutdown timeout is 30 seconds
1016+
# This means the retries will finish long before the shutdown flag will be set
10041017
start_time = time_ns()
10051018
self.exporter.shutdown()
10061019
now = time_ns()
1007-
self.assertGreaterEqual(now, (start_time + 30 / 1000))
1008-
# pylint: disable=protected-access
1020+
# Verify that the shutdown method finished within the shutdown timeout
1021+
self.assertLessEqual(now, (start_time + 3e10))
10091022
self.assertTrue(self.exporter._shutdown)
1010-
# pylint: disable=protected-access
10111023
self.assertFalse(self.exporter._exporter._export_lock.locked())
10121024
finally:
10131025
export_thread.join()

0 commit comments

Comments
 (0)