Skip to content

Commit 584e273

Browse files
committed
Kill app server after fail of a non-default server
See a test case in #252. The idea is the following: the default server (it is the app server) executes a script, which starts a non-default server and hangs. The non-default instance fails after some time (before --test-timeout), test-run's crash detector observed this and stops the test (kills the greenlet). We kill all other non-default servers in the case, but prior to this commit, we don't kill the app server itself. Regarding the implementation. I updated AppServer.stop() to follow the way how TarantoolServer.stop() works: verbose logging, use SIGKILL after a timeout, wait for a process termination. And reused this code in the finalization of an app test. Introduced the AppTest.teardown() method. The idea is to have a method, where all finalization after AppTest.execute() occurs. Maybe we'll refactor process management in future and presence an explicit finalization method will make things more clear. I think, it is good point to close #65 and #157: most of cases, where tarantool instances are not terminated or killed, are fixed. There is kill_old_server(), which only sends SIGTERM, but we should not find alive instances here after the 'Wait until residual servers will be stopped' commit. There is #256, but it unlikely will hit us on real tests. If there will be other cases of this kind, we should handle them separately: file issue with a reproducer, investigate and fix them. I see no reason to track 'kill hung server' as the general task anymore. Fixes #65 Fixes #157
1 parent d562031 commit 584e273

File tree

1 file changed

+66
-10
lines changed

1 file changed

+66
-10
lines changed

lib/app_server.py

+66-10
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99

1010
from lib.colorer import color_stdout
1111
from lib.colorer import color_log
12+
from lib.colorer import qa_notice
1213
from lib.options import Options
1314
from lib.preprocessor import TestState
1415
from lib.server import Server
@@ -18,6 +19,7 @@
1819
from lib.tarantool_server import TarantoolStartError
1920
from lib.utils import find_port
2021
from lib.utils import format_process
22+
from lib.utils import signame
2123
from lib.utils import warn_unix_socket
2224
from test import TestRunGreenlet, TestExecutionError
2325
from threading import Timer
@@ -72,15 +74,25 @@ def execute(self, server):
7274
# A non-default server failed to start.
7375
raise TestExecutionError
7476
finally:
75-
# Stop any servers created by the test, except the
76-
# default one.
77-
#
78-
# See a comment in LuaTest.execute() for motivation of
79-
# SIGKILL usage.
80-
ts.stop_nondefault(signal=signal.SIGKILL)
77+
self.teardown(server, ts)
8178
if retval.get('returncode', None) != 0:
8279
raise TestExecutionError
8380

81+
def teardown(self, server, ts):
82+
# Stop any servers created by the test, except the
83+
# default one.
84+
#
85+
# See a comment in LuaTest.execute() for motivation of
86+
# SIGKILL usage.
87+
ts.stop_nondefault(signal=signal.SIGKILL)
88+
89+
# When a supplementary (non-default) server fails, we
90+
# should not leave the process that executes an app test.
91+
# Let's kill it.
92+
#
93+
# Reuse AppServer.stop() code for convenience.
94+
server.stop(signal=signal.SIGKILL)
95+
8496

8597
class AppServer(Server):
8698
"""A dummy server implementation for application server tests"""
@@ -153,16 +165,60 @@ def deploy(self, vardir=None, silent=True, need_init=True):
153165
# cannot check length of path of *.control unix socket created by it.
154166
# So for 'app' tests type we don't check *.control unix sockets paths.
155167

156-
def stop(self, silent):
168+
def stop(self, silent=True, signal=signal.SIGTERM):
169+
# FIXME: Extract common parts of AppServer.stop() and
170+
# TarantoolServer.stop() to an utility function.
171+
172+
color_log('DEBUG: [app server] Stopping the server...\n',
173+
schema='info')
174+
157175
if not self.process:
176+
color_log(' | Nothing to do: the process does not exist\n',
177+
schema='info')
158178
return
159-
color_log('AppServer.stop(): stopping the %s\n' %
160-
format_process(self.process.pid), schema='test_var')
179+
180+
if self.process.returncode:
181+
if self.process.returncode < 0:
182+
signaled_by = -self.process.returncode
183+
color_log(' | Nothing to do: the process was terminated by '
184+
'signal {} ({})\n'.format(signaled_by,
185+
signame(signaled_by)),
186+
schema='info')
187+
else:
188+
color_log(' | Nothing to do: the process was exited with code '
189+
'{}\n'.format(self.process.returncode),
190+
schema='info')
191+
return
192+
193+
color_log(' | Sending signal {0} ({1}) to {2}\n'.format(
194+
signal, signame(signal),
195+
format_process(self.process.pid)))
161196
try:
162-
self.process.terminate()
197+
self.process.send_signal(signal)
163198
except OSError:
164199
pass
165200

201+
# Waiting for stopping the server. If the timeout
202+
# reached, send SIGKILL.
203+
timeout = 5
204+
205+
def kill():
206+
qa_notice('The app server does not stop during {} '
207+
'seconds after the {} ({}) signal.\n'
208+
'Info: {}\n'
209+
'Sending SIGKILL...'.format(
210+
timeout, signal, signame(signal),
211+
format_process(self.process.pid)))
212+
try:
213+
self.process.kill()
214+
except OSError:
215+
pass
216+
217+
timer = Timer(timeout, kill)
218+
timer.start()
219+
self.process.wait()
220+
timer.cancel()
221+
166222
@classmethod
167223
def find_exe(cls, builddir):
168224
cls.builddir = builddir

0 commit comments

Comments
 (0)