Skip to content

Sometime dependencies installation failed #87

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ligurio opened this issue May 25, 2021 · 3 comments · Fixed by tarantool/tarantool#6581
Closed

Sometime dependencies installation failed #87

ligurio opened this issue May 25, 2021 · 3 comments · Fixed by tarantool/tarantool#6581
Assignees

Comments

@ligurio
Copy link
Member

ligurio commented May 25, 2021

INFO [2021-05-25 03:22:17,945] jepsen node 37.139.40.66 - jepsen.os.ubuntu 37.139.40.66 setting up ubuntu
INFO [2021-05-25 03:22:20,565] jepsen node 37.139.40.66 - jepsen.os.debian Installing #{apt-transport-https libzip4 ntpdate faketime}
WARN [2021-05-25 03:22:21,031] main - jepsen.core Test crashed!
clojure.lang.ExceptionInfo: Command exited with non-zero status 100 on node 37.139.40.66:
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...


STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime

	at slingshot.support$stack_trace.invoke(support.clj:201) ~[knossos-0.3.6.jar:na]
	at jepsen.control$throw_on_nonzero_exit.invokeStatic(control.clj:150) ~[jepsen-0.1.18.jar:na]
	at jepsen.control$throw_on_nonzero_exit.invoke(control.clj:145) ~[jepsen-0.1.18.jar:na]
	at jepsen.control$exec_STAR_.invokeStatic(control.clj:194) ~[jepsen-0.1.18.jar:na]
	at jepsen.control$exec_STAR_.doInvoke(control.clj:191) ~[jepsen-0.1.18.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:137) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invoke(core.clj:660) ~[clojure-1.10.1.jar:na]
	at jepsen.control$exec.invokeStatic(control.clj:210) ~[jepsen-0.1.18.jar:na]
	at jepsen.control$exec.doInvoke(control.clj:204) ~[jepsen-0.1.18.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:137) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:673) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.doInvoke(core.clj:660) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:660) ~[clojure-1.10.1.jar:na]
	at jepsen.os.debian$install.invokeStatic(debian.clj:100) ~[na:na]
	at jepsen.os.debian$install.invoke(debian.clj:80) ~[na:na]
	at jepsen.os.ubuntu.Ubuntu$fn__1990.invoke(ubuntu.clj:24) ~[na:na]
	at jepsen.os.ubuntu.Ubuntu.setup_BANG_(ubuntu.clj:22) ~[na:na]
	at jepsen.os$fn__2322$G__2316__2326.invoke(os.clj:4) ~[jepsen-0.1.18.jar:na]
	at jepsen.os$fn__2322$G__2315__2331.invoke(os.clj:4) ~[jepsen-0.1.18.jar:na]
	at clojure.core$partial$fn__5839.invoke(core.clj:2625) ~[clojure-1.10.1.jar:na]
	at jepsen.control$on_nodes$fn__3036.invoke(control.clj:450) ~[jepsen-0.1.18.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.10.1.jar:na]
	at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) ~[jepsen-0.1.18.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.10.1.jar:na]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]
@ligurio ligurio self-assigned this May 25, 2021
@ligurio
Copy link
Member Author

ligurio commented May 25, 2021

There are known concurrency problems in Jepsen library:

jepsen.control is plagued by what I think are race conditions in Jsch, which I've never had the chance to dig in and fix.
source

and I believe these problems will gone after upgrading to the latest versions of Jepsen, see #30.

ligurio added a commit to tarantool/tarantool that referenced this issue May 31, 2021
Sometimes jobs on CI with Jepsen tests failed on installation
depenencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...

STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

Problem looks as a flaky, I couldn't reproduce it locally.  I suspect
the root cause is an infrastructure problem and to get more details
about it I have enabled debug options in apt-get and added
`set -o errexit` as it is recommended in documentation [1]
(see Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87
ligurio added a commit to tarantool/tarantool that referenced this issue May 31, 2021
Sometimes jobs on CI with Jepsen tests failed on installation
depenencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...

STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

Problem looks as a flaky, I couldn't reproduce it locally.  I suspect
the root cause is an infrastructure problem and to get more details
about it I have enabled debug options in apt-get and added
`set -o errexit` as it is recommended in documentation [1]
(see Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87
ligurio added a commit to tarantool/tarantool that referenced this issue May 31, 2021
Sometimes jobs on CI with Jepsen tests failed on installation
dependencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...

STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

Problem looks as a flaky, I couldn't reproduce it locally.  I suspect
the root cause is an infrastructure problem and to get more details
about it I have enabled debug options in apt-get and added
`set -o errexit` as it is recommended in documentation [1]
(see Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87
kyukhin pushed a commit to tarantool/tarantool that referenced this issue Jun 2, 2021
Sometimes jobs on CI with Jepsen tests failed on installation
dependencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...

STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

Problem looks as a flaky, I couldn't reproduce it locally.  I suspect
the root cause is an infrastructure problem and to get more details
about it I have enabled debug options in apt-get and added
`set -o errexit` as it is recommended in documentation [1]
(see Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87

(cherry picked from commit f40afb8)
kyukhin pushed a commit to tarantool/tarantool that referenced this issue Jun 2, 2021
Sometimes jobs on CI with Jepsen tests failed on installation
dependencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...

STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

Problem looks as a flaky, I couldn't reproduce it locally.  I suspect
the root cause is an infrastructure problem and to get more details
about it I have enabled debug options in apt-get and added
`set -o errexit` as it is recommended in documentation [1]
(see Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87

(cherry picked from commit f40afb8)
tsafin pushed a commit to tsafin/tarantool that referenced this issue Jun 7, 2021
Sometimes jobs on CI with Jepsen tests failed on installation
dependencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"

STDIN:
null

STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...

STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

Problem looks as a flaky, I couldn't reproduce it locally.  I suspect
the root cause is an infrastructure problem and to get more details
about it I have enabled debug options in apt-get and added
`set -o errexit` as it is recommended in documentation [1]
(see Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87
@Totktonada
Copy link
Member

Forewords

I found another problem, but, it seems, it's cause is the cause of this one as well. So I'll introduce the terms:

  • Problem A: the problem described in the issue: Jepsen is unable to install necessary packages.
  • Problem B: the problem I described below: apt-get <...> update invoked by Terraform fails.

What I also want to say: I know really nothing about terraform, packer and cloud-init, so mistakes are possible: I'm just trying to understand what is going on based on what I see. My wording may be incorrect.

Problem B

I found this 'problem B' during testing of PR #93. It is about Ubuntu repositories as well, but symptoms are different:

openstack_compute_instance_v2.instance[0] (remote-exec): E: Type 'to' is not known on line 50 in source list /etc/apt/sources.list
openstack_compute_instance_v2.instance[0] (remote-exec): E: The list of sources could not be read.

It appears on apt-get <...> update before Jepsen starts.

Full logs and other details are below.

Experiments

I injected the following code to debug the problem B (the patch is applied on the tarantool repository):

diff --git a/extra/tf/main.tf b/extra/tf/main.tf
index abe8e606d..578968ac2 100644
--- a/extra/tf/main.tf
+++ b/extra/tf/main.tf
@@ -29,6 +29,7 @@ resource "openstack_compute_instance_v2" "instance" {
     inline = [
       "set -o errexit",
       "sudo hostnamectl set-hostname n${count.index + 1}",
+      "cat -n /etc/apt/sources.list",
       "sudo apt-get -o Debug::Acquire::http=true -o Debug::pkgAcquire::Worker=1 update"
     ]
   }

And run testing several times. Once during the those runs I meet the problem A and found the following difference in /etc/apt/sources.list between successul and failed run:

$ diff -u <(sed -e 's/^ \?[0-9]\+\t\?//' success-sources-list.txt) <(sed -e 's/^ \?[0-9]\+\t\?//' failure-sources-list.txt)
--- /dev/fd/63	2021-10-31 02:34:07.585041600 +0300
+++ /dev/fd/62	2021-10-31 02:34:07.585041600 +0300
@@ -1,46 +1,38 @@
-## Note, this file is written by cloud-init on first boot of an instance
-## modifications made here will not survive a re-bundle.
-
-## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
-##     or do the same in user-data
-## b.) add sources in /etc/apt/sources.list.d
-## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
-
 # See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
 # newer versions of the distribution.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic main restricted
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic main restricted
 
 ## Major bug fix updates produced after the final release of the
 ## distribution.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
 
 ## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
 ## team. Also, please note that software in universe WILL NOT receive any
 ## review or updates from the Ubuntu security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic universe
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic universe
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates universe
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates universe
+deb http://archive.ubuntu.com/ubuntu/ bionic universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic universe
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates universe
 
 ## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
 ## team, and may not be under a free licence. Please satisfy yourself as to
 ## your rights to use the software. Also, please note that software in
 ## multiverse WILL NOT receive any review or updates from the Ubuntu
 ## security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic multiverse
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates multiverse
 
 ## N.B. software from this repository may not have been tested as
 ## extensively as that contained in the main release, although it includes
 ## newer versions of some applications which may provide useful features.
 ## Also, please note that software in backports WILL NOT receive any review
 ## or updates from the Ubuntu security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
 
 ## Uncomment the following two lines to add software from Canonical's
 ## 'partner' repository.
@@ -49,9 +41,9 @@
 # deb http://archive.canonical.com/ubuntu bionic partner
 # deb-src http://archive.canonical.com/ubuntu bionic partner
 
-deb http://security.ubuntu.com/ubuntu bionic-security main restricted
-# deb-src http://security.ubuntu.com/ubuntu bionic-security main restricted
-deb http://security.ubuntu.com/ubuntu bionic-security universe
-# deb-src http://security.ubuntu.com/ubuntu bionic-security universe
-deb http://security.ubuntu.com/ubuntu bionic-security multiverse
-# deb-src http://security.ubuntu.com/ubuntu bionic-security multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-security main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic-security universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security universe
+deb http://archive.ubuntu.com/ubuntu/ bionic-security multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security multiverse

I guess that problem B appears due to some transient state of the /etc/apt/sources.list, so we'll fix both at once.

The raw output of the can -n /etc/apt/sources.list command in both cases:

Full logs in both cases:

Interpretation

As we can see from the /etc/apt/sources.list file content, cloud-init writes the file before apt-get <...> update (and so before Jepsen will install dependencies) in the successful case. There is the nice description of successful and failed cases here, I'll not repeat it. So the solution is simple: wait until cloud-init actions will be finished.

There are several ways to do so, they're spread across the following threads:

As I see from this comment, there are ways to detect that cloud-init is initialized (it occurs only once) and that it is started. I guess that, since we deploy the instance from scratch each time and don't save any state between runs, any way should be okay for us.

I like this solution: just call cloud-init status --wait. It looks direct and simple. I don't know, whether we can see lack of the cloud-init executable, but hopefully it is preinstalled into the image we use (rather than installed by terraform).

Solution

I would try the following:

diff --git a/extra/tf/main.tf b/extra/tf/main.tf
index abe8e606d..1230efefc 100644
--- a/extra/tf/main.tf
+++ b/extra/tf/main.tf
@@ -28,6 +28,7 @@ resource "openstack_compute_instance_v2" "instance" {
   provisioner "remote-exec" {
     inline = [
       "set -o errexit",
+      "sudo cloud-init status --wait",
       "sudo hostnamectl set-hostname n${count.index + 1}",
       "sudo apt-get -o Debug::Acquire::http=true -o Debug::pkgAcquire::Worker=1 update"
     ]

I'll run it several times in CI and if I'll not see neither problem A, nor problem B anymore, I'll propose it in a pull request. Otherwise I'll write the new results here.

@Totktonada
Copy link
Member

Totktonada commented Nov 1, 2021

I made 200 runs (100 runs of the 'jepsen-single-instance' workflow and 100 runs of the 'jepsen-single-instance-txm' workflow). The statistics is the following.

5 fails (2.5% of total runs) and nothing with symptoms as above. Looks as success!

See details below.


2 crashes on bank-lua with InterruptedException on the bank-lua test. On the first glance, it looks as a problem in the tarantool-java connector.

Details
2021-10-31 03:05:09,019{GMT}	WARN	[jepsen worker 0] jepsen.core: Process 0 crashed
java.sql.SQLException: Couldn't initiate connection using user=*****, port=3301, password=*****, host=185.241.194.12, loginTimeout=10000, connectTimeout=10000, socketTimeout=10000
	at org.tarantool.jdbc.SQLConnection.<init>(SQLConnection.java:73) ~[connector-1.9.4.jar:na]
	at org.tarantool.jdbc.SQLDriver.connect(SQLDriver.java:44) ~[connector-1.9.4.jar:na]
	at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[na:1.8.0_302]
	at java.sql.DriverManager.getConnection(DriverManager.java:208) ~[na:1.8.0_302]
	at next.jdbc.connection$get_driver_connection.invokeStatic(connection.clj:139) ~[na:na]
	at next.jdbc.connection$get_driver_connection.invoke(connection.clj:134) ~[na:na]
	at next.jdbc.connection$url_PLUS_etc$reify__847.getConnection(connection.clj:357) ~[na:na]
	at next.jdbc.connection$make_connection.invokeStatic(connection.clj:383) ~[na:na]
	at next.jdbc.connection$make_connection.invoke(connection.clj:367) ~[na:na]
	at next.jdbc.connection$eval866$fn__867.invoke(connection.clj:406) ~[na:na]
	at next.jdbc.protocols$eval623$fn__624$G__614__631.invoke(protocols.clj:24) ~[na:na]
	at next.jdbc.result_set$eval1570$fn__1578.invoke(result_set.clj:874) ~[na:na]
	at next.jdbc.protocols$eval655$fn__686$G__646__695.invoke(protocols.clj:33) ~[na:na]
	at next.jdbc$execute_BANG_.invokeStatic(jdbc.clj:240) ~[na:na]
	at next.jdbc$execute_BANG_.invoke(jdbc.clj:224) ~[na:na]
	at next.jdbc.sql$query.invokeStatic(sql.clj:76) ~[na:na]
	at next.jdbc.sql$query.invoke(sql.clj:67) ~[na:na]
	at next.jdbc.sql$query.invokeStatic(sql.clj:73) ~[na:na]
	at next.jdbc.sql$query.invoke(sql.clj:67) ~[na:na]
	at tarantool.bank.BankClientWithLua.invoke_BANG_(bank.clj:49) ~[na:na]
	at jepsen.core$invoke_op_BANG_$fn__5881.invoke(core.clj:223) ~[jepsen-0.1.18.jar:na]
	at jepsen.core$invoke_op_BANG_.invokeStatic(core.clj:223) [jepsen-0.1.18.jar:na]
	at jepsen.core$invoke_op_BANG_.invoke(core.clj:218) [jepsen-0.1.18.jar:na]
	at jepsen.core.ClientWorker.run_worker_BANG_(core.clj:355) [jepsen-0.1.18.jar:na]
	at jepsen.core$run_workers_BANG_$run__5872.invoke(core.clj:206) [jepsen-0.1.18.jar:na]
	at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) [jepsen-0.1.18.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:152) [clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyTo(AFn.java:144) [clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:665) [clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) [clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) [clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:425) [clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:156) [clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:132) [clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:669) [clojure-1.10.1.jar:na]
	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) [clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:397) [clojure-1.10.1.jar:na]
	at clojure.lang.AFn.run(AFn.java:22) [clojure-1.10.1.jar:na]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_302]
Caused by: java.lang.IllegalStateException: java.lang.InterruptedException
	at org.tarantool.TarantoolClientImpl.startConnector(TarantoolClientImpl.java:165) ~[connector-1.9.4.jar:na]
	at org.tarantool.TarantoolClientImpl.<init>(TarantoolClientImpl.java:121) ~[connector-1.9.4.jar:na]
	at org.tarantool.TarantoolClientImpl.<init>(TarantoolClientImpl.java:111) ~[connector-1.9.4.jar:na]
	at org.tarantool.jdbc.SQLConnection$SQLTarantoolClientImpl.<init>(SQLConnection.java:801) ~[connector-1.9.4.jar:na]
	at org.tarantool.jdbc.SQLConnection.makeSqlClient(SQLConnection.java:78) ~[connector-1.9.4.jar:na]
	at org.tarantool.jdbc.SQLConnection.<init>(SQLConnection.java:71) ~[connector-1.9.4.jar:na]
	... 38 common frames omitted
Caused by: java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039) ~[na:1.8.0_302]
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) ~[na:1.8.0_302]
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) ~[na:1.8.0_302]
	at org.tarantool.TarantoolClientImpl$StateHelper.awaitState(TarantoolClientImpl.java:1002) ~[connector-1.9.4.jar:na]
	at org.tarantool.TarantoolClientImpl.waitAlive(TarantoolClientImpl.java:739) ~[connector-1.9.4.jar:na]
	at org.tarantool.TarantoolClientImpl.startConnector(TarantoolClientImpl.java:153) ~[connector-1.9.4.jar:na]
	... 43 common frames omitted
2021-10-31 03:05:09,025{GMT}	INFO	[jepsen worker 0] jepsen.util: 0	:info	:read	nil	indeterminate: java.lang.InterruptedException
2021-10-31 03:05:09,043{GMT}	INFO	[jepsen nemesis] jepsen.core: Tearing down nemesis
2021-10-31 03:05:09,068{GMT}	INFO	[jepsen worker 0] jepsen.core: Tearing down worker 0
2021-10-31 03:05:09,069{GMT}	INFO	[jepsen test runner] jepsen.core: Snarfing log files
2021-10-31 03:05:09,423{GMT}	INFO	[jepsen node 185.241.194.12] jepsen.core: downloading /var/log/tarantool/jepsen.log to jepsen.log
2021-10-31 03:05:09,952{GMT}	INFO	[jepsen node 185.241.194.12] tarantool.db: 185.241.194.12 Stopping Tarantool
2021-10-31 03:05:11,477{GMT}	WARN	[main] jepsen.core: Test crashed!
java.lang.InterruptedException: sleep interrupted
	at java.lang.Thread.sleep(Native Method) ~[na:1.8.0_302]
	at jepsen.generator.DelayFn.op(generator.clj:179) ~[jepsen-0.1.18.jar:na]
	at jepsen.generator.Mix.op(generator.clj:378) ~[jepsen-0.1.18.jar:na]
	at jepsen.generator.On.op(generator.clj:596) ~[jepsen-0.1.18.jar:na]
	at jepsen.generator.Concat.op(generator.clj:660) ~[jepsen-0.1.18.jar:na]
	at jepsen.generator.TimeLimit.op(generator.clj:546) ~[jepsen-0.1.18.jar:na]
	at jepsen.generator$op_and_validate.invokeStatic(generator.clj:34) ~[jepsen-0.1.18.jar:na]
	at jepsen.generator$op_and_validate.invoke(generator.clj:30) ~[jepsen-0.1.18.jar:na]
	at jepsen.core.NemesisWorker.run_worker_BANG_(core.clj:406) ~[jepsen-0.1.18.jar:na]
	at jepsen.core$run_workers_BANG_$run__5872.invoke(core.clj:206) ~[jepsen-0.1.18.jar:na]
	at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) ~[jepsen-0.1.18.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.10.1.jar:na]
	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.10.1.jar:na]
	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.10.1.jar:na]
	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_302]

Full logs: logs.txt
Run artifact: jepsen-single-instance.zip (see tarantool-cfeec1476e309dedb6a0ab0e7d14868c0d569685/20211031T025157.000Z/)

2 fails on bank-lua due to #94.

1 hang during dependencies installation (there is no information, what command hangs: curl or apt-get).

Details
2021-10-31T06:04:49.0051226Z /github/home/.lein/self-installs/leiningen-2.9.7-standalone.jar.pending: OK
<..6 hours without any output..>
2021-10-31T11:56:37.3951360Z ##[error]The operation was canceled.

Full logs: logs.txt

@Totktonada Totktonada self-assigned this Nov 1, 2021
@Totktonada Totktonada added bug Something isn't working infrastructure and removed bug Something isn't working labels Nov 1, 2021
Totktonada added a commit to tarantool/tarantool that referenced this issue Nov 1, 2021
Jepsen testing starts one or several virtual machines and runs tarantool
instances on them. The first (first important for us here) command on
the virtual machine is `apt-get <...> update`: we should download
packages list to allow Jepsen to install necessary dependencies.

However we can access the virtual machine (using ssh) before it is fully
initialized by the cloud-init script. In particular, the cloud-init
script replaces apt's mirror list file (`/etc/apt/sources.list`).

Normally we should call `apt-get <...> update` after the package list
update, but here cloud-init races with the update command.

In the bad case the commands are executed in the opposite order:

* Terraform calls `apt-get <...> update`.
* cloud-init replaces `/etc/apt/sources.list`.

Now an attempt to install a package using apt-get will give the 'unable
to locate package' error, because we have no packages list for the 'new'
mirrors.

The problem is nicely described in [1]. See also the linked issue for
details.

[1]: hashicorp/packer#41 (comment)

Fixes tarantool/jepsen.tarantool#87
Totktonada added a commit to tarantool/tarantool that referenced this issue Nov 2, 2021
Jepsen testing starts one or several virtual machines and runs tarantool
instances on them. The first (first important for us here) command on
the virtual machine is `apt-get <...> update`: we should download
packages list to allow Jepsen to install necessary dependencies.

However we can access the virtual machine (using ssh) before it is fully
initialized by the cloud-init script. In particular, the cloud-init
script replaces apt's mirror list file (`/etc/apt/sources.list`).

Normally we should call `apt-get <...> update` after the package list
update, but here cloud-init races with the update command.

In the bad case the commands are executed in the opposite order:

* Terraform calls `apt-get <...> update`.
* cloud-init replaces `/etc/apt/sources.list`.

Now an attempt to install a package using apt-get will give the 'unable
to locate package' error, because we have no packages list for the 'new'
mirrors.

The problem is nicely described in [1]. See also the linked issue for
details.

[1]: hashicorp/packer#41 (comment)

Fixes tarantool/jepsen.tarantool#87
@kyukhin kyukhin added the teamE label Nov 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants