Sometime dependencies installation failed #87
Sometimes CI jobs with Jepsen tests fail while installing dependencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"
STDIN:
null
STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...
STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

The problem looks flaky; I couldn't reproduce it locally. I suspect the root cause is an infrastructure problem. To get more details about it, I have enabled debug options in apt-get and added `set -o errexit`, as recommended in the documentation [1] (see the Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87
Sometimes CI jobs with Jepsen tests fail while installing dependencies:

```
sudo -S -u root bash -c "cd /; env DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes apt-transport-https libzip4 ntpdate faketime"
STDIN:
null
STDOUT:
Reading package lists...
Building dependency tree...
Reading state information...
STDERR:
W: --force-yes is deprecated, use one of the options starting with --allow instead.
E: Unable to locate package libzip4
E: Unable to locate package ntpdate
E: Unable to locate package faketime
```

The problem looks flaky; I couldn't reproduce it locally. I suspect the root cause is an infrastructure problem. To get more details about it, I have enabled debug options in apt-get and added `set -o errexit`, as recommended in the documentation [1] (see the Note section).

1. https://www.terraform.io/docs/language/resources/provisioners/remote-exec.html#argument-reference

Part of: tarantool/jepsen.tarantool#87

(cherry picked from commit f40afb8)
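The commit message above mentions adding `set -o errexit` to the provisioner script. As a rough illustration of why that option matters, the sketch below runs the same three-step sequence with and without it. `run_steps` is a hypothetical helper written for this example, and `false` merely stands in for a failing `apt-get` call; it is not the project's actual provisioning code.

```shell
#!/usr/bin/env bash
# Illustrative only: compare a command sequence with and without errexit.
# run_steps is a hypothetical helper, not part of the project.

run_steps() {
    # $1: "errexit" to enable the option, anything else to leave it off.
    local mode="$1"
    bash -c '
        if [ "$1" = "errexit" ]; then set -o errexit; fi
        echo "step 1: ok"
        false                      # stands in for a failing apt-get call
        echo "step 2: still running"
    ' _ "$mode"
}

# Without errexit both steps are printed and the failure goes unnoticed.
run_steps plain || true

# With errexit the sequence stops right after the failing command.
run_steps errexit || echo "stopped with status $?"
```

Without the option, the exit status of the whole sequence is that of the last command, so an earlier failure can be masked by a later success.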
Foreword

I found another problem, but it seems its cause is also the cause of this one. So I'll introduce the terms:

What I also want to say: I know almost nothing about terraform, packer and cloud-init, so mistakes are possible: I'm just trying to understand what is going on based on what I see. My wording may be incorrect.

Problem B

I found this 'problem B' during testing of PR #93. It is about Ubuntu repositories as well, but the symptoms are different:

It appears on apt-get <...> update before Jepsen starts. Full logs and other details are below.

Experiments

I injected the following code to debug problem B (the patch is applied on the tarantool repository):

diff --git a/extra/tf/main.tf b/extra/tf/main.tf
index abe8e606d..578968ac2 100644
--- a/extra/tf/main.tf
+++ b/extra/tf/main.tf
@@ -29,6 +29,7 @@ resource "openstack_compute_instance_v2" "instance" {
inline = [
"set -o errexit",
"sudo hostnamectl set-hostname n${count.index + 1}",
+ "cat -n /etc/apt/sources.list",
"sudo apt-get -o Debug::Acquire::http=true -o Debug::pkgAcquire::Worker=1 update"
]
}

Then I ran the tests several times. During one of those runs I hit problem A and found the following difference between the successful and the failed run:

$ diff -u <(sed -e 's/^ \?[0-9]\+\t\?//' success-sources-list.txt) <(sed -e 's/^ \?[0-9]\+\t\?//' failure-sources-list.txt)
--- /dev/fd/63 2021-10-31 02:34:07.585041600 +0300
+++ /dev/fd/62 2021-10-31 02:34:07.585041600 +0300
@@ -1,46 +1,38 @@
-## Note, this file is written by cloud-init on first boot of an instance
-## modifications made here will not survive a re-bundle.
-
-## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
-## or do the same in user-data
-## b.) add sources in /etc/apt/sources.list.d
-## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
-
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic main restricted
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic main restricted
## Major bug fix updates produced after the final release of the
## distribution.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates main restricted
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic universe
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic universe
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates universe
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates universe
+deb http://archive.ubuntu.com/ubuntu/ bionic universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic universe
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic multiverse
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-updates multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-updates multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-updates multiverse
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
-deb http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
-# deb-src http://DP1.clouds.archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
@@ -49,9 +41,9 @@
# deb http://archive.canonical.com/ubuntu bionic partner
# deb-src http://archive.canonical.com/ubuntu bionic partner
-deb http://security.ubuntu.com/ubuntu bionic-security main restricted
-# deb-src http://security.ubuntu.com/ubuntu bionic-security main restricted
-deb http://security.ubuntu.com/ubuntu bionic-security universe
-# deb-src http://security.ubuntu.com/ubuntu bionic-security universe
-deb http://security.ubuntu.com/ubuntu bionic-security multiverse
-# deb-src http://security.ubuntu.com/ubuntu bionic-security multiverse
+deb http://archive.ubuntu.com/ubuntu/ bionic-security main restricted
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security main restricted
+deb http://archive.ubuntu.com/ubuntu/ bionic-security universe
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security universe
+deb http://archive.ubuntu.com/ubuntu/ bionic-security multiverse
+# deb-src http://archive.ubuntu.com/ubuntu/ bionic-security multiverse

I guess that problem B appears due to some transient state of the

The raw output of the

Full logs in both cases:

Interpretation

As we can see from the

There are several ways to do so; they're spread across the following threads:
As I see from this comment, there are ways to detect that cloud-init is initialized (it occurs only once) and that it is started. I guess that, since we deploy the instance from scratch each time and don't save any state between runs, any way should be okay for us. I like this solution: just call

Solution

I would try the following:

diff --git a/extra/tf/main.tf b/extra/tf/main.tf
index abe8e606d..1230efefc 100644
--- a/extra/tf/main.tf
+++ b/extra/tf/main.tf
@@ -28,6 +28,7 @@ resource "openstack_compute_instance_v2" "instance" {
provisioner "remote-exec" {
inline = [
"set -o errexit",
+ "sudo cloud-init status --wait",
"sudo hostnamectl set-hostname n${count.index + 1}",
"sudo apt-get -o Debug::Acquire::http=true -o Debug::pkgAcquire::Worker=1 update"
]

I'll run it several times in CI, and if I see neither problem A nor problem B anymore, I'll propose it in a pull request. Otherwise I'll write the new results here.
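The patch above relies on `cloud-init status --wait`. For images where that subcommand is not available, one commonly used alternative is to poll for the marker file that cloud-init creates when the boot is finished. The sketch below is only an illustration under that assumption: `wait_for_file` is a hypothetical helper, and the marker path `/var/lib/cloud/instance/boot-finished` and the 300 second default are assumptions, not something taken from this project.

```shell
#!/usr/bin/env bash
# Sketch: block until a marker file appears, with a timeout.
# wait_for_file is a hypothetical helper written for this example.

wait_for_file() {
    # $1: path to wait for; $2: timeout in seconds (default 300).
    local path="$1" timeout="${2:-300}" waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Example call on a freshly booted instance (assumed marker path):
#   wait_for_file /var/lib/cloud/instance/boot-finished 300 && sudo apt-get update
```

The timeout keeps a provisioning step from hanging forever if the marker never shows up, at the cost of one more failure mode to report.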
I made 200 runs (100 runs of the 'jepsen-single-instance' workflow and 100 runs of the 'jepsen-single-instance-txm' workflow). The statistics are as follows.

5 fails (2.5% of total runs) and nothing with the symptoms above. Looks like a success! See details below.

2 crashes with InterruptedException on the bank-lua test. At first glance, it looks like a problem in the tarantool-java connector.

Full logs: logs.txt

2 fails on bank-lua due to #94.

1 hang during dependencies installation (there is no information about which command hangs: curl or apt-get).

Full logs: logs.txt
Jepsen testing starts one or several virtual machines and runs tarantool instances on them.

The first command that matters here on the virtual machine is `apt-get <...> update`: we need to download the package lists to allow Jepsen to install the necessary dependencies.

However, we can access the virtual machine (using ssh) before it is fully initialized by the cloud-init script. In particular, the cloud-init script replaces apt's mirror list file (`/etc/apt/sources.list`).

Normally we should call `apt-get <...> update` after the mirror list is replaced, but here cloud-init races with the update command. In the bad case the commands are executed in the opposite order:

* Terraform calls `apt-get <...> update`.
* cloud-init replaces `/etc/apt/sources.list`.

Now an attempt to install a package using apt-get gives the 'unable to locate package' error, because we have no package lists for the 'new' mirrors.

The problem is nicely described in [1]. See also the linked issue for details.

[1]: hashicorp/packer#41 (comment)

Fixes tarantool/jepsen.tarantool#87
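The race described above leaves the instance with a mirror list that has no matching downloaded package lists. A minimal sketch of how that state could be detected is shown below. `sources_hosts` and `missing_lists` are hypothetical helpers, and the check assumes that apt names its index files in `/var/lib/apt/lists` after the mirror host (for example `archive.ubuntu.com_ubuntu_dists_bionic_InRelease`); treat it as a debugging aid, not as part of the fix.

```shell
#!/usr/bin/env bash
# Sketch only: sources_hosts and missing_lists are hypothetical helpers.

# Print the unique mirror hosts named on active "deb" lines of a sources.list file.
sources_hosts() {
    awk '$1 == "deb" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /:\/\//) {          # first field that looks like a URL
                split($i, parts, "/")    # "http:", "", "host", ...
                print parts[3]
                break
            }
        }
    }' "$1" | sort -u
}

# Print hosts that have no downloaded index file in the lists directory.
# Assumption: apt stores indexes as <host>_<path>_... in that directory.
missing_lists() {
    local sources="$1" lists_dir="${2:-/var/lib/apt/lists}" host status=0
    for host in $(sources_hosts "$sources"); do
        if ! ls "$lists_dir/$host"_* >/dev/null 2>&1; then
            echo "no package lists for $host"
            status=1
        fi
    done
    return "$status"
}

# Example call on an instance:
#   missing_lists /etc/apt/sources.list || echo "run apt-get update again"
```

If the update ran before cloud-init rewrote the file, the hosts in the new mirror list would be reported as having no package lists, which matches the 'unable to locate package' symptom.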