resources: flaky issues on lack of memory #98
This was referenced Mar 22, 2021
avtikhon added a commit to tarantool/test-run that referenced this issue on Apr 13, 2021:
Added an RSS memory status collecting routine, which parses /sys/fs/cgroup/memory/memory.stat for the RSS value. Part of tarantool/tarantool-qa#98
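For context, a minimal sketch of what such a parsing routine could look like (illustrative Python, not the actual test-run code; the function name and error handling are assumptions). The cgroup v1 memory controller reports the counter in bytes:

```python
# Illustrative sketch: read the "rss" counter from the cgroup v1
# memory controller. Lines in memory.stat look like: "rss 123456".
def get_cgroup_rss(stat_path='/sys/fs/cgroup/memory/memory.stat'):
    try:
        with open(stat_path) as f:
            for line in f:
                name, value = line.split()
                if name == 'rss':
                    return int(value)  # bytes of anonymous and swap cache memory
    except (OSError, ValueError):
        pass  # no cgroup v1 memory controller mounted
    return None
```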
avtikhon added a commit to tarantool/test-run that referenced this issue on Apr 13, 2021:
Add a call to the RSS memory collecting routine after each test Lua command sent. Part of tarantool/tarantool-qa#98
avtikhon added two commits to tarantool/test-run that referenced this issue on Apr 13, 2021 (the second only fixes a typo in the first's message):
Added to the worker listener an additional call to the RSS status checking routine, to collect it in a standalone file. Part of tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue on Apr 13, 2021:
Added an RSS memory status collecting routine, which parses /sys/fs/cgroup/memory/memory.stat for the RSS value. The routine is enabled by a new test-run option, --collect-statistics. Part of tarantool/tarantool-qa#98
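Presumably the option is passed on the test-run command line; an illustrative invocation (only the flag name comes from the commit message):

```
$ ./test-run.py --collect-statistics
```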
avtikhon re-pushed the same series to tarantool/test-run on Apr 13–14 and Apr 29, 2021; the only new commit message among those pushes:
Added to the worker listener an additional call to the RSS status checking routine, to collect it in standalone files such as var/log/<worker name>.mem_stat.log. Part of tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue on Apr 30, 2021:
Added an RSS memory status collecting routine, which parses /sys/fs/cgroup/memory/memory.stat for the RSS value, and runs it in the listeners after each test Lua command sent. All results are collected in the main test worker log file, var/log/<worker name>.log. The routine is enabled by a new test-run option, --collect-statistics. Closes tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue on May 1, 2021:
Added an RSS memory status collecting routine, which parses /sys/fs/cgroup/memory/memory.stat. Closes tarantool/tarantool-qa#98
avtikhon pushed the same commit to tarantool/test-run four more times on May 2–3, 2021; the message carries two appended fixup subjects, "got RSS in KB per worker" and "RSS with statistics":
Added an RSS memory status collecting routine, which parses /sys/fs/cgroup/memory/memory.stat. Closes tarantool/tarantool-qa#98
avtikhon added two commits to tarantool/test-run that referenced this issue on May 3, 2021; the second, fuller message:

Found that some tests may fail due to lack of memory; mostly this happens in CI on remote hosts. To be able to collect memory-usage statistics, added an RSS memory status collecting routine, get_proc_stat_rss(), which parses /proc/<worker pid>/status for the 'VmRSS' value, the size of the resident memory portions. It consists of the three following parts (VmRSS = RssAnon + RssFile + RssShmem) [1]:

RssAnon - size of resident anonymous memory
RssFile - size of resident file mappings
RssShmem - size of resident shmem memory (includes SysV shm, mappings of tmpfs, and shared anonymous mappings)

Decided that the best way for CI is not to run this RSS collecting routine for each command sent from the test tasks, but to run it when the test task starts and then after each 0.1 second delay, to collect the maximum RSS value. For this delay, the already existing 1.0 second delay in the 'HangWatcher' listener was reused; changing it from 1.0 to 0.1 did not increase the testing times. A 0.1 second delay is also completely enough to catch an RSS increase, as verified with:

tarantool> require('clock').bench(function() local t = {} for i = 1, 1024^2 * 100 do t[i] = true end end)

which showed that 100 MB of data is allocated in:
- 3.153877479 seconds on a CI test host
- 0.54504489 seconds on a fast local host

Also created a new listener, 'RSSMonitor', with the following routines:

process_result() - called when a test task starts and finishes. Using the 'WorkerCurrentTask' queue, it saves the initial RSS value of the worker when the task starts to run; using the 'WorkerTaskResult' queue, it collects the tasks that failed.
process_timeout() - called after each 0.1 second delay of the task run; it saves/updates the worker RSS value for the current test task, keeping the maximum.
print_statistics() - prints the statistics to stdout after testing.

Checked and chose an RSS memory limit of 30000 kB, below which tests are not interesting to show unless they failed. Created a new subdirectory, 'statistics', in the 'vardir' path for saving statistics files; the current patch uses it to save an 'rss.log' file there, with RSS values per tested task in the format:

<test task name> <maximum RSS value>

Closes tarantool/tarantool-qa#98

[1]: https://www.kernel.org/doc/html/latest/filesystems/proc.html
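A minimal sketch of what get_proc_stat_rss() could look like, assuming it returns the kB value exactly as /proc reports it (illustrative, not the actual test-run code):

```python
# Illustrative sketch: read VmRSS from /proc/<pid>/status.
# A 'VmRSS:' line looks like: 'VmRSS:     12345 kB'.
def get_proc_stat_rss(pid):
    try:
        with open('/proc/%d/status' % pid) as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])  # value in kB
    except (OSError, ValueError, IndexError):
        pass  # the worker process may already have exited
    return 0
```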
avtikhon re-pushed the patch to tarantool/test-run seven more times on May 4–5, 2021; the only change to the message above is that the listener is renamed from 'RSSMonitor' to 'StatisticsMonitor'.
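Under the same illustrative assumptions (the real test-run listener API and its queue objects differ in detail), a simplified model of such a listener, reusing the get_proc_stat_rss() sketch above:

```python
# Hypothetical, simplified model of the StatisticsMonitor listener
# described in the commit message; not the actual test-run class.
class StatisticsMonitor:
    def __init__(self, get_rss=get_proc_stat_rss):
        self.get_rss = get_rss
        self.rss_per_task = {}  # task name -> maximum observed RSS, kB
        self.current = {}       # worker pid -> name of the running task
        self.failed = set()     # names of tasks that failed

    def task_started(self, worker_pid, task_name):
        # Analogue of process_result() on a 'WorkerCurrentTask' event:
        # remember the task and record the worker's initial RSS.
        self.current[worker_pid] = task_name
        self.rss_per_task.setdefault(task_name, self.get_rss(worker_pid))

    def task_finished(self, worker_pid, task_name, failed):
        # Analogue of process_result() on a 'WorkerTaskResult' event.
        if failed:
            self.failed.add(task_name)
        self.current.pop(worker_pid, None)

    def process_timeout(self):
        # Called on every 0.1 s tick: keep the per-task maximum RSS.
        for pid, task in self.current.items():
            self.rss_per_task[task] = max(self.rss_per_task[task],
                                          self.get_rss(pid))

    def print_statistics(self, top=5):
        # After testing: report the top RSS consumers plus all failed
        # tasks, one '<test task name> <maximum RSS value>' per line.
        ranked = sorted(self.rss_per_task.items(),
                        key=lambda kv: kv[1], reverse=True)
        for task, rss in ranked[:top]:
            print('%s %d' % (task, rss))
        for task in sorted(self.failed - set(t for t, _ in ranked[:top])):
            print('%s %d' % (task, self.rss_per_task.get(task, 0)))
```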
avtikhon pushed the final revision to tarantool/test-run seven times on May 5 and May 10, 2021. The only further change to the message: the fixed 30000 kB reporting limit is dropped, and print_statistics() instead prints RSS usage for the failed tasks and for up to the 5 tasks that used the most.
Tarantool 2.8.0-114-g9ccd4eab6
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror
OS: Linux
Check issues at #93
Reproduce on dev1:
Memory pressure made the host hang
Other tests from the box/ suite (8 GB | 4 GB):
The atop tool could not show the real issue in RSS because it hung itself (check the RGROW column):
Try memory overload for a container run with:
--cpus=2 --memory=8G --memory-swap=12G --memory-reservation=8G
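Assuming Docker (the image argument below is a placeholder), the full invocation would look something like:

```
$ docker run --cpus=2 --memory=8G --memory-swap=12G --memory-reservation=8G <image>
```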
Try disk overload:
Disk usage log from atop:
LVM | dm-3 | busy 718% | | read 60409 | write 501 | KiB/r 22 | KiB/w 4 | | MBr/s 1343.0 | MBw/s 2.0 | avq 12.84 | | avio 0.13 ms |
LVM | dm-2 | busy 10% | | read 612 | write 694 | KiB/r 18 | KiB/w 5 | | MBr/s 10.9 | MBw/s 3.7 | avq 7.85 | | avio 0.08 ms |
MDD | md1 | busy 0% | | read 60361 | write 519 | KiB/r 22 | KiB/w 3 | | MBr/s 1342.3 | MBw/s 1.9 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 711% | | read 30011 | write 258 | KiB/r 22 | KiB/w 7 | | MBr/s 662.9 | MBw/s 2.0 | avq 6.67 | | avio 0.25 ms |
DSK | sdb | busy 697% | | read 29744 | write 258 | KiB/r 23 | KiB/w 7 | | MBr/s 679.0 | MBw/s 2.0 | avq 6.33 | | avio 0.25 ms |
Github Actions use hosts:
OSX:
Linux:
Steps to resolve the issue: