
resources: flaky issues on lack of memory #98


Open
avtikhon opened this issue Mar 19, 2021 · 0 comments

avtikhon commented Mar 19, 2021

Tarantool 2.8.0-114-g9ccd4eab6
Target: Linux-x86_64-Debug
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=ON
Compiler: /usr/bin/cc /usr/bin/c++
C_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type -Werror
CXX_FLAGS: -fexceptions -funwind-tables -fno-omit-frame-pointer -fno-stack-protector -fno-common -fopenmp -msse2 -std=c++11 -Wall -Wextra -Wno-strict-aliasing -Wno-char-subscripts -Wno-format-truncation -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type -Werror

OS: Linux

See the related issues at #93.

Reproduced on the dev1 host:

Memory usage grows with the number of test runs until the run begins to hang:

| Total Memory | Swap | # of test runs | RSS Memory | Time |
|--------------|------|----------------|------------|------|
| 2Gb | - | 10 | - | 4 secs |
| 2Gb | - | 12 | - | 13 secs |
| 2Gb | - | 14 | - | timeout |
| 4Gb | - | 24 | - | 4 secs |
| 4Gb | - | 26 | - | timeout |
| 8Gb | 4Gb | 48 | 7652044800 | 2 m 36 secs |
| 8Gb | 4Gb | 50 | 7691522048 | 2 m 42 secs |
| 8Gb | 4Gb | 56 | 8315879424 | 3 m 05 secs |
| 8Gb | 4Gb | 58 | 8336347136 | OOM + hanged container & host |

Other tests from the box/ suite (8Gb RAM | 4Gb swap):

| Test | # of test runs | RSS Memory | Time | OOM on # runs | Memory per test |
|------|----------------|------------|------|---------------|-----------------|
| access | 58 | 6183579648 | 1 m 25 secs | - | 107 Mb |
| blackhole | 72 | 8306712576 | 1 m 14 secs | 80 | 115 Mb |
| func_reload | 90 | 7675752448 | 0 m 32 secs | 100 | 85 Mb |
| gh-5135-invalid-upsert | 110 | 8098770944 | 0 m 29 secs | 120 | 74 Mb |
| gh-5422-broken_snapshot | 56 | 8315879424 | 3 m 05 secs | 58 | 150 Mb |
| iterator | 110 | 8132214784 | 0 m 38 secs | 120 | 74 Mb |
| misc | 110 | 8134795264 | 1 m 00 secs | 120 | 74 Mb |
| net_msg_max | 3 | 7678414848 | 0 m 03 secs | 4 | 2.5 Gb |
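
For reference, the "Memory per test" column matches the total RSS divided by the number of test runs; a minimal Python sketch of that arithmetic (values taken from the table above):

```python
# "Memory per test" looks like total RSS / number of runs.
# Values below are taken from the table above (RSS in bytes, run counts).
rows = {
    'access': (6183579648, 58),
    'blackhole': (8306712576, 72),
    'net_msg_max': (7678414848, 3),
}

for test, (rss_bytes, runs) in rows.items():
    per_test_mb = rss_bytes / runs / 1000 ** 2
    print('%s: ~%.0f Mb per test' % (test, per_test_mb))
# access: ~107 Mb, blackhole: ~115 Mb, net_msg_max: ~2560 Mb (2.5 Gb)
```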

The atop tool could not show the real RSS growth because it hung itself (see RGROW):

THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST S CPU CMD
4 0.02s 0.12s 713.4M 71644K 0K 8K N- S 13% tarantool
1 0.05s 0.05s 167.7M 49160K 0K 140K N- S 0% python2

To reproduce the memory overload, the container was run with:
--cpus=2 --memory=8G --memory-swap=12G --memory-reservation=8G

  1. Run in the container:
rm -rf rss_persec.log ; ( ( while date && sleep 1 ; do cat /sys/fs/cgroup/memory/memory.stat ; done ) >>rss_persec.log & echo $! >rss.pid & ) ; ( export PATH=$PATH:/tnt/src ; export REPLICATION_SYNC_TIMEOUT=2500 ; export TEST_TIMEOUT=2510 ; export NO_OUTPUT_TIMEOUT=2520 ; date ; time ./test-run.py -j 1200 --builddir /tnt --vardir var_hdd_vinyl `for r in {1..64} ; do echo box/gh-5422-broken_snapshot. ; done` --force 2>&1 ; sleep 1 ; kill -USR2 `cat rss.pid` ; date ) > test.log &
  2. Run on the host that runs the container:
docker events
2021-03-16T08:21:57.833804548+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
2021-03-16T08:22:31.626244711+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
2021-03-16T08:22:36.535503324+03:00 container oom 7129eaff03192cfda5896e17ae3506935cc59c2c6310015b47ed28ffc7c41cc0 (image=registry.gitlab.com/tarantool/tarantool/testing/debian-stretch, name=goofy_mcnulty)
  3. Check the RSS maximums in the container (a small sketch doing the same follows below):
grep total_rss\  rss_persec.log | sort
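
The same peak value can be extracted programmatically; below is a small illustrative Python sketch equivalent to the grep | sort one-liner above (it assumes the rss_persec.log format produced by the loop in step 1):

```python
# Sketch: find the peak "total_rss" value in rss_persec.log.
# Assumes the log contains lines like "total_rss <bytes>" as written by
# the `cat /sys/fs/cgroup/memory/memory.stat` loop from step 1.
max_rss = 0
with open('rss_persec.log') as log:
    for line in log:
        parts = line.split()
        if len(parts) == 2 and parts[0] == 'total_rss':
            max_rss = max(max_rss, int(parts[1]))

print('peak total_rss: %d bytes (~%.1f Gb)' % (max_rss, max_rss / 1000 ** 3))
```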

Disk overload was reproduced as follows:

# start the docker container with a memory limit and unlimited swap
docker run --network=host -v /export/avtikhon/src:/source -ti --cpus=40 --memory=2G --memory-swap=-1 --memory-reservation=1G registry.gitlab.com/tarantool/tarantool/testing/debian-stretch

# check the available memory size
cat /sys/fs/cgroup/memory/memory.limit_in_bytes

# run tests
( export PATH=$PATH:/tnt/src; export REPLICATION_SYNC_TIMEOUT=500; export TEST_TIMEOUT=510; export NO_OUTPUT_TIMEOUT=520; date; time ./test-run.py -j 1200 --builddir /tnt --vardir var_hdd_vinyl `for r in {1..12} ; do echo box/gh-5422-broken_snapshot ; done` --force 2>&1; sleep 1; kill -USR2 `cat atop.pid`; date ) > test_atop.log

Disk usage log from atop:

LVM | dm-3 | busy 718% | | read 60409 | write 501 | KiB/r 22 | KiB/w 4 | | MBr/s 1343.0 | MBw/s 2.0 | avq 12.84 | | avio 0.13 ms |
LVM | dm-2 | busy 10% | | read 612 | write 694 | KiB/r 18 | KiB/w 5 | | MBr/s 10.9 | MBw/s 3.7 | avq 7.85 | | avio 0.08 ms |
MDD | md1 | busy 0% | | read 60361 | write 519 | KiB/r 22 | KiB/w 3 | | MBr/s 1342.3 | MBw/s 1.9 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 711% | | read 30011 | write 258 | KiB/r 22 | KiB/w 7 | | MBr/s 662.9 | MBw/s 2.0 | avq 6.67 | | avio 0.25 ms |
DSK | sdb | busy 697% | | read 29744 | write 258 | KiB/r 23 | KiB/w 7 | | MBr/s 679.0 | MBw/s 2.0 | avq 6.33 | | avio 0.25 ms |

GitHub Actions uses the following hosts:
macOS:

Hardware:

    Hardware Overview:

      Model Name: Mac
      Model Identifier: VMware7,1
      Processor Name: Unknown
      Processor Speed: 3.33 GHz
      Number of Processors: 1
      Total Number of Cores: 3
      L2 Cache (per Core): 256 KB
      L3 Cache: 12 MB
      Memory: 14 GB
      System Firmware Version: VMW71.00V.13989454.B64.1906190538
      Apple ROM Info: [MS_VM_CERT/SHA1/27d66596a61c48dd3dc7216fd715126e33f59ae7]Welcome to the Virtual Machine
      SMC Version (system): 2.8f0
      Serial Number (system): VMXWGNGFhEKt
      Hardware UUID: 4203018E-580F-C1B5-9525-B745CECA79EB
      Provisioning UDID: 4203018E-580F-C1B5-9525-B745CECA79EB

Filesystem       Size   Used  Avail Capacity iused      ifree %iused  Mounted on
/dev/disk1s5s1  380Gi   14Gi  210Gi     7%  568975 3981971425    0%   /
/dev/disk1s4    380Gi  1.0Mi  210Gi     1%       1 3982540399    0%   /System/Volumes/VM
/dev/disk1s2    380Gi  279Mi  210Gi     1%     685 3982539715    0%   /System/Volumes/Preboot
/dev/disk1s6    380Gi  244Ki  210Gi     1%      14 3982540386    0%   /System/Volumes/Update
/dev/disk1s1    380Gi  154Gi  210Gi    43% 3970663 3978569737    0%   /System/Volumes/Data

Linux:

sudo cat /etc/os-release

NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

cat /proc/cpuinfo

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 2294.688
cache size	: 51200 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4589.37
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
stepping	: 1
microcode	: 0xffffffff
cpu MHz		: 2294.688
cache size	: 51200 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips	: 4589.37
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

free

              total        used        free      shared  buff/cache   available
Mem:        7121288      467568     5682140       29240      971580     6319500
Swap:       4194300           0     4194300
df -h

Filesystem      Size  Used Avail Use% Mounted on
udev            3.4G     0  3.4G   0% /dev
tmpfs           696M  680K  695M   1% /run
/dev/sda1        84G   61G   23G  73% /
tmpfs           3.4G  8.0K  3.4G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/sda15      105M  3.7M  101M   4% /boot/efi
/dev/sdb1        14G  4.1G  9.0G  32% /mnt
lsblk

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   86G  0 disk 
├─sda1    8:1    0 85.9G  0 part /
├─sda14   8:14   0    4M  0 part 
└─sda15   8:15   0  106M  0 part /boot/efi
sdb       8:16   0   14G  0 disk 
└─sdb1    8:17   0   14G  0 part /mnt

sudo lsblk -o NAME,MOUNTPOINT,MODEL,ROTA

NAME    MOUNTPOINT MODEL            ROTA
sda                Virtual Disk        1
├─sda1  /                              1
├─sda14                                1
└─sda15 /boot/efi                      1
sdb                Virtual Disk        1
└─sdb1  /mnt                           1

Steps to resolve the issue:

  1. Add memory usage profiling for running tests (test-run#277)
avtikhon added the teamQ label Mar 19, 2021
avtikhon self-assigned this Mar 19, 2021
kyukhin added the epic label Mar 19, 2021
avtikhon added a commit to tarantool/test-run that referenced this issue Apr 13, 2021
Added an RSS memory status collecting routine which parses the file

  /sys/fs/cgroup/memory/memory.stat

for the RSS value.

Part of tarantool/tarantool-qa#98
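
The commit message does not include the routine itself; the following is only a rough Python sketch of how such a memory.stat parser could look (cgroup v1 layout assumed, the function name is illustrative):

```python
# Hypothetical sketch of an RSS collector based on the cgroup v1
# memory.stat file mentioned in the commit; the real test-run code
# may differ.
def get_cgroup_rss(path='/sys/fs/cgroup/memory/memory.stat'):
    """Return the 'total_rss' value from memory.stat, in bytes."""
    with open(path) as stat_file:
        for line in stat_file:
            key, _, value = line.partition(' ')
            if key == 'total_rss':
                return int(value)
    return 0
```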
avtikhon added a commit to tarantool/test-run that referenced this issue Apr 13, 2021
Add a call to the RSS memory collecting routine after each test Lua
command sent.

Part of tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue Apr 13, 2021
Added to the worker listener an additional call to the RSS status
checking routine to collect it in a standalone file.

Part of tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue Apr 13, 2021
Added an RSS memory status collecting routine which parses the file:

  /sys/fs/cgroup/memory/memory.stat

for the RSS value. The routine is enabled by a new test-run option:

  --collect-statistics

Part of tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue Apr 13, 2021
Added to the worker listener an additional call to the RSS status
checking routine to collect it in standalone files, like:

  var/log/<worker name>.mem_stat.log

Part of tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue Apr 30, 2021
Added an RSS memory status collecting routine which parses the file:

  /sys/fs/cgroup/memory/memory.stat

for the RSS value, and run it from the listeners after each test Lua
command sent. All results are collected in the main test worker log
file:

  var/log/<worker name>.log

The routine is enabled by a new test-run option:

  --collect-statistics

Closes tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue May 1, 2021
Added an RSS memory status collecting routine which parses the file:

  /sys/fs/cgroup/memory/memory.stat

Closes tarantool/tarantool-qa#98
avtikhon added a commit to tarantool/test-run that referenced this issue May 2, 2021
Added an RSS memory status collecting routine which parses the file:

  /sys/fs/cgroup/memory/memory.stat

Closes tarantool/tarantool-qa#98

got RSS in KB per worker

RSS with statistics
avtikhon added a commit to tarantool/test-run that referenced this issue May 3, 2021
Found that some tests may fail due to lack of memory. Mostly it happens
in CI on remote hosts. To be able to collect memory usage statistics,
added an RSS memory status collecting routine get_proc_stat_rss()
which parses the files:

  /proc/<worker pid>/status

for the 'VmRSS' value, which is the size of the resident memory
portions. It consists of the three following parts
(VmRSS = RssAnon + RssFile + RssShmem) [1]:

  RssAnon - size of resident anonymous memory
  RssFile - size of resident file mappings
  RssShmem - size of resident shmem memory (includes SysV shm, mapping of
             tmpfs and shared anonymous mappings)

Decided that the best way for CI is not to run this RSS collecting
routine for each command sent from the test tasks, but to run it when
the test task starts and then after each 0.1 second delay, to collect
the maximum RSS value. For this delay reused the already existing
1.0 sec delay of the 'HangWatcher' listener. Found that changing it
from 1.0 to 0.1 sec didn't increase the testing times. Also found that
a delay of 0.1 sec is quite enough to catch RSS growth, based on the
check:

  tarantool> require('clock').bench(function() local t = {} for i = 1, 1024^2 * 100 do t[i] = true end end)

which showed that 100 Mb of data is allocated in:
  - on the CI test host: 3.153877479 sec
  - on a fast local host: 0.54504489 sec

Also created a new listener 'RSSMonitor' with the following routines:

  process_result() - called when a test task starts and finishes:
      Using the 'WorkerCurrentTask' queue it saves the initial RSS value
      of the worker when the task starts to run.
      Using the 'WorkerTaskResult' queue it collects the tasks that
      failed.

  process_timeout() - called after each 0.1 sec delay of the task run.
      It saves/updates the worker RSS value for the current test task,
      keeping its maximum value.

  print_statistics() - prints statistics to stdout after testing.
      Chose an RSS memory limit of 30000 kB below which tests are not
      interesting to show, except failed ones.
      Created a new 'statistics' subdirectory in the 'vardir' path to
      save statistics files. The current patch uses it to save the
      'rss.log' file with RSS values per tested task in the format:
         <test task name> <maximum RSS value>

Closes tarantool/tarantool-qa#98

[1]: https://www.kernel.org/doc/html/latest/filesystems/proc.html
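
The commit does not show the code; below is a minimal illustrative Python sketch of the described parsing. The name get_proc_stat_rss() is taken from the commit message, while the body is an assumption and may differ from the actual test-run implementation:

```python
# Illustrative sketch: read VmRSS (kB) from /proc/<pid>/status.
# VmRSS = RssAnon + RssFile + RssShmem, as described above.
def get_proc_stat_rss(pid):
    rss = 0
    try:
        with open('/proc/%d/status' % pid) as status_file:
            for line in status_file:
                if line.startswith('VmRSS:'):
                    # Line format: "VmRSS:      12345 kB"
                    rss = int(line.split()[1])
                    break
    except (OSError, IOError):
        pass  # the worker may already be gone
    return rss  # in kB
```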
avtikhon added a commit to tarantool/test-run that referenced this issue May 5, 2021
Found that some tests may fail due to lack of memory. Mostly it happens
in CI on remote hosts. To be able to collect memory usage statistics,
added an RSS memory status collecting routine get_proc_stat_rss()
which parses the files:

  /proc/<worker pid>/status

for the 'VmRSS' value, which is the size of the resident memory
portions. It consists of the three following parts
(VmRSS = RssAnon + RssFile + RssShmem) [1]:

  RssAnon - size of resident anonymous memory
  RssFile - size of resident file mappings
  RssShmem - size of resident shmem memory (includes SysV shm, mapping of
             tmpfs and shared anonymous mappings)

Decided that the best way for CI is not to run this RSS collecting
routine for each command sent from the test tasks, but to run it when
the test task starts and then after each 0.1 second delay, to collect
the maximum RSS value. For this delay reused the already existing
1.0 sec delay of the 'HangWatcher' listener. Found that changing it
from 1.0 to 0.1 sec didn't increase the testing times. Also found that
a delay of 0.1 sec is quite enough to catch RSS growth, based on the
check:

  tarantool> require('clock').bench(function() local t = {} for i = 1, 1024^2 * 100 do t[i] = true end end)

which showed that 100 Mb of data is allocated in:
  - on the CI test host: 3.153877479 sec
  - on a fast local host: 0.54504489 sec

Also created a new listener 'StatisticsMonitor' with the following
routines:

  process_result() - called when a test task starts and finishes:
      Using the 'WorkerCurrentTask' queue it saves the initial RSS value
      of the worker when the task starts to run.
      Using the 'WorkerTaskResult' queue it collects the tasks that
      failed.

  process_timeout() - called after each 0.1 sec delay of the task run.
      It saves/updates the worker RSS value for the current test task,
      keeping its maximum value.

  print_statistics() - prints statistics to stdout after testing.
      Prints the RSS usage for failed tasks and for up to 5 tasks with
      the highest RSS usage.
      Created a new 'statistics' subdirectory in the 'vardir' path to
      save statistics files. The current patch uses it to save the
      'rss.log' file with RSS values per tested task in the format:
         <test task name> <maximum RSS value>

Closes tarantool/tarantool-qa#98

[1]: https://www.kernel.org/doc/html/latest/filesystems/proc.html
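
To illustrate the described max-RSS tracking and the rss.log format, a simplified Python sketch follows; it is not the actual StatisticsMonitor code, and the class and method names here are illustrative:

```python
# Simplified illustration of the described behaviour: keep the maximum
# RSS seen per test task and dump it to statistics/rss.log as
# "<test task name> <maximum RSS value>".
import os

class RSSTracker:
    def __init__(self, vardir):
        self.stats_dir = os.path.join(vardir, 'statistics')
        os.makedirs(self.stats_dir, exist_ok=True)
        self.max_rss = {}  # task name -> max RSS in kB

    def update(self, task_name, rss_kb):
        # Called on each poll (e.g. every 0.1 sec) while the task runs.
        self.max_rss[task_name] = max(self.max_rss.get(task_name, 0), rss_kb)

    def dump(self):
        with open(os.path.join(self.stats_dir, 'rss.log'), 'w') as log:
            for task_name, rss_kb in self.max_rss.items():
                log.write('%s %d\n' % (task_name, rss_kb))
```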
kyukhin added this to the wishlist milestone Oct 15, 2021