[Backfill corrections] Allocate enough shared memory for parallel prediction generation #1827

nmdefries · 2023-04-11T18:23:57Z

Description

The main change is to make sure the Docker container has enough shared memory to generate predictions.

Smaller changes:

Logging: log when geo-splitting is over
Make sure receiving directory is cleared after a run
Check for "SIGBUS" message in log file
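The log check above can be sketched as a small scan over the log file. This is a minimal illustration, not the code in this PR (which lives in the Makefile and main.R): the names `ERROR_PATTERNS`, `log_has_errors`, and `check_log` are hypothetical. It returns a non-zero status when a fatal pattern such as "SIGBUS" appears, so a make recipe that calls it would stop the build.

```python
import sys

# Hypothetical list of fatal patterns; this PR adds "SIGBUS" to the log checks.
ERROR_PATTERNS = ("SIGBUS",)


def log_has_errors(log_path, patterns=ERROR_PATTERNS):
    """Return True if any line of the log file contains one of the patterns."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return any(p in line for line in f for p in patterns)


def check_log(log_path):
    """Return a shell-style exit status: 1 if an error pattern is found, else 0."""
    if log_has_errors(log_path):
        print(f"error pattern found in {log_path}", file=sys.stderr)
        return 1
    return 0
```

Wrapped in a script that ends with `sys.exit(check_log(sys.argv[1]))`, this would make the make target fail whenever the run logged a SIGBUS, which is the monitoring behavior the review below asks for.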

Changelog

Makefile
main.R

Fixes

Generating predictions in parallel (even with 2 cores) failed with the error "'memcpy' resulted in a SIGBUS (no shared memory left)". Docker containers get 64 MB of shared memory by default, which is apparently sufficient for model training in parallel but not for generating predictions. This change assigns 2 GB of shared memory, which is enough to generate predictions in parallel with up to 3 cores (possibly more; larger core counts were not tested).
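A pre-flight check can surface this problem before any worker starts. The sketch below is a hypothetical illustration, not part of this PR: it assumes a Linux container where shared memory is the tmpfs mounted at `/dev/shm`, whose size is controlled by the `--shm-size` option of `docker run` (how this PR's Makefile passes the setting is not shown here). The names `shm_total_bytes` and `shm_is_sufficient` are made up, and the 2 GB threshold is simply the value chosen in this PR, which was tested only up to 3 cores.

```python
import os
import shutil

# Sizes from the PR description: Docker's default /dev/shm versus the new allocation.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024**2  # 64 MB
REQUESTED_SHM_BYTES = 2 * 1024**3        # 2 GB


def shm_total_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


def shm_is_sufficient(total_bytes, required_bytes=REQUESTED_SHM_BYTES):
    """True if the mount exists and is at least as large as the required size."""
    return total_bytes is not None and total_bytes >= required_bytes
```

Calling `shm_is_sufficient(shm_total_bytes())` before launching parallel workers turns an opaque mid-run SIGBUS into an early, explicit failure. Note that the new allocation is 32 times the Docker default.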

krivard

👍

nice extension of the log checks; it's always easier on monitoring if the make fails when something has gone wrong

nmdefries added 5 commits April 11, 2023 12:30

log when geo-splitting is over

510a11e

remove everything in output dir

02c8c40

increase shared memory size to support parallel prediction generation

dd671bf

check logs for lack of shared memory error

8875b40

lower shared memory allocation

b55abe8

nmdefries marked this pull request as ready for review April 11, 2023 21:37

nmdefries requested a review from krivard April 11, 2023 21:37

krivard approved these changes Apr 12, 2023


nmdefries merged commit 7ef2f5b into main Apr 12, 2023

nmdefries deleted the ndefries/backfill/mem-log-make-cleanup branch April 12, 2023 14:05

krivard mentioned this pull request Apr 12, 2023

Release covidcast-indicators 0.3.36 #1829

Merged
