Use `tar` to extract cache, write by chunks and new build state #6793

humitos · 2020-03-18T19:31:53Z

We have experienced memory leak issues when extracting a file using the Python API. We now use the regular `tar` command for this operation, since the test done on the server with 2.5 GB showed better results in both memory usage and extraction time.
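As a rough illustration of the approach (not the code in this PR), shelling out to `tar` from Python could look like the sketch below. The function name `extract_with_tar` and its parameters are hypothetical; only the standard `-x`, `-f` and `-C` options of `tar` are used.

```python
import subprocess
from pathlib import Path


def extract_with_tar(archive_path, destination):
    """Extract ``archive_path`` into ``destination`` with the system ``tar``.

    Hypothetical helper: the name and signature are assumptions made for
    this sketch, not taken from the PR.
    """
    # Make sure the target directory exists before tar changes into it.
    Path(destination).mkdir(parents=True, exist_ok=True)
    # -x extracts, -f names the archive, -C sets the directory to extract into.
    # check=True raises CalledProcessError if tar exits with a non-zero status.
    subprocess.run(
        ["tar", "-xf", str(archive_path), "-C", str(destination)],
        check=True,
    )
```

Because the extraction runs in a separate process, the memory it uses is released when `tar` exits and does not accumulate in the Python process.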

Besides, instead of reading the whole file into memory (`fd.read()`), we now iterate over its chunks and write each small chunk to disk.
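A minimal sketch of the chunked write, assuming `source` is any binary file-like object with a `read(size)` method. The helper name `save_in_chunks` and the 64 KiB chunk size are illustrative assumptions, not values from this PR. Django `File` objects also offer a `chunks()` iterator that serves the same purpose.

```python
CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def save_in_chunks(source, target_path, chunk_size=CHUNK_SIZE):
    """Copy ``source`` to ``target_path`` one chunk at a time.

    Hypothetical helper: at most ``chunk_size`` bytes are held in memory
    at once, instead of the whole file as with ``source.read()``.
    Returns the number of bytes written.
    """
    written = 0
    with open(target_path, "wb") as target:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: source.read(chunk_size), b""):
            target.write(chunk)
            written += len(chunk)
    return written
```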

Finally, we are adding a new build status, "Pulling cache", for now, to communicate that the build is no longer in the "Triggered" state and is actually doing something.

Related to #6763

readthedocs/projects/tasks.py

Instead of downloading the file into a temporary file (`storage.open` followed by `.read`) and then saving it into another temporary file (our loop reading by chunks), we just pass the `storage.open` response as the input to the `tarfile.open` function and extract it from there.
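A sketch of that idea, assuming the object returned by `storage.open` behaves like a seekable binary file. Here `fileobj` stands in for that response and `extract_from_fileobj` is a hypothetical name. The real change may pass a different `mode` to `tarfile.open`.

```python
import tarfile


def extract_from_fileobj(fileobj, destination):
    """Extract a tar archive directly from an open binary file object.

    Hypothetical helper: ``fileobj`` plays the role of the ``storage.open``
    response, so no intermediate temporary file is written.
    """
    # With the default mode, tarfile detects compression transparently,
    # which requires a seekable file object (an assumption of this sketch).
    with tarfile.open(fileobj=fileobj) as tar:
        tar.extractall(destination)
```

If the storage backend returns a non-seekable stream, the streaming mode `"r|*"` of `tarfile.open` may be a better fit.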

humitos · 2020-03-20T20:07:37Z

Closing in favor of #6799 and #6800

humitos requested a review from a team March 18, 2020 19:33

humitos added 2 commits March 18, 2020 18:30

Iterate over file chunks and save them on disk

395317a

We were loading the whole file into memory and then dumping it to disk. Now we read one chunk at a time and write that chunk to disk.

Update build with "Pulling cache" when downloading the cache

abd1a3d

humitos changed the title ~~Use tar command to extract cached environment~~ Use tar to extract cache, write by chunks and new build state Mar 19, 2020

Migration file for new Build state

f34203d

humitos mentioned this pull request Mar 19, 2020

Do not reset the build start time when running build env #6794

Merged

humitos commented Mar 20, 2020

This was referenced Mar 20, 2020

Use storage.open API correctly for tar files (build cached envs) #6799

Merged

Update build with "Pulling cache" when downloading the cache #6800

Closed

humitos closed this Mar 20, 2020

stsewd deleted the humitos/use-tar-command branch July 28, 2020 17:59
