Export all of Read the Docs #562

ericholscher · 2013-12-04T22:04:27Z

Archive the entire catalog of generated HTML from Read the Docs.
The idea being to keep a history of all of the technical documentation hosted here,
at a certain point in time.

I'm thinking we could create a snapshot at the end of every month.
Then we would be able to see how things changed across themes and other bits and pieces over time.

Work needed

We need a nice way to browse all of the documents.
This will basically be a simple HTML file with links to all the docs.
Twitter's downloadable archive format would be good inspiration:

http://static.scripting.com/twitterArchives/keith/index.html
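A minimal sketch of what that browse page could look like, assuming the archive is laid out as one directory per project (as in the `pip/rtd-builds/latest/index.html` style of path). The function name `build_index` and the layout are assumptions for illustration, not anything Read the Docs ships:

```python
import html
from pathlib import Path


def build_index(archive_root: Path, title: str = "Documentation archive") -> str:
    """Return a simple HTML page linking to each project's entry page.

    Assumes (hypothetically) that every top-level directory under
    archive_root is one project containing at least one index.html.
    """
    items = []
    for project_dir in sorted(p for p in archive_root.iterdir() if p.is_dir()):
        pages = list(project_dir.rglob("index.html"))
        if not pages:
            continue  # nothing browsable for this project
        # Prefer the shallowest index.html as the project's entry point.
        entry = min(pages, key=lambda p: (len(p.parts), p.as_posix()))
        href = entry.relative_to(archive_root).as_posix()
        items.append(
            f'<li><a href="{html.escape(href)}">{html.escape(project_dir.name)}</a></li>'
        )
    body = "\n".join(items)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><h1>{html.escape(title)}</h1>\n<ul>\n{body}\n</ul>\n</body></html>\n"
    )
```

Writing the result to `index.html` at the archive root would give a single page to start browsing from; links are relative, so the archive can be moved or unpacked anywhere.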

We also need to scrub all the links to remote media and images and package them inside of the archive.
This will likely grow the amount of data significantly,
but it is the only way to capture a place in time.
We could use something like wget -m or another script to pull these down and rewrite the links.
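If a custom script were used instead of wget, the link-rewriting step might look roughly like the sketch below. It only handles `<img src="...">` with absolute `http(s)` URLs, and the `_media` directory, the hashed file names, and the function name `localize_images` are all assumptions for illustration. It does not download anything; it returns the rewritten HTML plus a mapping of remote URL to local path that a separate step could fetch:

```python
import hashlib
import posixpath
import re
from urllib.parse import urlparse

# Matches the src attribute of <img> tags that point at absolute http(s) URLs.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE)


def localize_images(page_html: str, media_dir: str = "_media"):
    """Rewrite remote image URLs to local paths under media_dir.

    Returns (rewritten_html, downloads), where downloads maps each remote
    URL to the local path it should be saved to. The local path is
    relative, so media_dir is assumed to sit next to the page.
    """
    downloads = {}

    def replace(match):
        url = match.group(2)
        name = posixpath.basename(urlparse(url).path) or "file"
        # Prefix with a short hash so same-named files from different
        # hosts do not collide inside the archive.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:12]
        local = f"{media_dir}/{digest}-{name}"
        downloads[url] = local
        return f"{match.group(1)}{local}{match.group(3)}"

    return IMG_SRC.sub(replace, page_html), downloads
```

A regex is enough for a sketch, but a real pass would also need to cover stylesheets, scripts, and `srcset`, which is a good argument for leaning on wget's existing `--page-requisites` and `--convert-links` options where possible.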

Keeping the historical URLs might be a bit harder,
but at least keeping the files around will be a good start.

ericholscher · 2013-12-28T20:17:11Z

I ran one basic backup for this, to test it out. It's just the raw HTML files from the prod servers, with the file paths and everything still in there. It is still pointing at production media files. An example:

http://staticbackup.readthedocs.com/pip/rtd-builds/latest/index.html

agjohnson · 2018-09-17T22:59:48Z

We've had this open for quite some time now, and I don't think there has been much enthusiasm for a feature like this. Given GDPR, I'm less inclined to keep data lying around indefinitely, and if we want this from an operational perspective, there are disk snapshots.

I'm 👍 on closing this as something we're highly unlikely to implement.

ericholscher · 2018-09-20T16:03:45Z

Yea, I'm +1 to close it. It's not something people really ask for, and except for something like the Internet Archive, I don't have a solid use case for it.

sulabh-npl · 2023-03-11T11:41:01Z

Actually, I am searching for that archive. I know it's too much to ask of a single person, but can I get it?

ericholscher · 2023-03-13T20:49:54Z

@sulabh1919 Unfortunately we have way too much data now to share this backup publicly in a reasonable way.

ericholscher added the Sprintable label Sep 3, 2014

ericholscher mentioned this issue Oct 31, 2017: Add project delete method that wipes build artifacts #3145 (closed)

agjohnson removed the Sprintable label ("Small enough to sprint on") Sep 17, 2018

ericholscher closed this as completed Sep 20, 2018
