Export all of Read the Docs #562
Comments
I ran one basic backup for this, to test it out. It's just the raw HTML files from the prod servers, with the file paths and everything still in there. It is still pointing at production media files. An example: http://staticbackup.readthedocs.com/pip/rtd-builds/latest/index.html
We've had this open for quite some time now, and I don't think there has been much enthusiasm for a feature like this. Given GDPR, I'm less inclined to keep data lying around indefinitely, and if we want this from an operations perspective, there are disk snapshots. I'm 👍 on closing this as something we're highly unlikely to implement.
Yeah, I'm +1 to close it. It's not something people really ask for, and aside from something like the Internet Archive, I don't have a solid use case for it.
Actually, I am searching for that archive. I know it's a lot to ask for a single person, but can I get it?
@sulabh1919 Unfortunately we have way too much data now to share this backup publicly in a reasonable way. |
Archive the entire catalog of generated HTML from Read the Docs.
The idea is to keep a history of all of the technical documentation hosted here,
as it existed at a given point in time.
I'm thinking we could create a snapshot at the end of every month.
Then we would be able to see how things changed across themes and other bits and pieces over time.
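As a rough illustration of that cadence (not an actual setup), a crontab entry could trigger the archive run; the script name is hypothetical, and since cron has no clean "last day of the month" syntax, this fires on the first of each month instead:

```
# Hypothetical crontab entry: run the (assumed) snapshot script at 02:00
# on the first day of each month.
0 2 1 * * /usr/local/bin/rtd-archive-snapshot.sh
```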
Work needed
We need a nice way to browse all of the documents.
This will basically be a simple HTML file with links to all the docs.
Looking at twitters downloadable archive format would be good inspiration:
http://static.scripting.com/twitterArchives/keith/index.html
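As a sketch of what that index could look like, the shell script below builds a flat index.html from a snapshot laid out as archive/<project>/<version>/index.html; that directory layout and the file names are assumptions, not the actual backup structure:

```bash
#!/usr/bin/env bash
# Sketch only: emit a single index.html linking every archived project/version,
# assuming snapshots live under archive/<project>/<version>/index.html.
shopt -s nullglob  # let the glob expand to nothing if the archive is empty
{
  echo '<html><body><h1>Read the Docs archive</h1><ul>'
  for doc in archive/*/*/index.html; do
    project=$(echo "$doc" | cut -d/ -f2)   # second path component: project slug
    version=$(echo "$doc" | cut -d/ -f3)   # third path component: version/branch
    echo "<li><a href=\"$doc\">$project ($version)</a></li>"
  done
  echo '</ul></body></html>'
} > index.html
```

A real index would probably also want per-project grouping and the snapshot date, along the lines of the Twitter archive page linked above.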
We also need to scrub all the links to remote media and images and package them inside the archive.
This will likely grow the amount of data significantly,
but it is the only way to capture a point in time.
We could use something like
wget -m
or another script to pull these down and rewrite the links.
Keeping the historical URLs might be a bit harder,
but at least keeping the files around will be a good start.
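For reference, a fuller wget invocation than plain -m would be needed to actually pull page requisites (CSS, images, remote media) and rewrite links so the archive is self-contained. A sketch, where the URL and domain list are only illustrative:

```bash
# Sketch only: mirror one project's docs into archive/pip, fetching media from
# the listed hosts and rewriting links to point at the local copies.
# The URL and --domains list are illustrative, not the real production hosts.
wget --mirror --page-requisites --convert-links --adjust-extension \
     --no-parent --span-hosts --domains=pip.readthedocs.org,media.readthedocs.org \
     --directory-prefix=archive/pip \
     https://pip.readthedocs.org/en/latest/
```

--convert-links rewrites links after download, which covers the link-rewriting step, though it does not preserve the original historical URLs.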