Export all of Read the Docs #562

Closed
ericholscher opened this issue Dec 4, 2013 · 5 comments
Labels
Feature New feature

Comments

@ericholscher
Member

Archive the entire catalog of generated HTML from Read the Docs.
The idea being to keep a history of all of the technical documentation hosted here,
at a certain point in time.

I'm thinking we could create a snapshot at the end of every month.
Then we would be able to see how things changed across themes and other bits and pieces over time.

Work needed

We need a nice way to browse all of the documents.
This will basically be a simple HTML file with links to all the docs.
Twitter's downloadable archive format would be good inspiration:

http://static.scripting.com/twitterArchives/keith/index.html
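As a rough sketch of what that browse page could look like, the snippet below builds a single HTML file with one link per project and version. The directory layout (`<archive_root>/<project>/<version>/index.html`) and the function name `build_index` are assumptions made for illustration, not how the backup is actually laid out.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index(archive_root: Path) -> str:
    """Return a simple HTML page that links to each archived doc set.

    Assumes a hypothetical layout of <archive_root>/<project>/<version>/index.html.
    """
    links = []
    for page in sorted(archive_root.glob("*/*/index.html")):
        rel = page.relative_to(archive_root)
        label = rel.parent.as_posix()   # e.g. "pip/latest"
        href = quote(rel.as_posix())    # percent-encode spaces and the like
        links.append(f'<li><a href="{href}">{escape(label)}</a></li>')
    items = "\n".join(links)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>Read the Docs archive</title></head>\n"
        f"<body><ul>\n{items}\n</ul></body></html>\n"
    )
```

Writing the result to `archive_root / "index.html"` would give a static entry point that works without a web server, since every link is relative.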

We also need to scrub all the links to remote media and images and package them inside of the archive.
This will likely grow the amount of data significantly,
but it is the only way to capture a place in time.
We could use something like wget -m or another script to pull these down and rewrite the links.
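If a custom script were used for the rewriting step, one possible shape is sketched below: collect absolute `http(s)` media URLs from a page, map each to a local file name, and swap the links. The names `MediaCollector`, `local_path`, `rewrite_media_links`, and the `_archived_media` directory are hypothetical, and downloading the files is left out.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlsplit


class MediaCollector(HTMLParser):
    """Collect absolute http(s) URLs from <img src>, <script src>, <link href>."""

    ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value and urlsplit(value).scheme in ("http", "https"):
                if value not in self.urls:
                    self.urls.append(value)


def local_path(url, media_dir="_archived_media"):
    """Map a remote URL to a stable local file name, keeping the extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    suffix = PurePosixPath(urlsplit(url).path).suffix
    return f"{media_dir}/{digest}{suffix}"


def rewrite_media_links(html):
    """Return (rewritten_html, {remote_url: local_path}).

    Only double-quoted attribute values are replaced, and URLs containing
    HTML entities (such as &amp;) are not matched; this is a sketch.
    """
    collector = MediaCollector()
    collector.feed(html)
    mapping = {url: local_path(url) for url in collector.urls}
    for url, path in mapping.items():
        html = html.replace(f'"{url}"', f'"{path}"')
    return html, mapping
```

The returned mapping doubles as the download list: each key would be fetched once and stored at its value. The local paths here are written as if the media directory sat next to the page, so a real script would need to adjust them for each page's depth in the archive.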

Keeping the historical URLs might be a bit harder,
but at least keeping the files around will be a good start.

@ericholscher
Member Author

I ran one basic backup for this, to test it out. It's just the raw HTML files from the prod servers, with the file paths and everything still in there. It is still pointing at production media files. An example:

http://staticbackup.readthedocs.com/pip/rtd-builds/latest/index.html

@agjohnson
Contributor

We've had this open for quite some time now, and I don't think there has been much enthusiasm for a feature like this. Given GDPR, I'm less inclined to keep data lying around indefinitely, and if we want this from an operational perspective, there are disk snapshots.

I'm 👍 on closing this as something we're highly unlikely to implement.

@ericholscher
Member Author

Yeah, I'm +1 to close it. It's not something people really ask for, and apart from something like the Internet Archive, I don't have a solid use case for it.

@sulabh-npl

Actually, I am searching for that archive. I know it's too much to ask for a single person, but can I get it?

@ericholscher
Member Author

@sulabh1919 Unfortunately we have way too much data now to share this backup publicly in a reasonable way.
