detect removed/unavailable/404 repository and take generated output offline #8570

Closed
crackwitz opened this issue Oct 7, 2021 · 9 comments
Labels
Needed: design decision A core team decision is required

Comments

@crackwitz

Details

Expected Result

I think that if a source repo is removed, so should the generated documentation on RTD. If such an action is delayed for some "grace period", I think that's reasonable.

Actual Result

Generated output is still there (see URL above) even though the repo was taken offline by the owner because it's a stale fork of another repo that was upstreamed into the actual open source project.

@crackwitz crackwitz changed the title detect removed/unavailable/404 repositories and take project offline detect removed/unavailable/404 repository and take project offline Oct 7, 2021
@crackwitz crackwitz changed the title detect removed/unavailable/404 repository and take project offline detect removed/unavailable/404 repository and take generated output offline Oct 7, 2021
@astrojuanlu
Contributor

That would actually help us with some spam issues we're having 🙂 But detecting if a project is "gone" might be very resource-intensive, I assume? (Like, polling every repository of every project on RTD with some cadence).

A different case is checking when the project admin changes the URL. I think @humitos proposed something along these lines. Still, that doesn't solve your original request, I think.

@crackwitz
Author

crackwitz commented Oct 7, 2021

RTD Stats 2020 says 240k projects... WEW I did not expect that many.

once every 3 months for everything? that'd be ~3k to check every day.

or going by how long ago last activity was noted? that should keep all those active ones out of the set to be checked. I'm guessing you usually get all of that triggered via web hooks or something, so the "last modified" information would at least take care of itself.
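The arithmetic and the "last activity" filter above can be sketched roughly as follows. This is only an illustration, not how Read the Docs works: the `Project` class, its `last_activity` field, and the 90-day threshold are all hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Project:
    slug: str
    last_activity: date  # hypothetical: date of the last webhook or build


def daily_check_budget(total_projects: int, cycle_days: int) -> int:
    """Checks needed per day so each project is visited once per cycle."""
    return math.ceil(total_projects / cycle_days)


def stale_projects(projects, today: date, inactive_days: int = 90):
    """Return projects with no activity for more than `inactive_days`,
    least recently active first. Active projects are skipped entirely."""
    cutoff = today - timedelta(days=inactive_days)
    stale = [p for p in projects if p.last_activity < cutoff]
    return sorted(stale, key=lambda p: p.last_activity)


# 240k projects over a ~3-month (90-day) cycle
print(daily_check_budget(240_000, 90))  # 2667
```

With a filter like this, the daily budget would only need to cover the inactive subset, which is presumably much smaller than the full 240k.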

I see there is a delete (webhook) action for branches but it doesn't seem to exist for repositories... who knows, maybe it gets triggered for each branch in a repo that is being deleted? I wouldn't know. I saw mention of such an action/event being fired for "organizations".

@stsewd
Member

stsewd commented Oct 8, 2021

So, we can't just delete docs if the repo is not accessible; it could be a temporary problem or an error. Deleting docs is an irreversible operation that should be taken by the user, not automatically. Also, some people point to an invalid repo as a way to mark the project as abandoned/disabled. See also #8143.
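To illustrate how a temporary outage could be kept apart from a repository that is really gone, here is a minimal sketch that never deletes anything: it only flags a project for human review once the repository has been unreachable for a whole grace period. The `RepoStatus` class, the `record_check` function, and the 30-day default are assumptions made up for this example, not an existing Read the Docs feature.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RepoStatus:
    first_failure: Optional[datetime] = None  # start of the current outage
    flagged_for_review: bool = False          # a human decides what happens next


def record_check(
    status: RepoStatus,
    reachable: bool,
    now: datetime,
    grace: timedelta = timedelta(days=30),  # hypothetical grace period
) -> RepoStatus:
    """Update `status` with the result of one reachability check."""
    if reachable:
        # Any successful check ends the outage and clears the flag.
        status.first_failure = None
        status.flagged_for_review = False
        return status
    if status.first_failure is None:
        status.first_failure = now
    if now - status.first_failure >= grace:
        status.flagged_for_review = True
    return status
```

A single failed check changes nothing visible; only an uninterrupted run of failures spanning the grace period raises the flag, and the decision to take anything offline stays with a person.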

@stsewd stsewd added the Needed: design decision A core team decision is required label Oct 8, 2021
@stsewd
Member

stsewd commented Oct 8, 2021

And, if this is about taking over the project, we have a policy for that: https://docs.readthedocs.io/en/stable/abandoned-projects.html

@crackwitz
Author

crackwitz commented Oct 8, 2021

No, this is about that:
OpenCV is an active project with actively maintained documentation.
People fork it, make a few commits if any, and then leave their version of the documentation on RTD.
That example repo up there was last touched in 2016, and removed by the owner in 2021.
I hate for newbies to find it and work off information that is five years out of date.

And I think this might be a general pattern, also for other forked projects, and abandoned projects with a live fork, so I'm explicitly not here for this one instance.

There is an Abandoned Projects policy, so this could be a utility to detect those.

@stsewd
Member

stsewd commented Oct 11, 2021

Still, we can't just delete those projects. I understand that this is a problem, but we don't dictate the content users should publish.

If this is just to mark or identify those types of projects, we already have #3382 open. The abandoned project policy is applied per user request; it isn't something we would do automatically.

There were some other ideas about having a "verified" status for projects, so they have more priority over forks/clones.

@humitos
Member

humitos commented Oct 13, 2021

> People fork it, make a few commits if any, and then leave their version of the documentation on RTD.

@crackwitz Hi! You can probably reduce this problem a lot by enabling the Pull Request builder (see https://blog.readthedocs.com/pull-request-builder-general-availability/) if you haven't already. That way, anyone forking the project doesn't need to import it into Read the Docs, because their PR will automatically build on RTD under the official (your) project.

@humitos
Member

humitos commented Oct 13, 2021

We definitely can't delete people's documentation "automatically" or "semi-automatically" based on these rules. It's easy to get wrong and too risky. Besides, even if the linked repository was deleted, moved, or otherwise changed, the documentation may still be relevant or important. We can't assume that "since the repository was deleted, the documentation should be deleted as well".

@ericholscher
Member

Yea, I'm going to close this issue. We are working towards making a policy for removing unofficial, outdated docs, which is what I think this issue is mostly about. So this will be solved via human judgement, not automated systems 👍


5 participants