New unresolver implementation #9500

stsewd · 2022-08-15T23:16:01Z

Move unresolve_domain to unresolver
Use the new implementation from Design doc: Better handling of docs URLs #9425 for the unresolver
Remove settings override from the unresolver, we don't need it.

This doesn't use this new implementation to handle the URLs from proxito nor has support for a custom urlconf, those things will be implemented in another PR.

This is on top of #9462

📚 Documentation preview 📚: https://docs--9500.org.readthedocs.build/en/9500/

📚 Documentation preview 📚: https://dev--9500.org.readthedocs.build/en/9500/

ericholscher

This logic cleans up the existing code nicely. I want to better understand the whole picture of this work. Is the next step to use the unresolver in proxito to standardize how we're parsing this in each place in the code?

ericholscher · 2022-08-18T00:04:25Z

readthedocs/core/unresolver.py

+        version = canonical_project.versions.filter(
+            slug=canonical_project.default_version
+        ).first()
+        if version:


In what case will this be False? I think when they have a default_version that doesn't exist?

yeah, this is when a user doesn't have a default version or the default version is invalid
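
For reference, Django's `QuerySet.first()` returns `None` when nothing matches, which is why the `if version:` guard is needed. A plain-Python analogue of that lookup, as a rough sketch only: `find_default_version` and the list of slugs are hypothetical stand-ins for the ORM query, not code from this PR.

```python
from typing import List, Optional


def find_default_version(
    version_slugs: List[str], default_version: Optional[str]
) -> Optional[str]:
    """Plain-Python analogue of `.filter(slug=default_version).first()`.

    Returns None when the project has no default version set, or when
    the default version points at a slug that does not exist.
    """
    return next(
        (slug for slug in version_slugs if slug == default_version),
        None,
    )
```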

ericholscher · 2022-08-18T00:08:53Z

readthedocs/core/unresolver.py

+        If the subproject exists, we try to resolve the rest of the path
+        with the subproject as the canonical project.
+        """
+        match = self.subproject_pattern.match(path)


Suggested change

-        match = self.subproject_pattern.match(path)
+        # TODO: Allow the canonical_project to override this url somehow..
+        # if canonical_project.subproject_pattern:
+        #     match = canonical_project.subproject_pattern.match(path)
+        match = self.subproject_pattern.match(path)

Is this the longer term goal for customization?

yeah, basically each call to match will have some extra logic to use the custom pattern from the project instead of the default one.

The same idea should be used for multiversion patterns 👍🏼
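
A minimal sketch of that idea, assuming a per-project override that falls back to the default pattern. The `Project` class, its `subproject_pattern` attribute, the default regex, and `match_subproject` are hypothetical names for illustration; none of them come from this PR.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed default layout: /projects/<alias>/<rest of the path>
DEFAULT_SUBPROJECT_PATTERN = re.compile(
    r"^/projects/(?P<alias>[^/]+)(?P<rest>/.*)?$"
)


@dataclass
class Project:
    slug: str
    # Hypothetical per-project override; None means "use the default".
    subproject_pattern: Optional[re.Pattern] = None


def match_subproject(project: Project, path: str) -> Optional[re.Match]:
    """Match `path` with the project's custom pattern if it has one."""
    pattern = project.subproject_pattern or DEFAULT_SUBPROJECT_PATTERN
    return pattern.match(path)
```

With this shape, a project without an override matches `/projects/api/en/latest/` with alias `api`, while a project whose pattern is `^/s/<alias>/...` only matches paths under `/s/`. The same fallback could apply to the multiversion pattern.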

stsewd · 2022-08-18T00:47:57Z

Is the next step to use the unresolver in proxito to standardize how we're parsing this in each place in the code?

Yes, proxito will eventually use this to serve the documentation, but first I'll add the support for custom urlconfs (or a port of that), since we have users using that feature already on .com.

ericholscher

I think these changes look good. @humitos not sure if you wanted to review this.

@stsewd This is safe to merge and use before we implement it in proxito right? If so, I'm fine with it going on this deploy.

stsewd · 2022-08-22T21:40:43Z

@stsewd This is safe to merge and use before we implement it in proxito right? If so, I'm fine with it going on this deploy.

Yeah, it's safe to deploy.

ericholscher · 2022-08-23T18:05:18Z

@stsewd Great, I'd say let's get it merged then so we can do dev on it this week to catch any outstanding issues.

humitos · 2022-08-23T18:39:40Z

@ericholscher

I think these changes look good. @humitos not sure if you wanted to review this.

Yeah, I still want to review this. I've been pretty overloaded and wasn't able to jump into this yet.

I'll try to do it tomorrow.

humitos

This looks good! 💯

I added some comments about styles and naming. My idea here is to keep consistency through all the codebase and make the names pretty explicit. I think it's important to be able to read the name of a variable and immediately realize what it refers to. As an example, subproject_alias works better than project_slug, since project_slug is a lot more generic and could refer to something other than the subproject's alias.

Also, I'd like to see those returned tuples converted into dataclasses or similar objects, so we can access their fields via dot notation with clear names.

It's also worth noting that this implementation supports the following cases:

single version
multi-language multi-version
subprojects with single version
subprojects with multi-language multi-version

However, it does not support "multi-language single-version".
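
The cases above can be sketched as path matching. This is an illustration under stated assumptions, not the PR's implementation: it assumes `/<lang>/<version>/<file>` for multi-version projects, the whole path as the file for single-version projects, and a `/projects/<alias>/` prefix for subprojects. `parse_path`, `Parsed`, the `single_version` mapping, and the order of the checks are all hypothetical. There is no branch for `/<lang>/<file>`, which mirrors the missing "multi-language single-version" case.

```python
import re
from typing import Dict, NamedTuple, Optional

# Assumed URL layouts, for illustration only.
MULTIVERSION = re.compile(r"^/(?P<lang>[^/]+)/(?P<version>[^/]+)(?P<file>/.*)?$")
SUBPROJECT = re.compile(r"^/projects/(?P<alias>[^/]+)(?P<rest>/.*)?$")


class Parsed(NamedTuple):
    project: str
    language: Optional[str]
    version: Optional[str]
    filename: str


def parse_path(
    project: str,
    path: str,
    single_version: Dict[str, bool],
    check_subprojects: bool = True,
) -> Optional[Parsed]:
    """Resolve `path` for `project`; `single_version` maps slug -> flag."""
    if check_subprojects:
        sub = SUBPROJECT.match(path)
        if sub and sub.group("alias") in single_version:
            # Recurse once with the subproject as the current project.
            return parse_path(
                sub.group("alias"),
                sub.group("rest") or "/",
                single_version,
                check_subprojects=False,
            )
    if single_version[project]:
        # Single version: the whole path is the file name.
        return Parsed(project, None, None, path)
    multi = MULTIVERSION.match(path)
    if multi:
        return Parsed(
            project,
            multi.group("lang"),
            multi.group("version"),
            multi.group("file") or "/",
        )
    return None
```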

humitos · 2022-09-07T09:33:45Z

readthedocs/core/unresolver.py

-        request = RequestFactory().get(path=parsed.path, HTTP_HOST=domain)
-        project_slug = map_host_to_project_slug(request)
+        domain = self.get_domain_from_host(parsed.netloc)
+        project_slug, domain_object, external = self.unresolve_domain(domain)


We are using project_slug to refer to the parent_project_slug here. Below, we have parent_project and project too.

I think it would be good to refactor this to use more descriptive names:

parent_project_slug

parent_project

current_project or final_project or similar (in the docstring you are using "current project", which is a lot more descriptive, see https://github.com/readthedocs/readthedocs.org/pull/9500/files#diff-3f2b884fdd4752fe36587aee4530694e7a4ff10a8f4523a7b0bc25e9d2f11e11R24)

It probably makes sense to change UnresolvedURL to use current_project as well, so we keep consistency across our code.

humitos · 2022-09-07T09:44:57Z

readthedocs/core/unresolver.py

+            filename = "/" + filename
+        return filename
+
+    def _match_multiversion_project(self, parent_project, path):


I think a better name for this is _match_multiversion_url or _match_multiversion_path since it works over the url/path

humitos · 2022-09-07T10:04:31Z

readthedocs/core/unresolver.py

+        :returns: A tuple with: the project slug, domain object, and if the domain
+         is from an external version.


It would be good to make this a dataclass or similar. Structured tuples like this are hard to parse mentally and to keep in mind while working with this code and switching between files.
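
As a rough sketch of what that could look like, here is the three-element tuple from the docstring expressed as a frozen dataclass. The class name `UnresolvedDomain`, the field `is_from_external_domain`, and the stub function are assumptions for illustration; the stub only demonstrates the return type and does not reproduce the real domain logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnresolvedDomain:  # hypothetical name
    project_slug: Optional[str]
    # Stand-in type for the Domain model instance (None if not a custom domain).
    domain_object: Optional[object]
    is_from_external_domain: bool


def unresolve_domain_stub(domain: str) -> UnresolvedDomain:
    """Stand-in that only illustrates returning a named object."""
    subdomain, _, _ = domain.partition(".")
    return UnresolvedDomain(
        project_slug=subdomain,
        domain_object=None,
        is_from_external_domain=False,
    )
```

Callers then read `result.project_slug` instead of unpacking by position, so adding a field later does not break existing call sites.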

humitos · 2022-09-07T10:05:51Z

readthedocs/core/unresolver.py

+        :param check_subprojects: If we should check for subprojects,
+         this is used to call this function recursively.
+
+        :returns: A tuple with: project, version, and file name.


It would be good to make this a dataclass or similar here as well, and for all these related methods.

I think it is okay to have this as a tuple, they are internal methods, and we already have one dataclass for the final result.

Even if they are internal methods, real objects are a lot easier to read and follow while debugging/reading this code.

humitos · 2022-09-07T10:13:11Z

readthedocs/core/unresolver.py

-class Unresolver(SettingsOverrideObject):
+            # TODO: This can catch some possibly valid domains (docs.readthedocs.io.com)
+            # for example, but these might be phishing, so let's ignore them for now.
+            log.warning("Weird variation of our domain.", domain=domain)


I took a look at this in New Relic and it seems we don't have any logs for this: https://onenr.io/0oQD4oYZ5Qy 👍🏼
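
To illustrate the kind of domain the TODO refers to, here is a small sketch of a strict `<slug>.<public domain>` check. The value of `PUBLIC_DOMAIN` and the helper `slug_from_public_domain` are assumptions for the example, not the code under review.

```python
from typing import Optional

PUBLIC_DOMAIN = "readthedocs.io"  # assumed value for the example


def slug_from_public_domain(domain: str) -> Optional[str]:
    """Return the project slug only for domains shaped `<slug>.PUBLIC_DOMAIN`."""
    if PUBLIC_DOMAIN not in domain:
        return None
    subdomain, _, root_domain = domain.partition(".")
    if root_domain == PUBLIC_DOMAIN:
        return subdomain
    # Contains our domain but is not `<slug>.PUBLIC_DOMAIN`,
    # e.g. docs.readthedocs.io.com: treated as invalid here.
    return None
```

Under this check `pip.readthedocs.io` yields `pip`, while `docs.readthedocs.io.com` and `a.b.readthedocs.io` are rejected.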

humitos · 2022-09-07T10:15:31Z

readthedocs/core/unresolver.py


-    _default_class = UnresolverBase
+        log.info("Invalid domain.", domain=domain)


We do have some of these on New Relic: https://onenr.io/0Zw06DrADjv

humitos · 2022-09-07T10:23:05Z

I added some comments about styles and naming. My idea here is to keep consistency through all the codebase and make the names pretty explicit.

Also, I just noticed that we call it root_project in the design doc, but in the implementation we just call it project. I think it's important to keep consistency here so we all speak the same language and refer to the same thing with the same name. It seems that root_project is more specific, and I think it makes sense to use it in the code as well: https://dev.readthedocs.io/en/stable/design/better-doc-urls-handling.html#alternative-implementation

stsewd added 13 commits August 1, 2022 10:38

Move code

e34fe11

Remove request

6ac26b0

Return tuple

a732260

Replace

0672724

Refactor

185771d

Updates from review

3d30d6b

Merge branch 'main' into refactor-middleware

fce39ac

Move get_domain_from_host to unresolver

f583d2c

Move unresolve_domain

789f74c

Implement unresolve_path

067367d

Use new implementation for unresolver

1cf32b4

Tests

53fa8aa

Refactor and cleanup

a62c149

stsewd changed the title ~~Move to unresolver~~ New unresolver implementation Aug 15, 2022

stsewd marked this pull request as ready for review August 15, 2022 23:32

stsewd requested a review from a team as a code owner August 15, 2022 23:32

stsewd requested a review from humitos August 15, 2022 23:32

auto-assign bot assigned stsewd Aug 15, 2022

ericholscher reviewed Aug 18, 2022

Base automatically changed from refactor-middleware to main August 18, 2022 15:36

stsewd added 4 commits August 18, 2022 12:33

Merge branch 'main' into move-to-unresolver

63d72c3

Rename canonical_project -> parent_project

aba6fc4

Lint

cfe16b7

Fix

6900724

ericholscher approved these changes Aug 22, 2022

stsewd mentioned this pull request Aug 22, 2022

Unresolver: strict validation for external versions and other fixes #9534

Merged

stsewd merged commit cd72e84 into main Aug 23, 2022

stsewd deleted the move-to-unresolver branch August 23, 2022 18:08

humitos reviewed Sep 7, 2022

stsewd added a commit that referenced this pull request Sep 14, 2022

Updates from review from #9500

4d41069

ericholscher mentioned this pull request Jan 17, 2023

Proxito: Next steps on Improve URL Handling #9911

Closed
