Proxito: separate project slug extraction from request manipulation #9462

stsewd · 2022-08-01T17:17:55Z

This is so we can use the project slug extraction in the new implementation of the unresolver. The refactor is split into several commits.

📚 Documentation preview 📚: https://docs--9462.org.readthedocs.build/en/9462/

humitos

Looks good to me! 👍🏼 Thanks for splitting the PR into different commits.

I just pinged @ericholscher to take a look as well, since he was involved in this.

humitos · 2022-08-09T11:15:27Z

readthedocs/proxito/middleware.py

+    :returns: A tuple with the project slug, domain object, and if the domain
+     is external.


Does it make sense to use one of those nice dataclasses that you've used in other places here as well? 😄

I started using dataclasses, but it felt like too much for this case. I may use a dataclass when moving this to the unresolver code.
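For reference, a minimal sketch of what the dataclass variant could look like. The class name `UnresolvedDomain` and its field names are hypothetical and do not come from this PR; only the three returned values (project slug, domain object, external flag) are taken from the docstring quoted above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class UnresolvedDomain:
    """Hypothetical container for the three values the docstring describes."""

    # Slug of the project the domain points to, or None if nothing matched.
    project_slug: Optional[str]
    # Domain object for custom domains; typed as Any so the sketch runs without Django.
    domain_object: Optional[Any]
    # True when the domain is the external domain.
    is_external: bool


# Callers read named attributes instead of unpacking a positional tuple.
result = UnresolvedDomain(project_slug="docs", domain_object=None, is_external=False)
print(result.project_slug, result.is_external)
```

The trade-off mentioned in the reply still applies: for an internal helper with a single caller, a plain tuple is less code.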

humitos · 2022-08-09T11:21:25Z

readthedocs/proxito/middleware.py

+    return host.lower().split(":")[0]
+
+
+def _unresolve_domain(domain):


It would be good if this function logged where the project slug came from. Was it from a subdomain of the public domain? A custom domain? A subdomain of the external domain?

It could be at info level for now while we are working on this refactor, so we can check that things are working correctly in production. Then we can lower it to debug.

These are also logged outside the function (readthedocs.org/readthedocs/proxito/middleware.py, line 125 in 185771d):

    log.debug('Proxito CNAME.', host=host)

Similar to #9462 (comment), I wasn't sure if we should log this outside or inside the function. Also, they are set to debug, like the old logs.

If you think this makes more sense inside the function, let me know.

I think we should move the logic into this function, since it will also be called from places other than proxito after it's moved to the unresolver, right?
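A rough sketch of logging the source of the slug from inside the function, so every caller gets the same message. The helper name `slug_source`, the two domain constants, and the use of the standard `logging` module are assumptions made to keep the example self-contained; the project's own logger accepts keyword arguments, as in the `log.debug('Proxito CNAME.', host=host)` call quoted above.

```python
import logging

log = logging.getLogger(__name__)

# Assumed example values; the real ones would come from project settings.
PUBLIC_DOMAIN = "readthedocs.io"
EXTERNAL_DOMAIN = "readthedocs.build"


def slug_source(domain):
    """Return a label for where the project slug would come from (hypothetical helper)."""
    _, _, root_domain = domain.partition(".")
    if root_domain == PUBLIC_DOMAIN:
        source = "public subdomain"
    elif root_domain == EXTERNAL_DOMAIN:
        source = "external subdomain"
    else:
        # Anything else is treated as a custom domain in this sketch.
        source = "custom domain"
    # Info level while the refactor is verified; could be lowered to debug later.
    log.info("Project slug source: %s (domain=%s)", source, domain)
    return source
```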

humitos · 2022-08-09T11:28:55Z

readthedocs/proxito/middleware.py

        request.cname = True
-        request.domain = domain
+        request.domain = domain_object


I don't think we need to do what I'm about to propose in this PR, but can we avoid passing things around by injecting them into the request? If that's not possible, or it's our best option, can we define a specific, well-documented object?

    request.readthedocs = ReadTheDocsRequest(
        cname=bool,
        domain=Domain,
        external_domain=bool,
        host_version_slug=str,
        canonicalize=bool,
        subdomain=bool,
    )

I think we have this kind of injection all over our code, and I have no idea what all the possible values are or how they are expected to be used. Each time I work on this code I need to grep to understand how they are used.

Yeah, my goal with the new implementation is also to reduce the things we inject into the request and pass them explicitly where needed, if possible. If we still require them, using an object would be another good option.

Yea, it would be nice to standardize the data we're setting on the request. There are a few random places where we read that data, so we need to be careful not to break something downstream.
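The proposed object could be sketched roughly as follows. Field names and types follow the `ReadTheDocsRequest(...)` snippet in the comment above; the default values and the `FakeRequest` stand-in are assumptions added only so the example runs without Django.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ReadTheDocsRequest:
    """Sketch of the proposed object; defaults are assumptions, not from the PR."""

    cname: bool = False
    # The Domain object; typed loosely so the sketch runs without Django.
    domain: Optional[Any] = None
    external_domain: bool = False
    host_version_slug: Optional[str] = None
    canonicalize: bool = False
    subdomain: bool = False


class FakeRequest:
    """Stand-in for the real request object, used only to keep the example runnable."""


request = FakeRequest()
# One documented attribute instead of several loose ones on the request.
request.readthedocs = ReadTheDocsRequest(cname=True, domain="docs.example.com")
print(request.readthedocs.cname, request.readthedocs.subdomain)
```

Grouping the values this way gives downstream readers a single place to look up what can be set, which is the concern raised in the thread.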

ericholscher

This looks like a good start to a refactor. 👍

ericholscher · 2022-08-17T23:33:49Z

readthedocs/proxito/middleware.py

+    subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
+    rest_of_domain = rest_of_domain[0] if rest_of_domain else ""


This logic feels brittle, but probably better than what we had. I wonder if there's a more explicit way to do this like urlparse, but I know we've also hit issues with urlparse doing weird things when you don't pass it a fully qualified URL.

This should probably also be more explicitly named:

Suggested change

-    subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
-    rest_of_domain = rest_of_domain[0] if rest_of_domain else ""
+    subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
+    root_domain = rest_of_domain[0] if rest_of_domain else ""
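To make the behavior concrete, here is a small sketch of the split with the suggested name, next to the `urlparse` pitfall mentioned above. The wrapper function `split_domain` and the example hosts are illustrative only.

```python
from urllib.parse import urlparse


def split_domain(domain):
    """Split a host into its first label and the remainder, as in the diff above."""
    subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
    root_domain = rest_of_domain[0] if rest_of_domain else ""
    return subdomain, root_domain


with_subdomain = split_domain("docs.readthedocs.io")  # ('docs', 'readthedocs.io')
single_label = split_domain("localhost")  # ('localhost', '')

# Without a scheme or a leading "//", urlparse treats the host as a path,
# so netloc comes back empty.
bare_netloc = urlparse("docs.readthedocs.io").netloc  # ''
# With a leading "//" the host is parsed as the network location.
parsed_host = urlparse("//docs.readthedocs.io").hostname  # 'docs.readthedocs.io'
```

`str.partition(".")` returns the same two parts without the starred unpacking, which may be another way to make the intent more explicit.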

ericholscher · 2022-08-17T23:34:42Z

readthedocs/proxito/middleware.py

+    Unresolve domain by extracting relevant information from it.
+
+    :param str domain: Domain to extract the information from.
+    :returns: A tuple with: the project slug, domain object, and if the domain


This feels like an odd set of things to return. Probably better to have an object here or something? Three unnamed values are hard to read in the code. But it's not a huge deal, since it's an internal function, I guess.

ericholscher · 2022-08-17T23:37:05Z

readthedocs/proxito/middleware.py

+        return None, None, False
+
+    if external_domain in domain:
+        # Serve custom versions on external-host-domain.


Not sure what "custom versions" means here. These are PR builds, right?

Suggested change

-        # Serve custom versions on external-host-domain.
+        # Serve PR builds on external_domain host.

ericholscher · 2022-08-17T23:37:40Z

readthedocs/proxito/middleware.py

+
+    if external_domain in domain:
+        # Serve custom versions on external-host-domain.
+        if external_domain == rest_of_domain:


It seems these two if statements can be combined into one. Should we be doing something in the else here?

We could return None here as we do with normal serving (for domains that look like external domains, i.e. phishing).
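The difference between the two checks can be illustrated with a short sketch. The helper `is_external_host`, the value of `external_domain`, and the example hosts are assumptions for illustration, not code from the PR.

```python
def is_external_host(domain, external_domain):
    """Return True only when the part after the first label equals external_domain."""
    _, _, root_domain = domain.partition(".")
    return root_domain == external_domain


external_domain = "readthedocs.build"  # assumed example value

# A lookalike host passes the substring test but fails the equality test.
lookalike = "docs.readthedocs.build.example.com"
substring_match = external_domain in lookalike  # True
strict_match = is_external_host(lookalike, external_domain)  # False
real_match = is_external_host("docs--9462.readthedocs.build", external_domain)  # True
```

Since the equality check already implies the substring check, the outer `in` test mainly serves to catch lookalike hosts, which is where returning None would apply.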

ericholscher · 2022-08-17T23:42:38Z

readthedocs/proxito/middleware.py

        request.cname = True
-        request.domain = domain
+        request.domain = domain_object


Yea, it would be nice to standardize the data we're setting on the request. There are a few random places where we read that data, so we need to be careful not to break something downstream.

stsewd added 5 commits August 1, 2022 10:38

Move code

e34fe11

Remove request

6ac26b0

Return tuple

a732260

Replace

0672724

Refactor

185771d

stsewd requested a review from a team as a code owner August 1, 2022 17:17

stsewd requested a review from humitos August 1, 2022 17:17

auto-assign bot assigned stsewd Aug 1, 2022

stsewd changed the title ~~Proxito: separate project slug extraction from request~~ Proxito: separate project slug extraction from request manipulation Aug 1, 2022

humitos requested a review from ericholscher August 9, 2022 11:02

humitos approved these changes Aug 9, 2022

stsewd added 2 commits August 9, 2022 11:09

Updates from review

3d30d6b

Merge branch 'main' into refactor-middleware

fce39ac

stsewd mentioned this pull request Aug 15, 2022

New unresolver implementation #9500

Merged

ericholscher approved these changes Aug 17, 2022

stsewd added 3 commits August 17, 2022 19:36

Updates from review

139204f

Merge branch 'main' into refactor-middleware

ecb67d1

Missed some logs

9094176

stsewd merged commit 1ef21db into main Aug 18, 2022

stsewd deleted the refactor-middleware branch August 18, 2022 15:36

ericholscher mentioned this pull request Jan 17, 2023

Proxito: Next steps on Improve URL Handling #9911

Closed
