-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Proxito: separate project slug extraction from request manipulation #9462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 👍🏼 Thanks for splitting the PR into different commits.
I just ping @ericholscher to take a look as well since he was involved into this.
readthedocs/proxito/middleware.py
Outdated
:returns: A tuple with the project slug, domain object, and if the domain | ||
is external. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to use one of those nice dataclasses that you've used in other places here as well? 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started using dataclases, but then it felt a little too much for this case, but I may use a dataclass when moving this to the unversolver code.
return host.lower().split(":")[0] | ||
|
||
|
||
def _unresolve_domain(domain): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good if this function logs "where did we get the project slug from?". Was it from a subdomain of public domain? a custom domain? a subdomain of the external domain?
It could be in info mode for now while we are working on this refactor, so we can check that things are working correctly in production. Then, we can lower it down to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are also logged outside the function
log.debug('Proxito CNAME.', host=host) |
If you think this makes more sense inside the function, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move the logic into this function, since this function will also be called in other places other than proxito after it's moved to the unresolver, right?
request.cname = True | ||
request.domain = domain | ||
request.domain = domain_object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to do what I'm going to propose in this PR, but, can we avoid passing things by injecting things in the request
? In case this is not possible, or it's our best option, can we define a specific well-documented object?
request.readthedocs = ReadTheDocsRequest(
cname=bool,
domain=Domain,
external_domain=bool,
host_version_slug=str,
canonicalize=bool,
subdomain=bool,
)
I think we have this kind of injections all over our code and I have no idea what are all of those possible values and how they are expected to be used. Each time I work on this code I need to grep some code to understand how they are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my goal with the new implementation is also reducing the things we inject into the request, and pass them explicitly where needed if possible, but if we still require them using an object would be another good option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it would be nice to standardize the data we're setting on the request. There's a few random places we're reading that data, so we need to be careful that we don't randomly break something downstream though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good start to a refactor. 👍
readthedocs/proxito/middleware.py
Outdated
subdomain, *rest_of_domain = domain.split(".", maxsplit=1) | ||
rest_of_domain = rest_of_domain[0] if rest_of_domain else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic feels brittle, but probably better than what we had. I wonder if there's a more explicit way to do this like urlparse, but I know we've also hit issues with urlparse doing weird things when you don't pass it a fully qualified URL.
This should probably also be more explicitly named:
subdomain, *rest_of_domain = domain.split(".", maxsplit=1) | |
rest_of_domain = rest_of_domain[0] if rest_of_domain else "" | |
subdomain, *rest_of_domain = domain.split(".", maxsplit=1) | |
root_domain = rest_of_domain[0] if rest_of_domain else "" |
Unresolve domain by extracting relevant information from it. | ||
|
||
:param str domain: Domain to extract the information from. | ||
:returns: A tuple with: the project slug, domain object, and if the domain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like an odd set of things to return.. Probably better to have an object here or something? 3 random values is hard to read in the code. But not a huge deal since it's an internal function I guess.
readthedocs/proxito/middleware.py
Outdated
return None, None, False | ||
|
||
if external_domain in domain: | ||
# Serve custom versions on external-host-domain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what custom versions
means here. This is PR builds right?
# Serve custom versions on external-host-domain. | |
# Serve PR builds on external_domain host. |
readthedocs/proxito/middleware.py
Outdated
|
||
if external_domain in domain: | ||
# Serve custom versions on external-host-domain. | ||
if external_domain == rest_of_domain: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems these 2 if statements can be on the same if. Should we be doing something in the else
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could return None here as we do with normal serving (for domains that look like external domains, i.e. phishing)
request.cname = True | ||
request.domain = domain | ||
request.domain = domain_object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it would be nice to standardize the data we're setting on the request. There's a few random places we're reading that data, so we need to be careful that we don't randomly break something downstream though.
This is so we can use the project slug extraction in the new implementation of the unresolver. The refactor is split into several commits.
📚 Documentation preview 📚: https://docs--9462.org.readthedocs.build/en/9462/