-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Proxito: separate project slug extraction from request manipulation #9462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
e34fe11
6ac26b0
a732260
0672724
185771d
3d30d6b
fce39ac
139204f
ecb67d1
9094176
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
|
@@ -22,6 +22,59 @@ | |||||||||
log = structlog.get_logger(__name__) # noqa | ||||||||||
|
||||||||||
|
||||||||||
def _get_domain_from_host(host): | ||||||||||
""" | ||||||||||
Get the normalized domain from a hostname. | ||||||||||
|
||||||||||
A hostname can include the port. | ||||||||||
""" | ||||||||||
return host.lower().split(":")[0] | ||||||||||
|
||||||||||
|
||||||||||
def _unresolve_domain(domain): | ||||||||||
""" | ||||||||||
Unresolve domain. | ||||||||||
stsewd marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
|
||||||||||
:param str domain: Domain to extrac the project slug from. | ||||||||||
stsewd marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
:returns: A tuple with the project slug, domain object, and if the domain | ||||||||||
is external. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does it make sense to use one of those nice dataclasses that you've used in other places here as well? 😄 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I started using dataclases, but then it felt a little too much for this case, but I may use a dataclass when moving this to the unversolver code.
stsewd marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
""" | ||||||||||
public_domain = _get_domain_from_host(settings.PUBLIC_DOMAIN) | ||||||||||
external_domain = _get_domain_from_host(settings.RTD_EXTERNAL_VERSION_DOMAIN) | ||||||||||
|
||||||||||
subdomain, *rest_of_domain = domain.split(".", maxsplit=1) | ||||||||||
rest_of_domain = rest_of_domain[0] if rest_of_domain else "" | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This logic feels brittle, but probably better than what we had. I wonder if there's a more explicit way to do this like urlparse, but I know we've also hit issues with urlparse doing weird things when you don't pass it a fully qualified URL. This should probably also be more explicitly named:
Suggested change
|
||||||||||
|
||||||||||
if public_domain in domain: | ||||||||||
# Serve from the PUBLIC_DOMAIN, ensuring it looks like `foo.PUBLIC_DOMAIN`. | ||||||||||
if public_domain == rest_of_domain: | ||||||||||
project_slug = subdomain | ||||||||||
return project_slug, None, False | ||||||||||
|
||||||||||
# TODO: This can catch some possibly valid domains (docs.readthedocs.io.com) for example, | ||||||||||
# but these might be phishing, so let's ignore them for now. | ||||||||||
ericholscher marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||
return None, None, False | ||||||||||
|
||||||||||
if external_domain in domain: | ||||||||||
# Serve custom versions on external-host-domain. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure what
Suggested change
|
||||||||||
if external_domain == rest_of_domain: | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems these 2 if statements can be on the same if. Should we be doing something in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could return None here as we do with normal serving (for domains that look like external domains, i.e. phishing) |
||||||||||
try: | ||||||||||
project_slug, _ = subdomain.rsplit("--", maxsplit=1) | ||||||||||
return project_slug, None, True | ||||||||||
except ValueError: | ||||||||||
return None, None, False | ||||||||||
|
||||||||||
# Custom domain. | ||||||||||
domain_object = ( | ||||||||||
Domain.objects.filter(domain=domain).prefetch_related("project").first() | ||||||||||
) | ||||||||||
if domain_object: | ||||||||||
project_slug = domain_object.project.slug | ||||||||||
return project_slug, domain_object, False | ||||||||||
|
||||||||||
return None, None, None | ||||||||||
|
||||||||||
|
||||||||||
def map_host_to_project_slug(request): # pylint: disable=too-many-return-statements | ||||||||||
""" | ||||||||||
Take the request and map the host to the proper project slug. | ||||||||||
|
@@ -38,76 +91,41 @@ def map_host_to_project_slug(request): # pylint: disable=too-many-return-statem | |||||||||
- This sets ``request.canonicalize`` with the value as the reason | ||||||||||
""" | ||||||||||
|
||||||||||
host = request.get_host().lower().split(':')[0] | ||||||||||
public_domain = settings.PUBLIC_DOMAIN.lower().split(':')[0] | ||||||||||
external_domain = settings.RTD_EXTERNAL_VERSION_DOMAIN.lower().split(':')[0] | ||||||||||
|
||||||||||
host_parts = host.split('.') | ||||||||||
public_domain_parts = public_domain.split('.') | ||||||||||
external_domain_parts = external_domain.split('.') | ||||||||||
|
||||||||||
project_slug = None | ||||||||||
host = _get_domain_from_host(request.get_host()) | ||||||||||
public_domain = _get_domain_from_host(settings.PUBLIC_DOMAIN) | ||||||||||
external_domain = _get_domain_from_host(settings.RTD_EXTERNAL_VERSION_DOMAIN) | ||||||||||
|
||||||||||
# Explicit Project slug being passed in | ||||||||||
# Explicit Project slug being passed in. | ||||||||||
if 'HTTP_X_RTD_SLUG' in request.META: | ||||||||||
project_slug = request.META['HTTP_X_RTD_SLUG'].lower() | ||||||||||
if Project.objects.filter(slug=project_slug).exists(): | ||||||||||
request.rtdheader = True | ||||||||||
log.info('Setting project based on X_RTD_SLUG header.', project_slug=project_slug) | ||||||||||
return project_slug | ||||||||||
|
||||||||||
if public_domain in host or host == 'proxito': | ||||||||||
# Serve from the PUBLIC_DOMAIN, ensuring it looks like `foo.PUBLIC_DOMAIN` | ||||||||||
if public_domain_parts == host_parts[1:]: | ||||||||||
project_slug = host_parts[0] | ||||||||||
request.subdomain = True | ||||||||||
log.debug('Proxito Public Domain.', host=host) | ||||||||||
if Domain.objects.filter(project__slug=project_slug).filter( | ||||||||||
canonical=True, | ||||||||||
https=True, | ||||||||||
).exists(): | ||||||||||
log.debug('Proxito Public Domain -> Canonical Domain Redirect.', host=host) | ||||||||||
request.canonicalize = constants.REDIRECT_CANONICAL_CNAME | ||||||||||
elif ( | ||||||||||
ProjectRelationship.objects. | ||||||||||
filter(child__slug=project_slug).exists() | ||||||||||
): | ||||||||||
log.debug('Proxito Public Domain -> Subproject Main Domain Redirect.', host=host) | ||||||||||
request.canonicalize = constants.REDIRECT_SUBPROJECT_MAIN_DOMAIN | ||||||||||
return project_slug | ||||||||||
|
||||||||||
# TODO: This can catch some possibly valid domains (docs.readthedocs.io.com) for example | ||||||||||
# But these feel like they might be phishing, etc. so let's block them for now. | ||||||||||
log.warning('Weird variation on our hostname.', host=host) | ||||||||||
return render( | ||||||||||
request, 'core/dns-404.html', context={'host': host}, status=400 | ||||||||||
) | ||||||||||
|
||||||||||
if external_domain in host: | ||||||||||
# Serve custom versions on external-host-domain | ||||||||||
if external_domain_parts == host_parts[1:]: | ||||||||||
try: | ||||||||||
project_slug, version_slug = host_parts[0].rsplit('--', 1) | ||||||||||
request.external_domain = True | ||||||||||
request.host_version_slug = version_slug | ||||||||||
log.debug('Proxito External Version Domain.', host=host) | ||||||||||
return project_slug | ||||||||||
except ValueError: | ||||||||||
log.warning('Weird variation on our hostname.', host=host) | ||||||||||
return render( | ||||||||||
request, 'core/dns-404.html', context={'host': host}, status=400 | ||||||||||
) | ||||||||||
project_slug, domain_object, external = _unresolve_domain(host) | ||||||||||
if not project_slug: | ||||||||||
# Block domains that look like ours, may be phishing. | ||||||||||
if external_domain in host or public_domain in host: | ||||||||||
log.warning("Weird variation on our hostname.", host=host) | ||||||||||
return render( | ||||||||||
request, | ||||||||||
"core/dns-404.html", | ||||||||||
context={"host": host}, | ||||||||||
status=400, | ||||||||||
) | ||||||||||
# Some person is CNAMEing to us without configuring a domain - 404. | ||||||||||
log.debug("CNAME 404.", host=host) | ||||||||||
return render(request, "core/dns-404.html", context={"host": host}, status=404) | ||||||||||
|
||||||||||
# Serve CNAMEs | ||||||||||
domain = Domain.objects.filter(domain=host).first() | ||||||||||
if domain: | ||||||||||
project_slug = domain.project.slug | ||||||||||
# Custom domain. | ||||||||||
if domain_object: | ||||||||||
request.cname = True | ||||||||||
request.domain = domain | ||||||||||
request.domain = domain_object | ||||||||||
Comment on lines
128
to
+129
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we need to do what I'm going to propose in this PR, but, can we avoid passing things by injecting things in the request.readthedocs = ReadTheDocsRequest(
cname=bool,
domain=Domain,
external_domain=bool,
host_version_slug=str,
canonicalize=bool,
subdomain=bool,
) I think we have this kind of injections all over our code and I have no idea what are all of those possible values and how they are expected to be used. Each time I work on this code I need to grep some code to understand how they are used. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, my goal with the new implementation is also reducing the things we inject into the request, and pass them explicitly where needed if possible, but if we still require them using an object would be another good option. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, it would be nice to standardize the data we're setting on the request. There's a few random places we're reading that data, so we need to be careful that we don't randomly break something downstream though. |
||||||||||
log.debug('Proxito CNAME.', host=host) | ||||||||||
|
||||||||||
if domain.https and not request.is_secure(): | ||||||||||
# Redirect HTTP -> HTTPS (302) for this custom domain | ||||||||||
if domain_object.https and not request.is_secure(): | ||||||||||
# Redirect HTTP -> HTTPS (302) for this custom domain. | ||||||||||
log.debug('Proxito CNAME HTTPS Redirect.', host=host) | ||||||||||
request.canonicalize = constants.REDIRECT_HTTPS | ||||||||||
|
||||||||||
|
@@ -116,11 +134,34 @@ def map_host_to_project_slug(request): # pylint: disable=too-many-return-statem | |||||||||
|
||||||||||
return project_slug | ||||||||||
|
||||||||||
# Some person is CNAMEing to us without configuring a domain - 404. | ||||||||||
log.debug('CNAME 404.', host=host) | ||||||||||
return render( | ||||||||||
request, 'core/dns-404.html', context={'host': host}, status=404 | ||||||||||
) | ||||||||||
# Pull request previews. | ||||||||||
if external: | ||||||||||
subdomain, _ = host.split(".", maxsplit=1) | ||||||||||
_, version_slug = subdomain.rsplit("--", maxsplit=1) | ||||||||||
request.external_domain = True | ||||||||||
request.host_version_slug = version_slug | ||||||||||
log.debug("Proxito External Version Domain.", host=host) | ||||||||||
return project_slug | ||||||||||
|
||||||||||
# Normal doc serving. | ||||||||||
request.subdomain = True | ||||||||||
log.debug("Proxito Public Domain.", host=host) | ||||||||||
if ( | ||||||||||
Domain.objects.filter(project__slug=project_slug) | ||||||||||
.filter( | ||||||||||
canonical=True, | ||||||||||
https=True, | ||||||||||
) | ||||||||||
.exists() | ||||||||||
): | ||||||||||
log.debug("Proxito Public Domain -> Canonical Domain Redirect.", host=host) | ||||||||||
request.canonicalize = constants.REDIRECT_CANONICAL_CNAME | ||||||||||
elif ProjectRelationship.objects.filter(child__slug=project_slug).exists(): | ||||||||||
log.debug( | ||||||||||
"Proxito Public Domain -> Subproject Main Domain Redirect.", host=host | ||||||||||
) | ||||||||||
request.canonicalize = constants.REDIRECT_SUBPROJECT_MAIN_DOMAIN | ||||||||||
return project_slug | ||||||||||
|
||||||||||
|
||||||||||
class ProxitoMiddleware(MiddlewareMixin): | ||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good if this function logs "where did we get the project slug from?". Was it from a subdomain of public domain? a custom domain? a subdomain of the external domain?
It could be in info mode for now while we are working on this refactor, so we can check that things are working correctly in production. Then, we can lower it down to debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are also logged outside the function
readthedocs.org/readthedocs/proxito/middleware.py
Line 125 in 185771d
If you think this makes more sense inside the function, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move the logic into this function, since this function will also be called in other places other than proxito after it's moved to the unresolver, right?