Skip to content

Proxito: separate project slug extraction from request manipulation #9462

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 18, 2022

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Aug 1, 2022

This is so we can use the project slug extraction in the new implementation of the unresolver. The refactor is split into several commits.


📚 Documentation preview 📚: https://docs--9462.org.readthedocs.build/en/9462/

@stsewd stsewd requested a review from a team as a code owner August 1, 2022 17:17
@stsewd stsewd requested a review from humitos August 1, 2022 17:17
@stsewd stsewd changed the title Proxito: separate project slug extraction from request Proxito: separate project slug extraction from request manipulation Aug 1, 2022
@humitos humitos requested a review from ericholscher August 9, 2022 11:02
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍🏼 Thanks for splitting the PR into different commits.

I just ping @ericholscher to take a look as well since he was involved into this.

Comment on lines 39 to 40
:returns: A tuple with the project slug, domain object, and if the domain
is external.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use one of those nice dataclasses that you've used in other places here as well? 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started using dataclases, but then it felt a little too much for this case, but I may use a dataclass when moving this to the unversolver code.

return host.lower().split(":")[0]


def _unresolve_domain(domain):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good if this function logs "where did we get the project slug from?". Was it from a subdomain of public domain? a custom domain? a subdomain of the external domain?

It could be in info mode for now while we are working on this refactor, so we can check that things are working correctly in production. Then, we can lower it down to debug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are also logged outside the function

log.debug('Proxito CNAME.', host=host)
, similar to #9462 (comment) I wasn't sure if we should logs this outside or inside the function, also, they are set as debug as the old logs.

If you think this makes more sense inside the function, let me know.

Copy link
Member

@ericholscher ericholscher Aug 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move the logic into this function, since this function will also be called in other places other than proxito after it's moved to the unresolver, right?

Comment on lines 123 to +124
request.cname = True
request.domain = domain
request.domain = domain_object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do what I'm going to propose in this PR, but, can we avoid passing things by injecting things in the request? In case this is not possible, or it's our best option, can we define a specific well-documented object?

request.readthedocs = ReadTheDocsRequest(
  cname=bool,
  domain=Domain,
  external_domain=bool,
  host_version_slug=str,
  canonicalize=bool,
  subdomain=bool,
)

I think we have this kind of injections all over our code and I have no idea what are all of those possible values and how they are expected to be used. Each time I work on this code I need to grep some code to understand how they are used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my goal with the new implementation is also reducing the things we inject into the request, and pass them explicitly where needed if possible, but if we still require them using an object would be another good option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it would be nice to standardize the data we're setting on the request. There's a few random places we're reading that data, so we need to be careful that we don't randomly break something downstream though.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start to a refactor. 👍

Comment on lines 45 to 46
subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
rest_of_domain = rest_of_domain[0] if rest_of_domain else ""
Copy link
Member

@ericholscher ericholscher Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic feels brittle, but probably better than what we had. I wonder if there's a more explicit way to do this like urlparse, but I know we've also hit issues with urlparse doing weird things when you don't pass it a fully qualified URL.

This should probably also be more explicitly named:

Suggested change
subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
rest_of_domain = rest_of_domain[0] if rest_of_domain else ""
subdomain, *rest_of_domain = domain.split(".", maxsplit=1)
root_domain = rest_of_domain[0] if rest_of_domain else ""

Unresolve domain by extracting relevant information from it.

:param str domain: Domain to extract the information from.
:returns: A tuple with: the project slug, domain object, and if the domain
Copy link
Member

@ericholscher ericholscher Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like an odd set of things to return.. Probably better to have an object here or something? 3 random values is hard to read in the code. But not a huge deal since it's an internal function I guess.

return None, None, False

if external_domain in domain:
# Serve custom versions on external-host-domain.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what custom versions means here. This is PR builds right?

Suggested change
# Serve custom versions on external-host-domain.
# Serve PR builds on external_domain host.


if external_domain in domain:
# Serve custom versions on external-host-domain.
if external_domain == rest_of_domain:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems these 2 if statements can be on the same if. Should we be doing something in the else here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could return None here as we do with normal serving (for domains that look like external domains, i.e. phishing)

Comment on lines 123 to +124
request.cname = True
request.domain = domain
request.domain = domain_object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, it would be nice to standardize the data we're setting on the request. There's a few random places we're reading that data, so we need to be careful that we don't randomly break something downstream though.

@stsewd stsewd merged commit 1ef21db into main Aug 18, 2022
@stsewd stsewd deleted the refactor-middleware branch August 18, 2022 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants