-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Build: replace GitPython with git commands #10594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@dataclass(slots=True) | ||
class GitSubmodule: | ||
path: str | ||
url: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the path and URL of each submodule, since we validate that each URL has a valid/supported schema, we could remove this check if we want, it would simplify the code a little :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we would just need to call Git.submodules
when the user is excluding submodules only, when including some or all, we can just call the git submodule update
with an empty list or the list of given submodules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if the user provides submodules that aren't valid, well, git will just take care of showing the correct error message, right now we are just excluding invalid submodules from the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a preference here?
I don't remember why we were checking if the URL has a supported scheme. Maybe that will help me to have a more informed opinion 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agjohnson may have some input here. From what I see, this was done, so users wouldn't use private submodules (SSH) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I seem to recall some issues with repositories defining bogus submodules, or having old and unused submodules defined. I don't think this was to limit the URL pattern to supported protocols.
It could have also been a security measure, as it was probably possible to define a submodule like /etc/
or something and clone from something outside the container (when we were performing submodule operations outside containers).
Any of this seem familiar or likely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stealing internal repos could have been a possibility before moving the execution to docker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds familiar to me the validation of the submodules, but this PR should fix that automatically because we trust git itself now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm thinking if any of the above are true, it was probably a throw back to when we weren't executing the submodule steps inside Docker. In any case that would seem this logic can be removed, but if anyone is hesitant about it, I'm not saying this is guaranteed to be safe either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I'd say this is up to you @stsewd since you are the security master here 😄 and the one that has been working on this chunk of code.
return bool(self.submodules) | ||
# TODO: remove when all projects are required | ||
# to have a config file. | ||
return any(self.submodules) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This results in 2 calls to git config
for projects that don't have a config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good 👍🏼 . I have a few questions and suggestions to continue talking about.
However, the most important thing I want to ask here is "Have you tested this with GIT_CLONE_FETCH_CHECKOUT_PATTERN
feature flag?". We have that enabled by default to all projects on production and we are not using the old pattern anymore.
except (TypeError, AttributeError): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to do log.warning
here, so we can easily realize if there is a problem from NR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a log here, but we will probably see a spike in logs, since we weren't logging these before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we have lot of command's outputs that are not possible to decode? That shouldn't be normal, I'd say.
valid, submodules = repo.validate_submodules(self.dummy_conf) | ||
self.assertTrue(valid) | ||
self.assertEqual( | ||
list(submodules), ["foobar", "path with spaces", "another-submodule"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the foobar
submodule comes from? I don't see it on the file written in disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's defined in the submodule
branch of the test git repo.
@dataclass(slots=True) | ||
class GitSubmodule: | ||
path: str | ||
url: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a preference here?
I don't remember why we were checking if the URL has a supported scheme. Maybe that will help me to have a more informed opinion 😄
Oh, one thing that I missed here. You should remove |
We still use it in the code that gets tags and branches. |
yeah, I did. |
I understood this PR was meant to use |
Looks like that's still under a feature flag/logic readthedocs.org/readthedocs/projects/tasks/mixins.py Lines 52 to 55 in 5f14e95
Looks like #8968 is removing that logic. |
cc @agjohnson in case you have some thoughts on https://github.com/readthedocs/readthedocs.org/pull/10594/files#r1282426806 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added two code comments. Other than that, we are ready to merge it I'd say. We can come back and refactor the other small pieces we are talking in the comments if we want to.
I'm merging this so I can move forward with other PRs that depend on this one. |
We are always querying for submodules, even when the user explicitly listed the submodules to include. We also have the option to include all submodules, instead of listing each one of them, we can just omit them, git will automatically default to include all of them. Ref #10594 (comment)
We are always querying for submodules, even when the user explicitly listed the submodules to include. We also have the option to include all submodules, instead of listing each one of them, we can just omit them, git will automatically default to include all of them. Ref #10594 (comment)
To unambiguously parse the submodules, we need to use the
--null
option, that results on each item being separated by a null character, all good, except that we are "sanitizing" the output of all commands, stripping those null characters. We are doing this to avoid problems when saving this in the DB, so instead of "sanitizing" the output after the command is run, I'm moving that logic to just when it's needed, this is when saving the command using the API.Closes https://github.com/readthedocs/readthedocs-corporate/issues/1524
Closes #8990