Description
For f-droid.org, we build thousands of Android apps from git repos. To reduce our attack surface and work towards "least authority", we use a custom Git wrapper that locks down a lot of things that we never need and that carry a higher risk of vulnerabilities. I would like to rework this to be a part of GitPython, so I'm opening this issue to see if this is something that the GitPython maintainers would be interested in merging.
I'm open on the API, it could be something like this:
git_repo = git.repo.Repo('.', safe=True)
The goal would then be that all invocations of Git include these kinds of options:
core.askpass = /bin/true
core.hooksPath = /dev/null
core.sshCommand = /bin/true
credential.helper = /bin/true
http.emptyAuth = true
protocol.allow = never
protocol.https.allow = always
url.https://.insteadOf = ssh://
And run with these env vars:
GIT_TERMINAL_PROMPT=0
GIT_ASKPASS=/bin/true
SSH_ASKPASS=/bin/true
GIT_SSH=/bin/true # for git < 2.3
This then hopefully only allows unauthenticated access to HTTPS repos, and prevents the execution of any command besides git. This would eliminate risks like these:
- GHSA-2mqj-m65w-jghx
- https://stackoverflow.com/questions/74200395/is-it-dangerous-to-open-or-clone-a-git-repository-from-an-untrusted-source
- GHSA-vm9j-46j9-qvq4
- https://git-scm.com/docs/git#_security
- https://github.blog/open-source/git/securing-git-addressing-5-new-vulnerabilities/
- https://nvd.nist.gov/vuln/detail/CVE-2017-1000117
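A minimal sketch of how the options and environment variables listed above could be applied to a single Git invocation using only the standard library. This is not the fdroidserver wrapper or the proposed GitPython API; the helper names `safe_git_argv`, `safe_git_env`, and `run_safe_git` are hypothetical and only illustrate passing each setting with `git -c` plus a fixed environment.

```python
import os
import subprocess

# Config values taken from the list above; passed on the command line with -c
# so that they take precedence over repository and user config files.
SAFE_CONFIG = {
    "core.askpass": "/bin/true",
    "core.hooksPath": "/dev/null",
    "core.sshCommand": "/bin/true",
    "credential.helper": "/bin/true",
    "http.emptyAuth": "true",
    "protocol.allow": "never",
    "protocol.https.allow": "always",
    "url.https://.insteadOf": "ssh://",
}

# Environment variables taken from the list above.
SAFE_ENV = {
    "GIT_TERMINAL_PROMPT": "0",
    "GIT_ASKPASS": "/bin/true",
    "SSH_ASKPASS": "/bin/true",
    "GIT_SSH": "/bin/true",  # for git < 2.3
}


def safe_git_argv(*args):
    """Build an argv list: git, one -c pair per option, then the subcommand."""
    argv = ["git"]
    for key, value in SAFE_CONFIG.items():
        argv += ["-c", f"{key}={value}"]
    argv.extend(args)
    return argv


def safe_git_env(base=None):
    """Copy the given (or current) environment and force the safe values."""
    env = dict(os.environ if base is None else base)
    env.update(SAFE_ENV)
    return env


def run_safe_git(*args, cwd=None):
    """Run git with the locked-down options (hypothetical helper)."""
    return subprocess.run(
        safe_git_argv(*args),
        env=safe_git_env(),
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
```

For example, `run_safe_git("fetch", "origin", cwd="/path/to/repo")` would run the fetch with every option in place. Whether these settings cover all relevant attack paths is exactly the open question of this issue.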
Activity
Byron commented on Mar 31, 2025
This sounds like a very nice feature to have, and one that wouldn't affect anyone who didn't opt in.
Yes, a PR would definitely be welcome.
Kilo59 commented on Apr 2, 2025
Related question, is there a way to set the equivalent of GIT_TERMINAL_PROMPT=0 purely within gitpython (without setting an env var)?
eighthave commented on Apr 2, 2025
GIT_TERMINAL_PROMPT=0 is functionally equivalent to core.askpass = /bin/true as far as I understand it. The env vars can override the config values, so I want to set the env vars to make sure that the config values are never overridden.
There is another thing that we use which is hard to generalize. The core goal is to rewrite all remote URLs to https://, whenever possible, then only support https as a protocol. This is necessary to support submodules when using this "safe" mode. ssh:// URLs are easy to handle (url.https://.insteadOf = ssh://), but the rest are not. This is the best I could come up with:
Anyone know a way to generalize those insteadOf rules to any domain?
Byron commented on Apr 3, 2025
This seems to be the code handling the rewrites in Git - maybe from there it becomes clear how it can be used more generally?
Maybe it's the code that parses the configuration that limits what it can do though.
eighthave commented on May 26, 2025
I've started implementing this finally. The first thing I'm looking at is a way to read commit IDs (e.g. repo.head.commit.binsha) while being utterly certain that git is never executed. git.refs.symbolic.SymbolicReference.dereference_recursive() provides this already, with some extra work. Right now, in fdroidserver, we're using our own function based on dereference_recursive(). Maybe repo.head.commit.binsha already avoids executing git, but we need some kind of guarantee. Any tips on a better approach? Like is there any internal API that stops git from being executed?
Byron commented on May 26, 2025
It's very likely that using gitoxide would be preferred here. It has its own security model which makes the use of untrusted configuration impossible, and I'd be inclined to say that it can probably do what you'd need it to.
Probably that's not an option there though, and I suppose there can be other means to assure GitPython can't do things it shouldn't do with untrusted repositories. Testing this properly will certainly need some consideration as well.
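To illustrate the kind of guarantee asked about above, here is a minimal pure-Python sketch that resolves a ref to a commit ID by reading files only, so no git process can be started. It is not GitPython's dereference_recursive() nor the fdroidserver function; `resolve_ref` and its helpers are hypothetical names, and the sketch assumes the classic loose-ref plus packed-refs layout (no reftable, no linked worktrees).

```python
import os


def _check_ref_name(ref):
    """Reject ref names that could escape the git directory."""
    if ref != "HEAD" and not ref.startswith("refs/"):
        raise ValueError(f"unexpected ref name: {ref!r}")
    if ".." in ref.split("/"):
        raise ValueError(f"path traversal in ref name: {ref!r}")


def _lookup_packed_ref(git_dir, ref):
    """Look the ref up in packed-refs, skipping header and peeled lines."""
    packed = os.path.join(git_dir, "packed-refs")
    if os.path.isfile(packed):
        with open(packed, encoding="utf-8") as fp:
            for line in fp:
                line = line.strip()
                if not line or line[0] in "#^":
                    continue
                sha, _, name = line.partition(" ")
                if name == ref:
                    return sha
    raise KeyError(ref)


def resolve_ref(git_dir, ref="HEAD", max_depth=5):
    """Return the hex object ID that ref points to, reading files only."""
    for _ in range(max_depth):
        _check_ref_name(ref)
        path = os.path.join(git_dir, *ref.split("/"))
        if not os.path.isfile(path):
            return _lookup_packed_ref(git_dir, ref)
        with open(path, encoding="utf-8") as fp:
            content = fp.read().strip()
        if not content.startswith("ref:"):
            return content  # detached HEAD or a loose ref holding an ID
        ref = content[len("ref:"):].strip()  # follow the symbolic ref
    raise ValueError("too many levels of symbolic refs")
```

Calling `resolve_ref("/path/to/repo/.git")` returns the ID as a hex string. The sketch does not verify that the returned value is a well-formed object ID or that the object exists, so a caller handling untrusted repositories would still need to validate it.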
eighthave commented on May 26, 2025
gitoxide sounds great, but fdroidserver is in Python, and using pure Python makes a lot of distribution issues much easier, hence GitPython. I think GitPython will only need some small tweaks to do what we need it to.
eighthave commented on May 26, 2025
#2029 implements the approach that I think works best. I'm open to suggestions from the GitPython experts.
eighthave commented on May 26, 2025
The tricky case is url.https://{domain}/.insteadOf=git@{domain}:, e.g. rewriting git@github.com:illarionov/OsmDroid.git to https://github.com/illarionov/OsmDroid.git. It has to replace both git@ with https:// and : with /, but the domain is in between. The insteadOf string replacement is like a startswith(), so the only way to replace that : is by including the domain name.
So there would need to be a way to parse the URLs first from .gitmodules to modify them. That seems complicated. Otherwise, just a few statically defined domain names would cover the vast majority of cases:
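As a rough illustration of both approaches described above (not the list from the original comment), the sketch below generates per-domain insteadOf options in the form given earlier, and separately rewrites a single scp-like URL after parsing it. The function names and the regular expression are assumptions made for this example; the domain list a caller passes in is up to them.

```python
import re

# Matches scp-like remotes such as git@github.com:owner/repo.git.
# The (?!//) lookahead keeps scheme URLs like https://... from matching.
SCP_LIKE = re.compile(r"^(?:[\w.-]+@)?(?P<host>[\w.-]+):(?P<path>(?!//).+)$")


def insteadof_options(domains):
    """Build one -c url.https://{domain}/.insteadOf=git@{domain}: per domain."""
    options = []
    for domain in domains:
        options += ["-c", f"url.https://{domain}/.insteadOf=git@{domain}:"]
    return options


def scp_like_to_https(url):
    """Rewrite git@host:path to https://host/path; leave other URLs as-is."""
    match = SCP_LIKE.match(url)
    if match is None:
        return url
    return f"https://{match['host']}/{match['path']}"
```

The static variant only covers the domains it is given, while the parsing variant works for any host but requires rewriting the URLs before Git sees them. The pattern is deliberately simple and would also match things that merely look scp-like, such as a Windows drive path, so it would need tightening before real use.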
Byron commented on May 27, 2025
That sounds like a start. More importantly, I think the weaknesses of the safe parameter should be very well documented.
eighthave commented on Jun 4, 2025
Here is my attempt at documenting the setup and known weaknesses:
https://github.com/gitpython-developers/GitPython/pull/2029/files#diff-35a18a749eb4d6efad45e56e78a9554926be5526e2ba2159b44311e718450e88R957
- I don't know what receive.procReceiveRefs and uploadpack.packObjectsHook do. Is there anything I should add about them?
- I know what remote.<name>.vcs does, but I don't know what kinds of risks it opens up. I did some quick tests and it seems that remote.<name>.vcs is blocked by protocol.allow=never but I don't know if that's guaranteed.
eighthave commented on Jun 4, 2025
Looks like the only thing of concern would be a malicious URL that exploits git-remote-https. I wonder if it would be worthwhile to sanitize the URL first?
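One possible shape for such a sanitizing step, as a sketch only: accept a URL when it parses as plain https with a hostname and no embedded credentials, and reject anything containing whitespace or control characters. The function name `is_plain_https_url` and the specific checks are assumptions for illustration; they are not taken from the linked pull request and may not be sufficient against every git-remote-https issue.

```python
from urllib.parse import urlsplit


def is_plain_https_url(url):
    """Return True only for https URLs without credentials or odd characters."""
    # Check the raw string first: urlsplit may silently strip some characters.
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return False
    try:
        parts = urlsplit(url)
        parts.port  # accessing .port raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    if not parts.hostname or parts.hostname.startswith("-"):
        return False
    if parts.username is not None or parts.password is not None:
        return False
    return True
```

A caller could run this check on every remote and submodule URL before handing it to Git and refuse to proceed on failure, which complements the protocol.allow settings rather than replacing them.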