Description
Hi,
I recently ran into a race condition when using two different instances of git.Repo
objects in two different threads.
Inside repo.index.add()
, some functions are decorated with the git_working_dir
decorator. This decorator changes the process working directory to the one of repository, performs some system calls, then returns the working dir back to the initial value.
If two threads perform repo.index.add()
, even on two independent Repo()
objects, the working directory of the entire process could be permanently modified. Moreover, system calls done by index.add()
may fail. For example lstat()
will be executed with a relative file name assuming a wrong absolute path.
For example:
- working dir (wd) is /home/user
- git_working_dir is entered in thread 1. (/home/user is registered, wd changed to /repo1)
- git_working_dir is entered in thread 2. (/repo1 is registered, wd changed to /repo2)
- git_working_dir is exited in thread 1. (wd restored to /home/user)
- git_working_dir is exited in thread 2. (wd restored to /repo1)
now the process wd is /repo1
I believe I could work around this issue by creating git object directly rather than adding to the index then committing, but I thought you may be interested to know about the issue.
Radu
Activity
Byron commentedon Sep 2, 2020
Thanks for the detailed report. I am leaving it open for others to find, despite having no hopes of this being tackled.
Generally, GitPython is not thread-safe, but if desired, certain types of actions cold be made to allow multiple threads under the condition that these operate on their own repositories.
rcarpa commentedon Sep 2, 2020
Seems like the mentioned decorator is only used to protect calls to _store_path. Would it be difficult to ensure that this function does system calls only on absolute paths and performs conversion to relative names (to add to git) by path manipulation (prefix stripping)?
Curious to know why it is complicated.
Byron commentedon Sep 2, 2020
That should definitely be possible. The reason it is currently implemented like this is simplicity: delegating all path handling to the operating system (the devil is in the details especially when windows is involved).
rcarpa commentedon Sep 2, 2020
Ah ... didn't think about multi-platforming at all :( . Thanks for the explanation !