Possible performance regression in GitPython 3.0.0 #906

Closed
@jeblair

Description

@jeblair
Contributor

I have observed that the test run time on the Zuul project, which makes heavy use of GitPython, has increased by a factor of 1.5 (30 minutes under 2.1.13 to 45 minutes under 3.0.0).

Activity

jeblair commented on Aug 12, 2019

@jeblair
Contributor, Author

It looks like the problem may be that many more GitPython actions now result in a call to "git rev-parse" to find the config file location.

I constructed a test script based on one of our unit tests and see the following commands under 2.1.13:

DEBUG:git.cmd:Popen(['git', 'init'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch-check'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=<valid stream>)
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=<valid stream>)
DEBUG:git.cmd:Popen(['git', 'reset', '--hard', 'HEAD', '--'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'clean', '-x', '-f', '-d'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'clone', '/tmp/upstream', '/tmp/downstream'], cwd=/home/corvus/git/zuul/zuul, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'remote', 'prune', '--dry-run', 'origin'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch-check'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=<valid stream>)

And the same under 3.0.0:

DEBUG:git.cmd:Popen(['git', 'init'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch-check'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=<valid stream>)
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=<valid stream>)
DEBUG:git.cmd:Popen(['git', 'reset', '--hard', 'HEAD', '--'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'clean', '-x', '-f', '-d'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'clone', '/tmp/upstream', '/tmp/downstream'], cwd=/home/corvus/git/zuul/zuul, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'remote', 'prune', '--dry-run', 'origin'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'cat-file', '--batch-check'], cwd=/tmp/downstream, universal_newlines=False, shell=None, istream=<valid stream>)

It looks like #894 introduced the change here: https://github.com/gitpython-developers/GitPython/pull/894/files#diff-c276fc3c4df38382ec884e59657b869dR450-R458
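Counting the lines in the two traces above gives 8 subprocess launches under 2.1.13 and 13 under 3.0.0; the five extra launches are all `git rev-parse --git-path config`. For longer logs, a tally along the following lines may help. This is only a sketch: `count_subcommands` and `POPEN_ARGS` are names made up for illustration, and the parser assumes the `DEBUG:git.cmd:Popen([...], ...)` line shape shown above.

```python
import ast
import re
from collections import Counter

# Captures the argv list from a debug line such as
# "DEBUG:git.cmd:Popen(['git', 'init'], cwd=/tmp/upstream, ...)".
# Assumes no argument contains a literal "]".
POPEN_ARGS = re.compile(r"Popen\((\[.*?\])")


def count_subcommands(log_text):
    """Count how often each git subcommand appears in git.cmd debug output."""
    counts = Counter()
    for line in log_text.splitlines():
        match = POPEN_ARGS.search(line)
        if not match:
            continue  # not a Popen debug line
        argv = ast.literal_eval(match.group(1))
        if len(argv) > 1 and argv[0] == "git":
            counts[argv[1]] += 1
    return counts


# Three lines copied from the 3.0.0 trace above.
sample = """\
DEBUG:git.cmd:Popen(['git', 'init'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
DEBUG:git.cmd:Popen(['git', 'rev-parse', '--git-path', 'config'], cwd=/tmp/upstream, universal_newlines=False, shell=None, istream=None)
"""

for subcommand, n in count_subcommands(sample).most_common():
    print(subcommand, n)
```

Running the same function over the full logs of both versions and comparing the counters should make any other extra commands stand out as well.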

added this to the v3.0.1 - Bugfixes milestone on Aug 14, 2019
Byron commented on Aug 14, 2019

@Byron
Member

Does @bdauvergne have an idea how the original issue could be fixed without repeated requests? Is caching possible?

For now I would revert the commit, as the performance impact seems substantial. This favours performance over a correctness fix, but I hope the fix can be re-added at some point with some sort of caching in place.
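One way the caching idea could look is sketched below. It is not GitPython code: `ConfigPathCache`, `config_path`, `invalidate`, and the injectable `runner` are all hypothetical names, and the sketch assumes the config path for a given working directory stays stable between explicit invalidations, which may not hold in every case the original fix was meant to cover.

```python
import subprocess


class ConfigPathCache:
    """Hypothetical helper: resolve the config path once per working directory."""

    def __init__(self, runner=None):
        # The runner is injectable so the sketch can be exercised without git.
        self._runner = runner or self._run_git
        self._paths = {}

    @staticmethod
    def _run_git(cwd):
        # Same command that appears repeatedly in the 3.0.0 trace.
        out = subprocess.check_output(
            ["git", "rev-parse", "--git-path", "config"], cwd=cwd
        )
        return out.decode().strip()

    def config_path(self, cwd):
        # Spawn a subprocess only on the first lookup for this directory.
        if cwd not in self._paths:
            self._paths[cwd] = self._runner(cwd)
        return self._paths[cwd]

    def invalidate(self, cwd=None):
        # Drop one entry, or everything when no directory is given.
        if cwd is None:
            self._paths.clear()
        else:
            self._paths.pop(cwd, None)


# Usage with a stand-in runner that records how often it is called.
calls = []


def fake_runner(cwd):
    calls.append(cwd)
    return cwd + "/.git/config"


cache = ConfigPathCache(runner=fake_runner)
for _ in range(3):
    cache.config_path("/tmp/downstream")
print(len(calls))  # prints 1
```

The trade-off is the one raised in the comment above: a cached value avoids the repeated subprocess launches, but it has to be invalidated whenever the repository layout can change, otherwise the correctness issue returns in a different form.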

ghost closed this as completed in d5cc590 on Aug 14, 2019
added a commit that references this issue on Sep 3, 2019
added a commit that references this issue on Dec 17, 2019
added a commit that references this issue on Jun 15, 2020