Description
In git, if I understand correctly, a commit-ish is a git object from which a commit can be reached by dereferencing it zero or more times. That is, all commits are commit-ish; some tag objects are commit-ish (those that, through possibly repeated dereferencing, eventually reach a commit); and no other types of git objects are ever commit-ish. The gitglossary(7) entry says:
commit-ish (also committish)
A commit object or an object that can be recursively dereferenced to a commit object. The following are all commit-ishes: a commit object, a tag object that points to a commit object, a tag object that points to a tag object that points to a commit object, etc.
Therefore, all instances of GitPython's Commit class, and some instances of GitPython's TagObject class, encapsulate git objects that are actually commit-ish.
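To make the definition concrete, here is a minimal, self-contained model of peeling in plain Python. It does not use GitPython: the GitObj class, its fields, and the peel helper are hypothetical. Only the dereferencing rules come from git's glossary. A tag dereferences to the object it points to, and (for tree-ish, the sibling concept) a commit dereferences to its root tree.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GitObj:
    """Hypothetical stand-in for a git object; not a GitPython class."""

    kind: str                        # "commit", "tree", "tag" or "blob"
    target: Optional[GitObj] = None  # tags only: the object the tag points to
    tree: Optional[GitObj] = None    # commits only: the commit's root tree


def peel(obj: Optional[GitObj], want: str) -> Optional[GitObj]:
    """Dereference obj zero or more times until an object of kind `want` is reached.

    Returns None if no such object can be reached.
    """
    while obj is not None:
        if obj.kind == want:
            return obj
        if obj.kind == "tag":
            obj = obj.target   # a tag dereferences to the object it points to
        elif obj.kind == "commit" and want == "tree":
            obj = obj.tree     # a commit dereferences to its root tree
        else:
            return None        # nothing further to dereference toward `want`
    return None


def is_commit_ish(obj: GitObj) -> bool:
    return peel(obj, "commit") is not None


def is_tree_ish(obj: GitObj) -> bool:
    return peel(obj, "tree") is not None


blob = GitObj("blob")
tree = GitObj("tree")
commit = GitObj("commit", tree=tree)
tag = GitObj("tag", target=commit)
tag_of_tag = GitObj("tag", target=tag)
tag_of_blob = GitObj("tag", target=blob)

for name, obj in [("commit", commit), ("tag", tag), ("tag of tag", tag_of_tag),
                  ("tag of blob", tag_of_blob), ("tree", tree), ("blob", blob)]:
    print(f"{name:<12} commit-ish={is_commit_ish(obj)} tree-ish={is_tree_ish(obj)}")
```

A tag of a blob is the interesting case. It is a tag object, yet it is not commit-ish, which is why only some TagObject instances qualify.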
But GitPython has a Commit_ish union type in the git.types module, and that Commit_ish type is considerably broader:
(GitPython/git/types.py, line 53 in b2c3d8b)
These four classes are the GitPython classes whose instances encapsulate any of the four types of git objects (of which blobs and trees are never actually commit-ish):
object type
One of the identifiers "commit", "tree", "tag" or "blob" describing the type of an object.
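One way to see the mismatch is to write the union down and ask, for each member class, when its instances can be commit-ish. The sketch assumes Commit_ish is spelled as a Union of string forward references to the four classes, which matches the later description of it as a Union of ForwardRefs. It is not copied from git/types.py, and the COMMIT_ISH_WHEN table is an illustration based on the git definitions above.

```python
from typing import Union, get_args

# Assumed spelling: a Union of string forward references to the four classes.
Commit_ish = Union["Commit", "TagObject", "Blob", "Tree"]

# When an instance of each class is commit-ish, per the git definitions above.
COMMIT_ISH_WHEN = {
    "Commit": "always",
    "TagObject": "only if it peels to a commit",
    "Blob": "never",
    "Tree": "never",
}

# get_args returns ForwardRef objects; __forward_arg__ holds the referenced name.
members = [ref.__forward_arg__ for ref in get_args(Commit_ish)]
for name in members:
    print(f"{name:<10} {COMMIT_ISH_WHEN[name]}")
```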
GitPython uses its Commit_ish type in accordance with this much broader concept, at least some of the time and possibly always. For example, Commit_ish is given as the return type of Object.new:
(Lines 77 to 78 in b2c3d8b)
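A hypothetical factory shows why a return annotation like this has to cover all four types. A constructor that looks up an object by name cannot know in advance which of the identifiers "commit", "tree", "tag" or "blob" it will get back. The stub classes, OBJECT_DB, TYPE_TO_CLASS, and new_object below are illustrative only. They are not GitPython's implementation of Object.new.

```python
from typing import Dict, Union


# Empty stand-ins for the four wrapper classes.
class Commit: ...
class Tree: ...
class TagObject: ...
class Blob: ...


# git's object-type identifiers mapped to the stand-in wrapper classes.
TYPE_TO_CLASS: Dict[str, type] = {
    "commit": Commit,
    "tree": Tree,
    "tag": TagObject,
    "blob": Blob,
}

# Hypothetical object database: revision name -> object-type identifier.
OBJECT_DB: Dict[str, str] = {
    "HEAD": "commit",
    "HEAD^{tree}": "tree",
    "v1.0": "tag",
    "HEAD:README": "blob",
}

FourObjectTypes = Union[Commit, Tree, TagObject, Blob]


def new_object(name: str) -> FourObjectTypes:
    """Return a wrapper whose class is only known once `name` is looked up."""
    return TYPE_TO_CLASS[OBJECT_DB[name]]()


for name in OBJECT_DB:
    print(f"{name:<12} -> {type(new_object(name)).__name__}")
```

Every call takes a str, yet the concrete class varies. A static return type can only be the union of everything the lookup might produce.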
Commit_ish cannot simply be replaced by Object because GitPython's Object class is also, through IndexObject, a superclass of Submodule (and of the RootModule subclass of Submodule):
(GitPython/git/objects/submodule/base.py, line 82 in b2c3d8b)
(GitPython/git/objects/submodule/base.py, lines 87 to 88 in b2c3d8b)
(GitPython/git/objects/submodule/base.py, lines 100 to 101 in b2c3d8b)
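Empty stand-in classes can mirror this hierarchy to show why Object is not a safe substitute: an annotation of Object also admits Submodule and RootModule. The Submodule and RootModule bases follow the text above. Deriving Blob and Tree from IndexObject, and Commit and TagObject from Object directly, is an assumption for this sketch. None of these stubs carry GitPython's real behavior.

```python
from typing import Union, get_args


# Empty stand-ins that mirror the inheritance described above (no real behavior).
class Object: ...
class IndexObject(Object): ...
class Commit(Object): ...
class TagObject(Object): ...
class Blob(IndexObject): ...
class Tree(IndexObject): ...
class Submodule(IndexObject): ...
class RootModule(Submodule): ...


# A union of exactly the four classes that wrap real git objects.
FourGitObjectTypes = Union[Commit, TagObject, Blob, Tree]

for cls in (Commit, Tree, Submodule, RootModule):
    print(
        f"{cls.__name__:<10}",
        "subclass of Object:", issubclass(cls, Object),
        "| member of four-type union:", cls in get_args(FourGitObjectTypes),
    )
```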
However, elsewhere in GitPython, Commit_ish is used where it seems only a commit is intended to be allowed. It is unclear whether this is unintentional, intentional but only so that type checkers accept some code that can only reasonably be checked at runtime, or intentional for some other reason. For example, the Repo.commit method, when called with one argument, looks up a commit in the repository it represents from a Commit_ish or string, and returns the commit it finds as a Commit:
(GitPython/git/repo/base.py, line 698 in b2c3d8b)
This leads to a situation where one can write code that type checkers allow and that may appear intended to work, but that always fails, and in a way that may be unclear to users less familiar with git concepts:
>>> import git
>>> repo = git.Repo()
>>> tree = repo.tree()
>>> repo.commit(tree)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ek\source\repos\GitPython\git\repo\base.py", line 709, in commit
    return self.rev_parse(str(rev) + "^0")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ek\source\repos\GitPython\git\repo\fun.py", line 379, in rev_parse
    obj = to_commit(obj)
          ^^^^^^^^^^^^^^
  File "C:\Users\ek\source\repos\GitPython\git\repo\fun.py", line 221, in to_commit
    raise ValueError("Cannot convert object %r to type commit" % obj)
ValueError: Cannot convert object <git.Tree "d5538cc6cc8839ccb0168baf9f98aebcedfd9c2c"> to type commit
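The traceback shows Repo.commit appending "^0" to the revision and passing it to rev_parse, which raises ValueError when the object cannot be peeled to a commit. The toy resolver below imitates that flow with a hypothetical in-memory object table. It also shows why a str argument can only be checked at runtime: the same static type can name a commit, a tree, or a tag of either. Nothing here calls GitPython. OBJECTS and commit_from are made up for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical object table: name -> (object type, name of the object it points to).
OBJECTS: Dict[str, Tuple[str, Optional[str]]] = {
    "c1": ("commit", None),
    "t1": ("tree", None),
    "v1": ("tag", "c1"),      # annotated tag pointing at a commit
    "v1t": ("tag", "t1"),     # annotated tag pointing at a tree
}


def commit_from(rev: str) -> str:
    """Peel `rev` to a commit, imitating what the "^0" suffix asks for."""
    name = rev
    while True:
        kind, target = OBJECTS[name]
        if kind == "commit":
            return name
        if kind == "tag" and target is not None:
            name = target     # dereference the tag and try again
            continue
        raise ValueError(f"Cannot convert object {rev!r} to type commit")


print(commit_from("v1"))      # a tag that peels to a commit
for rev in ("t1", "v1t"):     # both are plain str, so a type checker accepts them
    try:
        commit_from(rev)
    except ValueError as exc:
        print(exc)
```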
An argument that this specific situation with Repo.commit is not a typing bug is that this operation is fundamentally one that can only be checked at runtime in some cases. After all, an argument of type str is also allowed, and it cannot be known until runtime what object a string happens to name. Even so, the method docstring should possibly be expanded to clarify this issue. Or perhaps, if the situation with Commit_ish is improved, the potential for confusion will go away.
One way to improve this situation is to clearly document it in a docstring for the Commit_ish type. But if possible, it seems to me that more should be done:
- If known, the reason for the current situation should be stated there.
- Its relationship to other types should be clarified where otherwise confusing. For example, Object may benefit from greater clarity about what it ideally represents (git objects) versus the entirety of what it represents (an Object can also be a Submodule). Likewise, one of their docstrings can note that Tree_ish is narrower than all tree-ish git objects, while Commit_ish is broader than all commit-ish git objects.
- Maybe Commit_ish should be deprecated and one or more new types introduced, replacing all uses of it in GitPython.
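If Commit_ish were deprecated, a common Python technique (a PEP 562 module-level __getattr__) keeps the old name importable while warning on use. The sketch builds a throwaway module object so it runs on its own. The module name hypothetical_types and the replacement names CommitIshObject and AllObjectTypes are illustrative assumptions, not GitPython's API.

```python
import types
import warnings
from typing import Union

# Build a stand-in module so the sketch is self-contained.
mod = types.ModuleType("hypothetical_types")

# Hypothetical replacement names (illustrative only).
mod.CommitIshObject = Union["Commit", "TagObject"]                 # can actually be commit-ish
mod.AllObjectTypes = Union["Commit", "TagObject", "Tree", "Blob"]  # every git object type

# Old name -> name of the replacement with the same meaning.
_DEPRECATED = {"Commit_ish": "AllObjectTypes"}


def _module_getattr(name: str):
    """PEP 562 hook: called only when normal attribute lookup on the module fails."""
    if name in _DEPRECATED:
        replacement = _DEPRECATED[name]
        warnings.warn(
            f"{name} is deprecated; use {replacement} "
            "(or CommitIshObject if only commit-ish objects are meant).",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(mod, replacement)
    raise AttributeError(f"module {mod.__name__!r} has no attribute {name!r}")


# Storing __getattr__ in the module's namespace activates the PEP 562 hook.
mod.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old = mod.Commit_ish

print(old is mod.AllObjectTypes, caught[0].category.__name__)
```

Mapping the old name to the all-four-types union keeps existing annotations meaning what they meant before, while the warning nudges callers toward whichever narrower name they actually intend.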
If I am making a fundamental mistake about git concepts here, and GitPython's Commit_ish has a closer and more intuitive relationship to commit-ish git objects than I think it does, then I apologize.
I have not been able to work out from GitPython's revision history why Commit_ish is defined as it currently is, or why this union of all four actual git object types was introduced under the narrower-seeming name Commit_ish. However, the Commit_ish type was introduced in 82b131c (#1282), where the annotations it replaced had listed all four types, Commit, TagObject, Tree, and Blob, as explicit alternatives.
Activity
Byron commented on Mar 5, 2024
Thanks for bringing this to my attention.
To me it seems that, no matter what, the Commit_ish type is too broad even though it is clearly defined. This seems like a bug that should be fixed. A fix should only affect the type checker as well, which I would think is not disruptive in most cases, particularly because failing to pass an actual commit-ish will always cause a runtime failure.
Along with that, I agree that it would be good to further clarify that git.Object technically covers more than the four possible Git object types, simply because that can probably not be fixed without a potentially breaking change.
Lastly, Tree_ish is described as narrower here, and I wonder if eventually this can be fixed, beyond initially making it clear in the documentation. I realize, though, that this must very much depend on the site that accepts a tree-ish, as it would have to resolve it anyway.
In summary, I think Commit_ish can be fixed, while the documentation of git.Object and Tree_ish can be improved.
EliahKagan commented on Mar 6, 2024
I had at first feared that this might change the runtime behavior of reasonable code, but it looks like that may not be the case. In particular, due to the way it is written, Commit_ish is resolved as a Union of ForwardRefs. Unlike unions of "complete" types, which (on Python 3.10 and later) can be used as the second argument of isinstance or issubclass, this union cannot be used that way. Even using values obtained from inspect.get_annotations or inspect.signature does not appear to readily give anything that can be used in a runtime check.
This is good news for removing the never-commit-ish alternatives, because it suggests that only static typing could break. Depending on how people are using Commit_ish, that might really be revealing bugs rather than creating a false positive.
I think I'm going to look a bit more into whether there are runtime implications of this change, even though a cursory examination suggests there are not. There is also the question of how GitPython uses it. There are many occurrences of it in GitPython's type annotations. Some appear to intend a union of all four actual git object types, while others appear to intend only those types that can actually be commit-ish. If it turns out that this impression is wrong and that GitPython uses it in a consistent way (closer examination will tell), then changing it may not be justified. But so long as that is not so, which it seems it may not be, I think changing its definition may be justifiable.
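The point about ForwardRefs can be checked directly. Below, Commit_ish is re-created as a Union of string forward references, which is an assumption about its spelling consistent with the description above. A union of real stub classes sits next to it for comparison. The stub classes, takes_commit_ish, and usable_at_runtime are hypothetical. To stay version-independent, the probe passes the union's members as a tuple to isinstance rather than the Union itself, and it only records whether a TypeError is raised, since the message differs between Python versions.

```python
import inspect
from typing import ForwardRef, Union, get_args


class Commit: ...
class TagObject: ...
class Tree: ...
class Blob: ...


Commit_ish = Union["Commit", "TagObject", "Blob", "Tree"]  # forward references (strings)
ConcreteUnion = Union[Commit, TagObject, Blob, Tree]       # real classes


def takes_commit_ish(rev: Commit_ish) -> None:  # hypothetical function for the demo
    pass


def usable_at_runtime(annotation) -> bool:
    """Try isinstance with the union's members; report whether the check can run."""
    try:
        isinstance(Tree(), get_args(annotation))
    except TypeError:
        return False
    return True


# What inspect.signature hands back is still the union of ForwardRefs.
param = inspect.signature(takes_commit_ish).parameters["rev"]
print(param.annotation == Commit_ish)
print(all(isinstance(a, ForwardRef) for a in get_args(param.annotation)))
print(usable_at_runtime(ConcreteUnion), usable_at_runtime(Commit_ish))
```

Because no code can currently run an isinstance check against this union, narrowing its members should not change runtime behavior, only what static checkers accept.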
A new union can be created for all four actual git object types. There is a question of how it should be named; I can let you know if I have trouble coming up with a good name, but if you have a particular name or names that should be preferred then you can let me know.
Assuming these changes can be made, I think there are two reasonable approaches. One is for me to expand and retitle #1859 to include these changes, assuming I am able to make them. The other is for me to weaken or remove some of the unjustified wording in the Commit_ish docstring there, and leave the actual change to Commit_ish, along with the creation of a new union for all four actual git object types, to a subsequent PR.
Byron commented on Mar 6, 2024
Thanks for looking into and validating it!
I thought that maybe git.Objects (with a plural 's') is a very sensible name, particularly when used in method signatures that expect any git object. As it might imply that multiple objects should be passed, maybe git.ObjectKind or git.ObjectType would be even better.
Thanks again for your help with this; I am sure you will find a good path forward.
Temporarily rename Commit_ish to Old_commit_ish
EliahKagan commented on Mar 14, 2024
I went with AnyGitObject, which seems to be a bit better than GitObject. Though I struggle to articulate exactly why, it seems less likely to be confused with the Git or Object classes, and it captures the concept more naturally. A possible disadvantage of "Any" in AnyGitObject is that it could be confused with other uses of "Any" in Python types, such as typing.Any or the AnyStr type variable. I think this risk is minimal, though: those are two different prominent uses of "Any", so this name is not going against a single fixed convention for type names.
The reason I didn't go with Objects is that I agree it has the problem of implying multiple objects. Both AnyGitObject and Objects have the problem that they are not natural to put for "..." in "x is an ...". I think this is very slightly less severe with AnyGitObject than with Objects, but it may still be a reason to prefer GitObject. Alternatives like TrueGitObject, ActualGitObject, and RealGitObject (or those without the Git part) seem unnatural and also prone to their own confusions. For example, "True" could be thought of as having to do with evaluation as a boolean, and all of those could be thought of as referring to being on disk or otherwise in an actual git object database.
The reason I didn't go with ObjectKind or ObjectType is that the "is a" phrasing does sound right... but is wrong. For example, "A Blob is an ObjectKind" expresses that Blob is an instance of ObjectKind (a falsehood), not that Blob is a subtype of ObjectKind. In addition, the other meaning of object types in git is the string literals that identify them (e.g., "blob"), so those would more properly be called the object kinds.