Skip to content

Unclear why iter_items should be favored over list_items #1775

Closed
@EliahKagan

Description

@EliahKagan
Member

In git.util.IterableObj, the list_items method recommends that iter_items be used instead, and begins to explain why, but it appears the actual explanation was never written:

GitPython/git/util.py

Lines 1255 to 1256 in 4023f28

:note: Favor the iter_items method as it will

The deprecated git.util.Iterable class has the same note in its list_items docstring:

GitPython/git/util.py

Lines 1218 to 1219 in 4023f28

:note: Favor the iter_items method as it will

In both cases, the subsequent non-blank line in the docstring documents what the list_items method itself returns, so the continuation truly is missing. The intended continuation does not seem to me to be something that can be inferred from the docstring as a whole. For example, here's the whole IterableObj.list_items docstring:

GitPython/git/util.py

Lines 1250 to 1258 in 4023f28

"""
Find all items of this type - subclasses can specify args and kwargs differently.
If no args are given, subclasses are obliged to return all items if no additional
arguments arg given.
:note: Favor the iter_items method as it will
:return: list(Item,...) list of item instances
"""

It may seem odd that, unlike #1712, I did not notice this when working on #1725. But I think the reason is not that the docstrings had made sense to me at that time, but instead that I had noticed the more minor (really, almost trivial) issue that perhaps the first paragraph should be split so that only its first sentence would be its summary line, decided to return to it later to consider that further, and then forgot about it.

The text "Favor the iter_items method as it will" appears to have been present, and its continuation absent, for as long as the surrounding code has been in GitPython. It was introduced in f4fa1cb along with the Iterable class itself.

In general, it may sometimes be preferable to obtain an iterator rather than a list or other sequence because it may not be necessary to materialize a collection just to iterate over it, and because unnecessary materialization can sometimes increase space usage. On the other hand, materialization guards against mutation of the original collection during iteration. But these are completely general ideas, not informed by the list_items docstrings nor even by any consideration specific to GitPython.

My guess is that the docstring intended to say something more specific, or at least to identify which general benefit of iter_items serves to recommend it. So I don't think I could propose a specific improvement to that documentation without insight into what was originally intended.

Activity

Byron

Byron commented on Dec 21, 2023

@Byron
Member

Thanks for bringing this up!

Solely on the basis of 'knowing myself a little' I'd think that the continuation would be a general listing of advantages of iterators. If iter_items is indeed implemented as iterator, it should be worth following up on finishing the note as the latency of collecting all items could be high, depending on the item at hand. Nowadays I'd argue that list_items() shouldn't exist in the first place, but it's too late for that now.

EliahKagan

EliahKagan commented on Dec 22, 2023

@EliahKagan
MemberAuthor

If iter_items is indeed implemented as iterator

I would say it is implemented as an iterator. The more detailed picture is:

  • Hopefully code outside GitPython satisfies the documented contract, and GitPython can't be responsible for ensuring that.
  • If anything implements the deprecated git.util.Iterable anymore, it's code outside GitPython.
  • iter_items is abstract in git.util.IterableObj, which introduces it.
  • iter_items explicitly raises NotImplementedError in PushInfo and FetchInfo. Those concrete classes declare themselves to implement IterableObj yet, in effect, decline to implement this abstract method. However, because those classes don't implement list_items, whose base-class implementation uses iter_items, this is not a reason to avoid recommending iter_items over list_items in the base class docstrings.
  • Other classes that implement IterableObj have concrete implementations of iter_items that return iterators: either their own, or those introduced by an intermediate base class. I don't think I missed anything while looking into this.
  • The other iter_items method in the codebase belongs to SymbolicReference, and it returns an iterator. SymbolicReference does not inherit from IterableObj, nor from any other classes, except the implicit inheritance from object (not to be confused with GitPython's Object). I wonder if it would be beneficial to explain its relationship further in its class docstring; the description "Special case of a reference that is symbolic" might lead readers to think it is a subclass of some class representing a more general case, or that it is intended to be a subclass of Reference (which does implement IterableObj). But if I understand correctly, this is independent of any considerations in how IterableObj and its methods should be documented.

I've opened #1780 to propose some IterableObj and iter_items related improvements, including filling in the missing information about why iter_items should be preferred to list_items along the lines of what you've said here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @Byron@EliahKagan

      Issue actions

        Unclear why iter_items should be favored over list_items · Issue #1775 · gitpython-developers/GitPython