Empty read from gitdb.OStream.read() before EOF #120
Comments
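If I print all chunk sizes with

    stream = db.stream(bytes.fromhex(sha))
    sz = 0
    while sz < stream.size:
        # The read call is missing from the snippet as captured; the 4096 size is assumed from the issue description.
        chunk = stream.read(4096)
        print(len(chunk))
        sz += len(chunk)

there's a spread of sizes: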
which seems to refute the idea expressed in this comment that it will recursively read() until the requested
Lines 310 to 312 in f36c0cc
Removing
Thanks for reporting! I don't think, however, that the implementation can be trusted and it's better to use the
Getting a chunk of size 0 in the middle is certainly unexpected, but maybe if that's fixed it will be suitable for consumption nonetheless?
For the current application I just need to re-hash previously unseen trees/blobs using a different hashing scheme to git, and it has been working OK. Maybe I should re-run the git checksums as well as a sanity check; it would probably still be faster than a git pipe, and then I could debug any issues I detect.
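The sanity check mentioned above could be sketched roughly as follows. This is a minimal sketch, not gitdb code: `git_object_id`, `chunk_size` and `max_empty_reads` are hypothetical names, and it assumes a SHA-1 repository, where an object ID is the SHA-1 of the header `<type> <size>\0` followed by the object's content. The loop counts bytes against the known size rather than stopping at the first empty chunk, and gives up only after several consecutive empty reads.

```python
import hashlib
import io


def git_object_id(kind, size, stream, chunk_size=4096, max_empty_reads=8):
    """Recompute a git object ID (SHA-1 repositories only) from a content stream.

    kind: object type as text, e.g. "blob" or "tree".
    size: number of content bytes expected from the stream.
    max_empty_reads: hypothetical safety limit on consecutive empty chunks.
    """
    hasher = hashlib.sha1(f"{kind} {size}\0".encode())
    remaining = size
    empty_reads = 0
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if chunk:
            empty_reads = 0
        else:
            # Tolerate a few empty chunks mid-stream, but do not spin forever.
            empty_reads += 1
            if empty_reads > max_empty_reads:
                raise EOFError(f"stream ended with {remaining} bytes missing")
        hasher.update(chunk)
        remaining -= len(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    data = b"hello world\n"
    print(git_object_id("blob", len(data), io.BytesIO(data)))
```

If the recomputed ID differs from the hex SHA used to look the object up, the read loop dropped or duplicated bytes somewhere. With a gitdb stream, `stream.size` (as used in the loop earlier in this thread) would supply `size`; the object type has to be supplied by the caller.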
I see. In this case I'd recommend using
Ah, maybe I should try pygit2. We also had a terrible time with libgit2 in a different application, and we swore off it. But that may have been more about the bindings (node-git). I am happy using Rust in CLI tools but our internal auth stack is not available in Rust, and the two services where we use/could use
I thought more in the direction of having a little CLI that performs a specific task, to shell out to from the main application. Alternatively, one could do the same but generate bindings.
I have code that relies on reading an object from a gitdb stream.
To do this I used a standard .read() loop (like with io.RawIOBase):
The behaviour I expected to see (from the duck-type with RawIOBase) is to only see b'' at EOF:
However, stream.read(4096) can return empty chunks even before the end of the stream, so the loop exits early.
For the file where I saw this first, it is sensitive to the size parameter: it apparently occurs for 0 < size <= 4096.
Looking at the code, there is a condition to repeat a read if we got insufficient bytes:
gitdb/gitdb/stream.py
Lines 316 to 317 in f36c0cc
However, the leading "if dcompdat and" means that the condition doesn't apply if zero bytes were read. Removing this part of the condition addresses the issue (but I understand from the comment that this is in order to support compressed_bytes_read()).
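To illustrate the effect of such a guard in isolation, here is a toy model. It is not gitdb's implementation: `ToyStream`, `guard_on_empty` and `drain` are hypothetical names, and the "decompressor" is just a scripted list of chunks. It only shows how a truthiness check in front of a retry condition lets an empty chunk escape mid-stream, which ends a standard `.read()` loop early.

```python
class ToyStream:
    """Toy reader with scripted low-level reads; NOT gitdb's code."""

    def __init__(self, chunks, guard_on_empty):
        self._chunks = list(chunks)              # results of successive low-level reads
        self._total = sum(len(c) for c in chunks)
        self._done = 0                           # bytes handed out so far
        self._guard = guard_on_empty

    def _read_once(self, size):
        # Return the next scripted chunk, capped at the requested size.
        if not self._chunks:
            return b""
        head = self._chunks.pop(0)
        if len(head) > size:
            self._chunks.insert(0, head[size:])
            head = head[:size]
        return head

    def read(self, size):
        data = self._read_once(size)
        self._done += len(data)
        short = len(data) < size and self._done < self._total
        # With the guard, a zero-length result never triggers the retry.
        retry = (bool(data) and short) if self._guard else short
        if retry:
            data += self.read(size - len(data))
        return data


def drain(stream, size):
    """Standard read loop: stop at the first empty chunk."""
    out = b""
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        out += chunk
    return out


if __name__ == "__main__":
    script = [b"abc", b"", b"def"]
    print(drain(ToyStream(script, guard_on_empty=True), 3))
    print(drain(ToyStream(script, guard_on_empty=False), 3))
```

In the toy, the guarded variant stops after the first three bytes because the scripted empty chunk is returned as-is, while the unguarded variant retries and reaches the end of the data. Whether dropping the guard is safe in gitdb itself depends on the compressed_bytes_read() concern noted above, which this sketch does not model.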