Commit 0369384

revised tutorial to match the changed usage, added basic information about object databases
1 parent 18be097 commit 0369384

3 files changed

+59
-44
lines changed

doc/intro.rst

+5-7
@@ -4,20 +4,18 @@
44
Overview / Install
55
==================
66

7-
GitPython is a python library used to interact with Git repositories.
7+
GitPython is a python library used to interact with git repositories, high-level like git-porcelain, or low-level like git-plumbing.
88

9-
GitPython was a port of the grit_ library in Ruby created by
10-
Tom Preston-Werner and Chris Wanstrath, but grew beyond its heritage through its improved design and performance.
9+
It provides abstractions of git objects for easy access of repository data, and additionally allows you to access the git repository more directly using either a pure python implementation, or the faster, but more resource intensive git command implementation.
1110

12-
.. _grit: http://grit.rubyforge.org
11+
The object database implementation is optimized for handling large quantities of objects and large datasets, which is achieved by using low-level structures and data streaming.
1312

1413
Requirements
1514
============
1615

17-
* Git_ tested with 1.5.3.7
18-
* Requires Git_ 1.7.0 or newer
16+
* Tested with `Git`_ 1.7.0 or newer
1917
* `Python Nose`_ - used for running the tests
20-
* `Mock by Michael Foord`_ used for tests. Requires 0.5
18+
* `Mock by Michael Foord`_ used for tests. Requires version 0.5
2119

2220
.. _Git: http://git-scm.com/
2321
.. _Python Nose: http://code.google.com/p/python-nose/

doc/reference.rst

+6-6
@@ -52,15 +52,15 @@ Objects.Submodule
5252
:members:
5353
:undoc-members:
5454

55-
Objects.Utils
55+
Objects.Util
5656
-------------
5757

58-
.. automodule:: git.objects.utils
58+
.. automodule:: git.objects.util
5959
:members:
6060
:undoc-members:
6161

6262
Index.Base
63-
-------------
63+
----------
6464

6565
.. automodule:: git.index.base
6666
:members:
@@ -138,9 +138,9 @@ Repo
138138
:members:
139139
:undoc-members:
140140

141-
Utils
142-
-----
141+
Util
142+
----
143143

144-
.. automodule:: git.utils
144+
.. automodule:: git.util
145145
:members:
146146
:undoc-members:

doc/tutorial.rst

+48-31
@@ -8,7 +8,7 @@
88
GitPython Tutorial
99
==================
1010

11-
GitPython provides object model access to your git repository. This tutorial is composed of multiple sections, each of which explain a real-life usecase.
11+
GitPython provides object model access to your git repository. This tutorial is composed of multiple sections, each of which explains a real-life use case.
1212

1313
Initialize a Repo object
1414
************************
@@ -17,10 +17,12 @@ The first step is to create a ``Repo`` object to represent your repository::
1717

1818
from git import *
1919
repo = Repo("/Users/mtrier/Development/git-python")
20+
assert repo.bare == False
2021

21-
In the above example, the directory ``/Users/mtrier/Development/git-python`` is my working repository and contains the ``.git`` directory. You can also initialize GitPython with a bare repository::
22+
In the above example, the directory ``/Users/mtrier/Development/git-python`` is my working repository and contains the ``.git`` directory. You can also initialize GitPython with a *bare* repository::
2223

2324
repo = Repo.create("/var/git/git-python.git")
25+
assert repo.bare == True
2426
2527
A repo object provides high-level access to your data, it allows you to create and delete heads, tags and remotes and access the configuration of the repository::
2628
@@ -43,6 +45,25 @@ Archive the repository contents to a tar file::
4345

4446
repo.archive(open("repo.tar",'w'))
4547
48+
49+
Object Databases
50+
****************
51+
Each ``Repo`` instance is powered by its object database instance, which is used when extracting any data or when writing new objects.
52+
53+
The type of the database determines certain performance characteristics, such as the quantity of objects that can be read per second, the resource usage when reading large data files, as well as the average memory footprint of your application.
54+
55+
GitDB
56+
=====
57+
GitDB is a pure-python implementation of the git object database. It is the default database to use in GitPython 0.3. It uses less memory when handling huge files, but will be 2 to 5 times slower when extracting large quantities of small objects from densely packed repositories::
58+
59+
repo = Repo("path/to/repo", odbt=GitDB)
60+
61+
GitCmdObjectDB
62+
==============
63+
The git command database uses persistent git-cat-file instances to read repository information. These operate very fast under all conditions, but will consume additional memory for the process itself. When extracting large files, memory usage will be much higher than that of ``GitDB``::
64+
65+
repo = Repo("path/to/repo", odbt=GitCmdObjectDB)
66+
4667
Examining References
4768
********************
4869

@@ -88,46 +109,44 @@ Change the symbolic reference to switch branches cheaply ( without adjusting the
88109

89110
Understanding Objects
90111
*********************
91-
An Object is anything storable in git's object database. Objects contain information about their type, their uncompressed size as well as the actual data. Each object is uniquely identified by a SHA1 hash, being 40 hexadecimal characters in size or 20 bytes in size.
112+
An Object is anything storable in git's object database. Objects contain information about their type, their uncompressed size as well as the actual data. Each object is uniquely identified by a binary SHA1 hash, being 20 bytes in size.
92113

93114
Git only knows 4 distinct object types being Blobs, Trees, Commits and Tags.
94115

95-
In Git-Pyhton, all objects can be accessed through their common base, compared and hashed, as shown in the following example::
116+
In Git-Python, all objects can be accessed through their common base, compared and hashed. They are usually not instantiated directly, but through references or specialized repository functions::
96117

97118
hc = repo.head.commit
98119
hct = hc.tree
99120
hc != hct
100121
hc != repo.tags[0]
101122
hc == repo.head.reference.commit
102123
103-
Basic fields are::
124+
Common fields are::
104125

105126
hct.type
106127
'tree'
107128
hct.size
108129
166
109-
hct.sha
130+
hct.hexsha
110131
'a95eeb2a7082212c197cabbf2539185ec74ed0e8'
111-
hct.data # returns string with pure uncompressed data
112-
'...'
113-
len(hct.data) == hct.size
132+
hct.binsha
133+
'binary 20 byte sha1'
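As a small illustration of how the two hash fields relate, the sketch below assumes that ``hexsha`` is simply the hexadecimal encoding of the 20-byte ``binsha``. It uses only the standard library and the tree hash from the example above; the variable names are illustrative and no GitPython call is involved.

```python
import binascii

# Hex form of the tree hash shown above (40 hexadecimal characters).
hexsha = "a95eeb2a7082212c197cabbf2539185ec74ed0e8"

# Assumption: the binary form is the raw 20 bytes that the hex string encodes.
binsha = binascii.unhexlify(hexsha)

# Converting back yields the original hex string.
roundtrip = binascii.hexlify(binsha).decode("ascii")
```

The binary form is half the size of the hex form, which matters when many hashes are kept in memory at once.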
114134
115-
Index Objects are objects that can be put into git's index. These objects are trees and blobs which additionally know about their path in the filesystem as well as their mode::
135+
Index Objects are objects that can be put into git's index. These objects are trees, blobs and submodules which additionally know about their path in the filesystem as well as their mode::
116136
117137
hct.path # root tree has no path
118138
''
119139
hct.trees[0].path # the first subdirectory has one though
120140
'dir'
121-
htc.mode # trees have mode 0
122-
0
141+
hct.mode # trees have the mode of a linux directory
142+
040000
123143
'%o' % hct.blobs[0].mode # blobs have a specific mode though comparable to a standard linux fs
124144
100644
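The mode values above follow the usual Unix file-mode layout, so they can be inspected with the standard ``stat`` module. The following is a minimal sketch that works on plain integers rather than on GitPython objects; the variable names are hypothetical.

```python
import stat

tree_mode = 0o040000  # mode shown above for trees
blob_mode = 0o100644  # mode shown above for a regular, non-executable blob

# The file-type bits distinguish directories from regular files.
tree_is_dir = stat.S_ISDIR(tree_mode)
blob_is_file = stat.S_ISREG(blob_mode)

# Octal formatting reproduces the familiar textual form.
blob_mode_text = "%o" % blob_mode

# The permission bits alone, without the file-type bits.
blob_permissions = "%o" % stat.S_IMODE(blob_mode)
```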
125145
126146
Access blob data (or any object data) directly or using streams::
127147
128-
htc.data # binary tree data as string ( inefficient )
129-
htc.blobs[0].data_stream # stream object to read data from
130-
htc.blobs[0].stream_data(my_stream) # write data to given stream
148+
hct.blobs[0].data_stream.read() # stream object to read data from
149+
hct.blobs[0].stream_data(open("blob_data", "wb")) # write data to given stream
131150
132151
133152
The Commit object
@@ -153,11 +172,11 @@ The above will return commits 21-30 from the commit list.::
153172

154173
headcommit = repo.head.commit
155174

156-
headcommit.sha
175+
headcommit.hexsha
157176
'207c0c4418115df0d30820ab1a9acd2ea4bf4431'
158177

159178
headcommit.parents
160-
[<git.Commit "a91c45eee0b41bf3cdaad3418ca3850664c4a4b4">]
179+
(<git.Commit "a91c45eee0b41bf3cdaad3418ca3850664c4a4b4">,)
161180

162181
headcommit.tree
163182
<git.Tree "563413aedbeda425d8d9dcbb744247d0c3e8a0ac">
@@ -178,7 +197,7 @@ The above will return commits 21-30 from the commit list.::
178197
'cleaned up a lot of test information. Fixed escaping so it works with
179198
subprocess.'
180199

181-
Note: date time is represented in a ``seconds since epock`` format. Conversion to human readable form can be accomplished with the various time module methods::
200+
Note: date time is represented in a ``seconds since epoch`` format. Conversion to human readable form can be accomplished with the various `time module <http://docs.python.org/library/time.html>`_ methods::
182201

183202
import time
184203
time.asctime(time.gmtime(headcommit.committed_date))
@@ -187,8 +206,6 @@ Note: date time is represented in a ``seconds since epock`` format. Conversion t
187206
time.strftime("%a, %d %b %Y %H:%M", time.gmtime(headcommit.committed_date))
188207
'Wed, 7 May 2008 05:56'
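The conversion can also be tried without a repository. The sketch below builds a hypothetical timestamp that stands in for ``headcommit.committed_date`` and formats it with the standard ``time`` module; the chosen date is only an assumption made to mirror the example above.

```python
import calendar
import time

# Hypothetical stand-in for headcommit.committed_date:
# 7 May 2008, 05:56:00 UTC, expressed as seconds since the epoch.
committed_date = calendar.timegm((2008, 5, 7, 5, 56, 0))

# Interpret the timestamp as UTC and format it.
utc_struct = time.gmtime(committed_date)
formatted = time.strftime("%a, %d %b %Y %H:%M", utc_struct)
iso_like = time.strftime("%Y-%m-%dT%H:%M:%SZ", utc_struct)
```

Note that ``%d`` zero-pads the day of the month, so ``formatted`` reads ``Wed, 07 May 2008 05:56`` in the default locale.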
189208

190-
.. _struct_time: http://docs.python.org/library/time.html
191-
192209
You can traverse a commit's ancestry by chaining calls to ``parents``::
193210

194211
headcommit.parents[0].parents[0].parents[0]
@@ -203,7 +220,7 @@ A tree records pointers to the contents of a directory. Let's say you want the r
203220
tree = repo.heads.master.commit.tree
204221
<git.Tree "a006b5b1a8115185a228b7514cdcd46fed90dc92">
205222

206-
tree.sha
223+
tree.hexsha
207224
'a006b5b1a8115185a228b7514cdcd46fed90dc92'
208225

209226
Once you have a tree, you can get the contents::
@@ -230,13 +247,13 @@ Its useful to know that a tree behaves like a list with the ability to query en
230247
'dir/file'
231248
blob.abspath
232249
'/Users/mtrier/Development/git-python/dir/file'
233-
>>>tree['dir/file'].sha == blob.sha
250+
>>>tree['dir/file'].binsha == blob.binsha
234251

235252
There is a convenience method that allows you to get a named sub-object from a tree with a syntax similar to how paths are written on a Unix system::
236253

237254
tree/"lib"
238255
<git.Tree "c1c7214dde86f76bc3e18806ac1f47c38b2b7a30">
239-
tree/"dir/file" == blob.sha
256+
tree/"dir/file" == blob
240257

241258
You can also get a tree directly from the repository if you know its name::
242259

@@ -252,7 +269,7 @@ As trees only allow direct access to their direct entries, use the traverse met
252269

253270
tree.traverse()
254271
<generator object at 0x7f6598bd65a8>
255-
for entry in traverse(): do_something_with(entry)
272+
for entry in tree.traverse(): do_something_with(entry)
256273

257274
258275
The Index Object
@@ -263,15 +280,15 @@ The git index is the stage containing changes to be written with the next commit
263280
264281
Access objects and add/remove entries. Commit the changes::
265282

266-
for stage,blob in index.iter_blobs(): do_something(...)
267-
Access blob objects
268-
for (path,stage),entry in index.entries.iteritems: pass
269-
Access the entries directly
283+
for stage, blob in index.iter_blobs(): do_something(...)
284+
# Access blob objects
285+
for (path, stage), entry in index.entries.iteritems(): pass
286+
# Access the entries directly
270287
index.add(['my_new_file']) # add a new file to the index
271288
index.remove(['dir/existing_file'])
272289
new_commit = index.commit("my commit message")
273290
274-
Create new indices from other trees or as result of a merge. Write that result to a new index::
291+
Create new indices from other trees or as result of a merge. Write that result to a new index file::
275292

276293
tmp_index = Index.from_tree(repo, 'HEAD~1') # load a tree into a temporary index
277294
merge_index = Index.from_tree(repo, 'base', 'HEAD', 'some_branch') # merge two trees three-way
@@ -303,7 +320,7 @@ Change configuration for a specific remote only::
303320
Obtaining Diff Information
304321
**************************
305322

306-
Diffs can generally be obtained by Subclasses of ``Diffable`` as they provide the ``diff`` method. This operation yields a DiffIndex allowing you to easily access diff information about paths.
323+
Diffs can generally be obtained by subclasses of ``Diffable`` as they provide the ``diff`` method. This operation yields a DiffIndex allowing you to easily access diff information about paths.
307324

308325
Diffs can be made between the Index and Trees, Index and the working tree, trees and trees as well as trees and the working copy. If commits are involved, their tree will be used implicitly::
309326

@@ -346,7 +363,7 @@ The return value will by default be a string of the standard output channel prod
346363
Keyword arguments translate to short and long keyword arguments on the commandline.
347364
The special notion ``git.command(flag=True)`` will create a flag without value like ``command --flag``.
348365

349-
If ``None`` is found in the arguments, it will be dropped silently. Lists and tuples passed as arguments will be unpacked to individual arguments. Objects are converted to strings using the str(...) function.
366+
If ``None`` is found in the arguments, it will be dropped silently. Lists and tuples passed as arguments will be unpacked recursively to individual arguments. Objects are converted to strings using the str(...) function.
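The translation rules described here can be sketched in plain Python. This is not GitPython's actual implementation: ``transform_kwargs`` and ``unpack_args`` are hypothetical helper names, and the handling of ``False`` values and of underscores in keyword names are assumptions made for illustration.

```python
def transform_kwargs(**kwargs):
    """Turn keyword arguments into command-line options (illustrative only)."""
    options = []
    for key, value in sorted(kwargs.items()):
        # Assumption: None and False mean "omit this option".
        if value is None or value is False:
            continue
        if len(key) == 1:
            # Single-letter keys become short options, e.g. n=5 -> -n 5
            options.append("-%s" % key)
            if value is not True:
                options.append(str(value))
        else:
            # Assumption: underscores map to dashes in long option names.
            name = key.replace("_", "-")
            if value is True:
                options.append("--%s" % name)  # flag without a value
            else:
                options.append("--%s=%s" % (name, value))
    return options


def unpack_args(args):
    """Flatten nested lists and tuples, drop None, convert the rest with str()."""
    flat = []
    for arg in args:
        if arg is None:
            continue  # dropped silently
        if isinstance(arg, (list, tuple)):
            flat.extend(unpack_args(arg))  # recursive unpacking
        else:
            flat.append(str(arg))
    return flat


options = transform_kwargs(n=5, pretty="oneline", no_merges=True)
arguments = unpack_args(["master", None, ("--", ["doc", 1])])
command_line = ["git", "log"] + options + arguments
```

Sorting the keyword arguments only keeps the output deterministic in this sketch; it is not a claim about the order a real command line would use.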
350367

351368
And even more ...
352369
*****************
