
Commit 41880fa

GaryTu1020 authored and pengk19 committed
feature: deal with credentials for Git support for GitHub (aws#914)
add authentication info
1 parent 2d4b7d4 commit 41880fa

8 files changed

+830
-132
lines changed

doc/overview.rst

+61-24
Original file line numberDiff line numberDiff line change
@@ -183,45 +183,57 @@ Here is an example:
183183
# When you are done using your endpoint
184184
algo.delete_endpoint()
185185
186-
Git Support
187-
-----------
188-
If you have your training scripts in your GitHub repository, you can use them directly without the trouble to download
189-
them to local machine. Git support can be enabled simply by providing ``git_config`` parameter when initializing an
190-
estimator. If Git support is enabled, then ``entry_point``, ``source_dir`` and ``dependencies`` should all be relative
191-
paths in the Git repo. Note that if you decided to use Git support, then everything you need for ``entry_point``,
192-
``source_dir`` and ``dependencies`` should be in a single Git repo.
186+
Use Scripts Stored in a Git Repository
187+
--------------------------------------
188+
When you create an estimator, you can specify a training script that is stored in a GitHub or other Git repository as the entry point for the estimator, so that you don't have to download the scripts locally.
189+
If you do so, the source directory and dependencies, if needed, must be in the same repo. To enable Git support, provide the ``git_config`` parameter
190+
when creating an ``Estimator`` object. If Git support is enabled, then ``entry_point``, ``source_dir`` and ``dependencies``
191+
must be relative paths in the Git repo, if provided.
193192

194-
Here are ways to specify ``git_config``:
193+
The ``git_config`` parameter includes fields ``repo``, ``branch``, ``commit``, ``2FA_enabled``, ``username``,
194+
``password`` and ``token``. The ``repo`` field is required. All other fields are optional. ``repo`` specifies the Git
195+
repository where your training script is stored. If you don't provide ``branch``, the default value 'master' is used.
196+
If you don't provide ``commit``, the latest commit in the specified branch is used.
195197

196-
.. code:: python
198+
``2FA_enabled``, ``username``, ``password`` and ``token`` are used for authentication. Set ``2FA_enabled`` to ``True`` if
199+
two-factor authentication is enabled for the GitHub (or other Git) account; otherwise set it to ``False``.
200+
If you do not provide a value for ``2FA_enabled``, the default value ``False`` is used.
197201

198-
# Specifies the git_config parameter
199-
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git',
200-
'branch': 'branch1',
201-
'commit': '4893e528afa4a790331e1b5286954f073b0f14a2'}
202-
203-
# Alternatively, you can also specify git_config by providing only 'repo' and 'branch'.
204-
# If this is the case, the latest commit in the branch will be used.
205-
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git',
206-
'branch': 'branch1'}
202+
If ``repo`` is an SSH URL, you should either have no passphrase for the SSH key pairs, or have the ``ssh-agent`` configured
203+
so that you are not prompted for the SSH passphrase when you run a ``git clone`` command with SSH URLs. For SSH URLs, it
204+
does not matter whether two-factor authentication is enabled.
207205

208-
# Only providing 'repo' is also allowed. If this is the case, latest commit in
209-
# 'master' branch will be used.
210-
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git'}
206+
If ``repo`` is an HTTPS URL, 2FA matters. When 2FA is disabled, either ``token`` or ``username`` and ``password`` are
207+
used for authentication if provided (``token`` takes priority). When 2FA is enabled, only ``token`` is used for
208+
authentication if provided. If the required authentication info is not provided, the Python SDK tries to use local
209+
credential storage to authenticate. If that also fails, an error is raised.
211210

212-
The following are some examples to define estimators with Git support:
211+
Here are some examples of creating estimators with Git support:
213212

214213
.. code:: python
215214
215+
# Specifies the git_config parameter. This example does not provide Git credentials, so the Python SDK tries
216+
# to use local credential storage.
217+
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git',
218+
'branch': 'branch1',
219+
'commit': '4893e528afa4a790331e1b5286954f073b0f14a2'}
220+
216221
# In this example, the source directory 'pytorch' contains the entry point 'mnist.py' and other source code.
217-
# and it is relative path inside the Git repo.
222+
# It is a relative path inside the Git repo.
218223
pytorch_estimator = PyTorch(entry_point='mnist.py',
219224
role='SageMakerRole',
220225
source_dir='pytorch',
221226
git_config=git_config,
222227
train_instance_count=1,
223228
train_instance_type='ml.c4.xlarge')
224229
230+
.. code:: python
231+
232+
# You can also specify git_config by providing only 'repo' and 'branch'.
233+
# If this is the case, the latest commit in that branch will be used.
234+
git_config = {'repo': 'git@github.com:username/repo-with-training-scripts.git',
235+
'branch': 'branch1'}
236+
225237
# In this example, the entry point 'mnist.py' is all we need for source code.
226238
# We need to specify the path to it in the Git repo.
227239
mx_estimator = MXNet(entry_point='mxnet/mnist.py',
@@ -230,6 +242,15 @@ The following are some examples to define estimators with Git support:
230242
train_instance_count=1,
231243
train_instance_type='ml.c4.xlarge')
232244
245+
.. code:: python
246+
247+
# Providing only 'repo' is also allowed. In that case, the latest commit in the 'master' branch is used.
248+
# This example does not provide '2FA_enabled', so 2FA is treated as disabled by default. 'username' and
249+
# 'password' are provided for authentication.
250+
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git',
251+
'username': 'username',
252+
'password': 'passw0rd!'}
253+
233254
# In this example, besides entry point and other source code in source directory, we still need some
234255
# dependencies for the training job. Dependencies should also be paths inside the Git repo.
235256
pytorch_estimator = PyTorch(entry_point='mnist.py',
@@ -240,7 +261,23 @@ The following are some examples to define estimators with Git support:
240261
train_instance_count=1,
241262
train_instance_type='ml.c4.xlarge')
242263
243-
When Git support is enabled, users can still use local mode in the same way.
264+
.. code:: python
265+
266+
# This example specifies that 2FA is enabled, and a token is provided for authentication.
267+
git_config = {'repo': 'https://github.com/username/repo-with-training-scripts.git',
268+
'2FA_enabled': True,
269+
'token': 'your-token'}
270+
271+
# In this example, besides entry point, we also need some dependencies for the training job.
272+
pytorch_estimator = PyTorch(entry_point='pytorch/mnist.py',
273+
role='SageMakerRole',
274+
dependencies=['dep.py'],
275+
git_config=git_config,
276+
train_instance_count=1,
277+
train_instance_type='local')
278+
279+
Git support can be used not only for training jobs, but also for hosting models. The usage is the same as the above,
280+
and ``git_config`` should be provided when creating model objects, e.g. ``TensorFlowModel``, ``MXNetModel``, ``PyTorchModel``.
244281

245282
Training Metrics
246283
----------------
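
The HTTPS authentication rules described in the ``doc/overview.rst`` diff above can be summarized in a short sketch. The helper name ``choose_https_auth`` and its return labels are hypothetical, introduced only for illustration. This is not the SDK's own code, just one reading of the documented priority order: a token first, then username and password when 2FA is disabled, then local credential storage.

```python
def choose_https_auth(git_config):
    """Pick an authentication method for an HTTPS repo URL.

    Hypothetical helper: it mirrors the documented rules, not the SDK's code.
    Returns a (method, credentials) tuple.
    """
    two_fa_enabled = git_config.get('2FA_enabled', False)  # documented default
    token = git_config.get('token')
    username = git_config.get('username')
    password = git_config.get('password')

    # A token is used whenever it is provided, with or without 2FA.
    if token:
        return ('token', token)
    # Username and password are only considered when 2FA is disabled.
    if not two_fa_enabled and username and password:
        return ('username_password', (username, password))
    # Otherwise fall back to whatever credentials are stored locally.
    return ('local_credential_storage', None)
```

With this reading, a config that sets ``'2FA_enabled': True`` and supplies only ``username`` and ``password`` falls through to local credential storage, since the documentation says only a token is used when 2FA is enabled.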

src/sagemaker/estimator.py

+16-5
Original file line numberDiff line numberDiff line change
@@ -20,10 +20,10 @@
2020
from abc import abstractmethod
2121
from six import with_metaclass
2222
from six import string_types
23-
2423
import sagemaker
2524
from sagemaker import git_utils
2625
from sagemaker.analytics import TrainingJobAnalytics
26+
2727
from sagemaker.fw_utils import (
2828
create_image_uri,
2929
tar_and_upload_dir,
@@ -975,10 +975,12 @@ def __init__(
975975
>>> |----- test.py
976976
977977
You can assign entry_point='src/train.py'.
978-
git_config (dict[str, str]): Git configurations used for cloning files, including 'repo', 'branch'
979-
and 'commit' (default: None).
980-
'branch' and 'commit' are optional. If 'branch' is not specified, 'master' branch will be used. If
981-
'commit' is not specified, the latest commit in the required branch will be used.
978+
git_config (dict[str, str]): Git configurations used for cloning files, including ``repo``, ``branch``,
979+
``commit``, ``2FA_enabled``, ``username``, ``password`` and ``token`` (default: None). The fields are
980+
optional except ``repo``. If ``branch`` is not specified, master branch will be used. If ``commit``
981+
is not specified, the latest commit in the required branch will be used.
982984
Example:
983985
984986
The following config:
@@ -989,6 +991,15 @@ def __init__(
989991
990992
results in cloning the repo specified in 'repo', then checkout the 'master' branch, and checkout
991993
the specified commit.
994+
``2FA_enabled``, ``username``, ``password`` and ``token`` are used for authentication.
995+
``2FA_enabled`` must be ``True`` or ``False`` if it is provided. If ``2FA_enabled`` is not provided,
996+
we consider 2FA as disabled. For GitHub and other Git repos, when ssh urls are provided, it does not
997+
make a difference whether 2FA is enabled or disabled; an ssh passphrase should be in local storage.
998+
When https urls are provided: if 2FA is disabled, then either token or username+password will
999+
be used for authentication if provided (token prioritized); if 2FA is enabled, only token will
1000+
be used for authentication if provided. If the required authentication info is not provided, the Python SDK
1001+
tries to use local credential storage to authenticate. If that also fails, an error
1002+
is raised.
9921003
source_dir (str): Path (absolute or relative) to a directory with any other training
9931004
source code dependencies aside from the entry point file (default: None). Structure within this
9941005
directory are preserved when training on Amazon SageMaker. If 'git_config' is provided,
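
The field rules stated in the ``git_config`` docstring above (``repo`` required, ``2FA_enabled`` a boolean, ``branch`` defaulting to 'master') could be checked with a sketch like the following. The function name ``normalize_git_config`` and the ``ALLOWED_FIELDS`` set are hypothetical, and rejecting unknown fields and non-string values are assumptions not stated in the commit; the SDK's actual validation may differ.

```python
ALLOWED_FIELDS = {'repo', 'branch', 'commit', '2FA_enabled',
                  'username', 'password', 'token'}


def normalize_git_config(git_config):
    """Validate a git_config dict and fill in the documented defaults.

    Hypothetical helper for illustration; not the SDK's implementation.
    """
    if 'repo' not in git_config:
        raise ValueError("git_config must contain the 'repo' field.")

    # Assumption: fields outside the documented set are treated as errors.
    unknown = set(git_config) - ALLOWED_FIELDS
    if unknown:
        raise ValueError('Unexpected git_config fields: {}'.format(sorted(unknown)))

    # The docstring requires '2FA_enabled' to be True or False if provided.
    two_fa_enabled = git_config.get('2FA_enabled', False)
    if not isinstance(two_fa_enabled, bool):
        raise ValueError("'2FA_enabled' must be True or False.")

    # Assumption: every other field is expected to be a string.
    for key, value in git_config.items():
        if key != '2FA_enabled' and not isinstance(value, str):
            raise ValueError("'{}' must be a string.".format(key))

    normalized = dict(git_config)
    normalized.setdefault('branch', 'master')  # documented default branch
    normalized['2FA_enabled'] = two_fa_enabled  # documented default: False
    # 'commit' is left unset so that the latest commit in the branch is used.
    return normalized
```

Calling ``normalize_git_config({'repo': 'https://github.com/username/repo-with-training-scripts.git'})`` returns a copy with ``branch`` set to 'master' and ``2FA_enabled`` set to ``False``, leaving the caller's dict untouched.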
