Add ntm algorithm with doc, unit tests, integ tests #73

yangaws · 2018-02-07T18:48:23Z

Since NTM does a similar job to LDA, the implementation largely follows LDA. The change has four parts:

1. NTM, NTMModel, and NTMPredictor implementation
2. Unit tests
3. Integ tests
4. Doc

lukmis · 2018-02-07T21:13:13Z

src/sagemaker/amazon/ntm.py

+
+        return NTMModel(self.model_data, self.role, sagemaker_session=self.sagemaker_session)
+
+    def fit(self, records, mini_batch_size, **kwargs):


According to the doc (https://docs.aws.amazon.com/sagemaker/latest/dg/ntm_hyperparameters.html) mini_batch_size is not required. This function should not be necessary.

We could do validation for mini_batch_size if provided.

Thanks.

The old validator is now removed, and the range of mini_batch_size is validated instead.
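A minimal sketch of what "optional, but range-checked if provided" could look like. The function name and the bounds below are assumptions made for illustration, not the code merged in this PR; the real limits are in the NTM hyperparameter documentation linked above.

```python
# Hypothetical bounds, chosen only for illustration.
MIN_MINI_BATCH_SIZE = 1
MAX_MINI_BATCH_SIZE = 10000


def validate_mini_batch_size(mini_batch_size=None):
    """Return mini_batch_size if it is absent or within range, else raise ValueError.

    None means "not provided", so the service default applies and no check is made.
    """
    if mini_batch_size is None:
        return None
    if mini_batch_size < MIN_MINI_BATCH_SIZE or mini_batch_size > MAX_MINI_BATCH_SIZE:
        raise ValueError(
            "mini_batch_size must be in [{}, {}], got {}".format(
                MIN_MINI_BATCH_SIZE, MAX_MINI_BATCH_SIZE, mini_batch_size
            )
        )
    return mini_batch_size
```

With this shape, a caller can omit the argument entirely, and only an explicit out-of-range value is rejected.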

lukmis · 2018-02-07T21:18:40Z

tests/integ/test_ntm.py

+            assert record.label["topic_mixture"] is not None
+
+
+def _prepare_record_set_from_local_files(dir_path, destination, num_records, feature_dim, sagemaker_session):


Could you move to a separate location as it is reused by both NTM and LDA?

Moved the method to a new file that is imported by both the LDA and NTM tests.

lukmis · 2018-02-07T21:19:36Z

tests/integ/test_ntm.py

+
+        assert len(result) == 1
+        for record in result:
+            assert record.label["topic_mixture"] is not None


According to https://docs.aws.amazon.com/sagemaker/latest/dg/ntm-in-formats.html the key is "topic_weights".

Oops! Changed to topic_weights.

lukmis · 2018-02-07T21:22:54Z

tests/integ/test_ntm.py

+
+        record_set = _prepare_record_set_from_local_files(data_path, ntm.data_location,
+                                                          len(all_records), feature_num, sagemaker_session)
+        ntm.fit(record_set, 100)


Probably we can skip 2nd parameter here.

Changed it to None. I think we still need to pass a None there even if we don't want to pass any values.

lukmis · 2018-02-07T21:42:33Z

tests/unit/test_ntm.py

+        NTM(epochs='other', sagemaker_session=sagemaker_session, **ALL_REQ_ARGS)
+
+
+def test_epochs_validation_fail_value(sagemaker_session):


Since the validation checks both min and max it would be great if we had both conditions checked for these HPs.

Updated.

Now all hyper-parameters with a range are validated against both the lower and upper limits.
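As a rough illustration of checking both conditions, the sketch below exercises a range validator just below the minimum and just above the maximum, and confirms the boundary values themselves are accepted. The validator factory, the helper names, and the ranges are all hypothetical; they are not taken from the SDK or from this PR's tests.

```python
def make_range_validator(low, high):
    """Build a validator that raises ValueError outside the closed range [low, high]."""
    def validate(value):
        if value < low or value > high:
            raise ValueError("{} is outside [{}, {}]".format(value, low, high))
        return value
    return validate


def rejects(validator, value):
    """Return True if the validator raises ValueError for value."""
    try:
        validator(value)
    except ValueError:
        return True
    return False


# Hypothetical ranges, for illustration only.
HYPOTHETICAL_RANGES = {"epochs": (1, 100), "mini_batch_size": (1, 10000)}


def check_both_limits(low, high):
    """Assert that both out-of-range sides fail and both boundaries pass."""
    validator = make_range_validator(low, high)
    assert rejects(validator, low - 1)       # below the lower limit
    assert rejects(validator, high + 1)      # above the upper limit
    assert not rejects(validator, low)       # lower boundary is valid
    assert not rejects(validator, high)      # upper boundary is valid


for name, (low, high) in HYPOTHETICAL_RANGES.items():
    check_both_limits(low, high)
```

Driving the checks from a table keeps the lower and upper cases together, so a hyper-parameter cannot end up tested on one side only.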

lukmis · 2018-02-07T21:45:10Z

tests/unit/test_ntm.py

+MINI_BATCH_SIZE = 200
+HYPERPARAMS = {'num_topics': NUM_TOPICS, 'feature_dim': FEATURE_DIM, 'mini_batch_size': MINI_BATCH_SIZE}
+STRINGIFIED_HYPERPARAMS = dict([(x, str(y)) for x, y in HYPERPARAMS.items()])
+HP_TRAIN_CALL = dict(BASE_TRAIN_CALL)


If this is not being used anywhere please remove.

Removed unnecessary parameters.

lukmis

A few comments.

lukmis · 2018-02-07T22:37:27Z

README.rst

@@ -39,7 +39,7 @@ You can install from source by cloning this repository and issuing a pip install

    git clone https://github.com/aws/sagemaker-python-sdk.git
    python setup.py sdist
-    pip install dist/sagemaker-1.0.3.tar.gz


If you want to bump the version here, please update setup.py and CHANGELOG

Fixed! Thanks

yangaws · 2018-02-07T23:27:32Z

tests/unit/test_ntm.py

@@ -286,11 +286,11 @@ def test_call_fit_wrong_type_mini_batch_size(sagemaker_session):
    data = RecordSet("s3://{}/{}".format(BUCKET_NAME, PREFIX), num_records=1, feature_dim=FEATURE_DIM,
                     channel='train')

-    with pytest.raises(ValueError):
+    with pytest.raises((TypeError, ValueError)):


Some comments here:

I use the tuple (TypeError, ValueError) because different Python versions raise different errors here: Python 2 raises ValueError and Python 3 raises TypeError.
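One plausible way this difference arises is a range comparison between a string and an integer. The thread does not say which comparison is responsible, so the sketch below is an assumption, with hypothetical names and bounds. In Python 3, ordering a str against an int raises TypeError. In Python 2 the comparison is allowed, the string sorts after any number, and the range check itself fails with ValueError.

```python
def check_range(value, low, high):
    """Raise ValueError when value falls outside the closed range [low, high]."""
    if not (low <= value <= high):
        raise ValueError("{!r} is outside [{}, {}]".format(value, low, high))
    return value


def error_name_for(value):
    """Return the name of the exception raised for value, or None if it is valid."""
    try:
        check_range(value, 1, 10000)  # hypothetical bounds
    except (TypeError, ValueError) as err:
        # Catching the tuple makes the caller independent of the Python version.
        return type(err).__name__
    return None
```

Passing the same tuple to pytest.raises, as in the diff above, lets one test body pass under both interpreters.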

* Add data_type to hyperparameters (aws#54)
  When we describe a training job the data type of the hyper parameters is lost because we use a dict[str, str]. This adds a new field to Hyperparameter so that we can convert the datatypes at runtime. instead of validating with isinstance(), we cast the hp value to the type it is meant to be. This enforces a "strongly typed" value. When we deserialize from the API string responses it becomes easier to deal with too.
* Add wrapper for LDA. (aws#56)
  Update CHANGELOG and bump the version number.
* Add support for async fit() (aws#59)
  when calling fit(wait=False) it will return immediately. The training job will carry on even if the process exits. by using attach() the estimator can be retrieved by providing the training job name. _prepare_init_params_from_job_description() is now a classmethod instead of being a static method. Each class is responsible to implement their specific logic to convert a training job description into arguments that can be passed to its own __init__()
* Fix Estimator role expansion (aws#68)
  Instead of manually constructing the role ARN, use the IAM boto client to do it. This properly expands service-roles and regular roles.
* Add FM and LDA to the documentation. (aws#66)
* Fix description of an argument of sagemaker.session.train (aws#69)
  * Fix description of an argument of sagemaker.session.train
    'input_config' should be an array which has channel objects.
  * Add a link to the botocore docs
  * Use 'list' instead of 'array' in the description
* Add ntm algorithm with doc, unit tests, integ tests (aws#73)
* JSON serializer: predictor.predict accepts dictionaries (aws#62)
  Add support for serializing python dictionaries to json
  Add prediction with dictionary in tf iris integ test
* Fixing timeouts for PCA async integration test. (aws#78)
  Execute tf_cifar test without logs to eliminate delay to detect that job has finished.
* Fixes in LinearLearner and unit tests addition. (aws#77)
* Print out billable seconds after training completes (aws#30)
  * Added: print out billable seconds after training completes
  * Fixed: test_session.py to pass unit tests
  * Fixed: removed offending tzlocal()
* Use sagemaker_timestamp when creating endpoint names in integration tests. (aws#81)
* Support TensorFlow-1.5.0 and MXNet-1.0.0 (aws#82)
  * Update .gitignore to ignore pytest_cache.
  * Support TensorFlow-1.5.0 and MXNet-1.0.0
  * Update and refactor tests. Add tests for fw_utils.
  * Fix typo.
* Update changelog for 1.1.0 (aws#85)

yangaws added 4 commits February 7, 2018 09:30

add ntm algorithm and unit tests

31051e3

Merge branch 'master' into ntm

f652a92

update doc with ntm algorithm

509b08d

add integ tests for ntm

77b285f

yangaws requested a review from lukmis February 7, 2018 18:48

lukmis reviewed Feb 7, 2018

lukmis suggested changes Feb 7, 2018

lukmis reviewed Feb 7, 2018

yangaws and others added 3 commits February 7, 2018 15:00

modify details according to feedbacks

d7bef8e

Merge branch 'master' into ntm

2a4c493

validate mini_batch_size

c74b6db

yangaws commented Feb 7, 2018

yangaws and others added 2 commits February 7, 2018 16:33

add default none to mini_batch_size

e871cec

Update CHANGELOG.

70ac565

lukmis approved these changes Feb 8, 2018

lukmis merged commit 795b030 into aws:master Feb 9, 2018

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018

Merge pull request aws#73 from awslabs/mvs-scikit-learn-already-insta…

3c17cba

…lled Scikit learn is already installed on mead
