-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add ntm algorithm with doc, unit tests, integ tests #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/sagemaker/amazon/ntm.py
Outdated
|
||
return NTMModel(self.model_data, self.role, sagemaker_session=self.sagemaker_session) | ||
|
||
def fit(self, records, mini_batch_size, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the doc (https://docs.aws.amazon.com/sagemaker/latest/dg/ntm_hyperparameters.html) mini_batch_size is not required. This function should not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do validation for mini_batch_size if provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
Now remove old validator and validate range of mini_batch_size instead.
tests/integ/test_ntm.py
Outdated
assert record.label["topic_mixture"] is not None | ||
|
||
|
||
def _prepare_record_set_from_local_files(dir_path, destination, num_records, feature_dim, sagemaker_session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move to a separate location as it is reused by both NTM and LDA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the method to new file that imported by both lda and ntm.
tests/integ/test_ntm.py
Outdated
|
||
assert len(result) == 1 | ||
for record in result: | ||
assert record.label["topic_mixture"] is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to: https://docs.aws.amazon.com/sagemaker/latest/dg/ntm-in-formats.html it is "topic_weights"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Changed to topic_weights.
tests/integ/test_ntm.py
Outdated
|
||
record_set = _prepare_record_set_from_local_files(data_path, ntm.data_location, | ||
len(all_records), feature_num, sagemaker_session) | ||
ntm.fit(record_set, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we can skip 2nd parameter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to None. I think we still need to pass a None there even if we don't want to pass any values.
tests/unit/test_ntm.py
Outdated
NTM(epochs='other', sagemaker_session=sagemaker_session, **ALL_REQ_ARGS) | ||
|
||
|
||
def test_epochs_validation_fail_value(sagemaker_session): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the validation checks both min and max it would be great if we had both conditions checked for these HPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Now all hyper-parameters with a range will be validated for both lower and upper limit.
tests/unit/test_ntm.py
Outdated
MINI_BATCH_SIZE = 200 | ||
HYPERPARAMS = {'num_topics': NUM_TOPICS, 'feature_dim': FEATURE_DIM, 'mini_batch_size': MINI_BATCH_SIZE} | ||
STRINGIFIED_HYPERPARAMS = dict([(x, str(y)) for x, y in HYPERPARAMS.items()]) | ||
HP_TRAIN_CALL = dict(BASE_TRAIN_CALL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is not being used anywhere please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed unnecessary parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments.
@@ -39,7 +39,7 @@ You can install from source by cloning this repository and issuing a pip install | |||
|
|||
git clone https://github.com/aws/sagemaker-python-sdk.git | |||
python setup.py sdist | |||
pip install dist/sagemaker-1.0.3.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to bump the version here, please update setup.py and CHANGELOG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Thanks
tests/unit/test_ntm.py
Outdated
@@ -286,11 +286,11 @@ def test_call_fit_wrong_type_mini_batch_size(sagemaker_session): | |||
data = RecordSet("s3://{}/{}".format(BUCKET_NAME, PREFIX), num_records=1, feature_dim=FEATURE_DIM, | |||
channel='train') | |||
|
|||
with pytest.raises(ValueError): | |||
with pytest.raises((TypeError, ValueError)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments here:
The reason I use tuple (TypeError, ValueError) is because, different error is raised in different python versions. Python 2 raises ValueError and Python 3 raises TypeError.
* Add data_type to hyperparameters (aws#54) When we describe a training job the data type of the hyper parameters is lost because we use a dict[str, str]. This adds a new field to Hyperparameter so that we can convert the datatypes at runtime. instead of validating with isinstance(), we cast the hp value to the type it is meant to be. This enforces a "strongly typed" value. When we deserialize from the API string responses it becomes easier to deal with too. * Add wrapper for LDA. (aws#56) Update CHANGELOG and bump the version number. * Add support for async fit() (aws#59) when calling fit(wait=False) it will return immediately. The training job will carry on even if the process exits. by using attach() the estimator can be retrieved by providing the training job name. _prepare_init_params_from_job_description() is now a classmethod instead of being a static method. Each class is responsible to implement their specific logic to convert a training job description into arguments that can be passed to its own __init__() * Fix Estimator role expansion (aws#68) Instead of manually constructing the role ARN, use the IAM boto client to do it. This properly expands service-roles and regular roles. * Add FM and LDA to the documentation. (aws#66) * Fix description of an argument of sagemaker.session.train (aws#69) * Fix description of an argument of sagemaker.session.train 'input_config' should be an array which has channel objects. * Add a link to the botocore docs * Use 'list' instead of 'array' in the description * Add ntm algorithm with doc, unit tests, integ tests (aws#73) * JSON serializer: predictor.predict accepts dictionaries (aws#62) Add support for serializing python dictionaries to json Add prediction with dictionary in tf iris integ test * Fixing timeouts for PCA async integration test. (aws#78) Execute tf_cifar test without logs to eliminate delay to detect that job has finished. * Fixes in LinearLearner and unit tests addition. (aws#77) * Print out billable seconds after training completes (aws#30) * Added: print out billable seconds after training completes * Fixed: test_session.py to pass unit tests * Fixed: removed offending tzlocal() * Use sagemaker_timestamp when creating endpoint names in integration tests. (aws#81) * Support TensorFlow-1.5.0 and MXNet-1.0.0 (aws#82) * Update .gitignore to ignore pytest_cache. * Support TensorFlow-1.5.0 and MXNet-1.0.0 * Update and refactor tests. Add tests for fw_utils. * Fix typo. * Update changelog for 1.1.0 (aws#85)
…lled Scikit learn is already installed on mead
Since ntm does similar job to lda, the implementation basically follows LDA. All codes include 4 parts:
1, NTM, NTMModel, NTMPredictor implementation
2, Unit tests
3, Integ tests
4, Doc