S3 Estimator and Image Classification #71

ragavvenkatesan · 2018-02-06T19:01:05Z

This PR sets up the SDK for upcoming S3-based algorithms and adds image classification.

sync

orchidmajumder · 2018-02-06T19:39:38Z

.gitignore

@@ -20,3 +20,5 @@ examples/tensorflow/distributed_mnist/data
 doc/_build
 **/.DS_Store
 venv/
+*.rec
+*~


What is this symbol?

It came in from a sync pull I made from the original repo; it isn't mine.

orchidmajumder · 2018-02-06T19:40:25Z

src/sagemaker/amazon/amazon_estimator.py

+    intended to be instantiated directly. This is difference from the base class
+    because this class handles S3 data"""
+
+    """Base class for Amazon first-party Estimator implementations. This class isn't intended


Duplicate doc

orchidmajumder · 2018-02-06T19:40:38Z

src/sagemaker/amazon/amazon_estimator.py

+        """Initialize an AmazonAlgorithmEstimatorBase.
+
+        Args:
+            algortihm (str): Use one of the supported algorithms


Where is the typo? I don't see it.

iquintero · 2018-02-06T19:41:31Z

src/sagemaker/amazon/validation.py

@@ -45,4 +49,5 @@ def validate(value):

 isint = istype(int)
 isbool = istype(bool)
+isstr = istype(str)


These validations are gone. The pull request itself reports conflicts; can you please update your branch to resolve them?

Refer to 54b3830#diff-0bb270a0ed6827421dc5669020eb6427 to see what changed in the hyperparameters.

I need some of these validations. @orchidmajumder, what is a solution to this?
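
For context on the isint = istype(int) lines in the diff: validation.py builds validators with factories that return a validate(value) closure. The sketch below is a minimal re-implementation of that pattern, assuming istype simply wraps isinstance; it is illustrative only and may differ from the SDK's actual module.

```python
# Hypothetical re-implementation of the istype() factory pattern seen in
# validation.py; not the SDK's actual source.


def istype(expected):
    """Return a validator that checks isinstance(value, expected)."""
    def validate(value):
        return isinstance(value, expected)
    validate.__doc__ = "value must be of type {}".format(expected.__name__)
    return validate


isint = istype(int)
isbool = istype(bool)
isstr = istype(str)

print(isint(3), isstr("resnet"), isbool(1))  # True True False

# Caveat: bool is a subclass of int in Python, so isint(True) is also True.
print(isint(True))  # True
```

The bool/int overlap is one reason a plain isinstance check is a weak type guard for hyperparameters.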

orchidmajumder

Mostly looked into syntactic issues. Need comments on semantics from Owen/Marcio.

ragavvenkatesan

@iquintero Conflicts resolved.

ragavvenkatesan · 2018-02-06T19:46:55Z

src/sagemaker/amazon/amazon_estimator.py

+        """Initialize an AmazonAlgorithmEstimatorBase.
+
+        Args:
+            algortihm (str): Use one of the supported algorithms


Where is the typo? I don't see it.

ragavvenkatesan · 2018-02-06T19:47:34Z

src/sagemaker/amazon/validation.py

@@ -45,4 +49,5 @@ def validate(value):

 isint = istype(int)
 isbool = istype(bool)
+isstr = istype(str)


I need some of these validations. @orchidmajumder, what is a solution to this?

iquintero

The unit tests failed :( It looks like there are some import errors and possibly flake8 errors too.

More importantly, please look at the changes to the hyperparameter class: the type validations are no longer needed, and you should declare the data type instead. I have left an example below.

iquintero · 2018-02-06T21:30:43Z

src/sagemaker/amazon/common.py

@@ -16,7 +16,7 @@

 import numpy as np
 from scipy.sparse import issparse
-
+import json


Please maintain the import order:

1. Python built-in libraries
2. Third-party imports
3. Local library imports (from sagemaker...)

This still needs to be fixed: import json should go before the numpy import.

Also, please keep the imports in alphabetical order when you change them.

import io
import json
import struct
....

iquintero · 2018-02-06T21:33:41Z

src/sagemaker/amazon/amazon_estimator.py

+    intended to be instantiated directly. This is difference from the base class
+    because this class handles S3 data"""
+
+    mini_batch_size = hp('mini_batch_size', (validation.isint, validation.gt(0)))


Please look at #54: we intentionally removed these type validations (isint(), isbool(), etc.) in favor of declaring a specific type for the hp.

So this should be

hp('mini_batch_size', validation.gt(0), data_type=int)

This applies to every hp declaration in this PR.
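
A minimal sketch of what declaring data_type buys, assuming hp is a descriptor that casts the assigned value to data_type before running its validators. Hyperparameter, gt and ExampleEstimator below are hypothetical stand-ins written for illustration (the positional order name, validators, message, data_type mirrors the hp calls in this PR); see #54 for the SDK's actual implementation.

```python
# Hypothetical descriptor sketch; not the SDK's Hyperparameter class.


def gt(minimum):
    """Return a validator accepting values strictly greater than minimum."""
    def validate(value):
        return value > minimum
    validate.__doc__ = "value must be greater than {}".format(minimum)
    return validate


class Hyperparameter(object):
    """Data descriptor that casts to data_type, then runs validators."""

    def __init__(self, name, validate=(), validation_message="", data_type=str):
        self.name = name
        # Accept either a single validator or a tuple/list of them.
        self.validation = validate if isinstance(validate, (list, tuple)) else (validate,)
        self.validation_message = validation_message
        self.data_type = data_type

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        value = self.data_type(value)  # cast first, e.g. "32" -> 32
        for check in self.validation:
            if not check(value):
                raise ValueError("Invalid {}: {!r}. {}".format(
                    self.name, value, self.validation_message or check.__doc__))
        obj.__dict__[self.name] = value


hp = Hyperparameter


class ExampleEstimator(object):
    mini_batch_size = hp('mini_batch_size', gt(0), data_type=int)


est = ExampleEstimator()
est.mini_batch_size = "32"  # string input is cast to int
print(est.mini_batch_size, type(est.mini_batch_size).__name__)  # 32 int

try:
    est.mini_batch_size = 0  # cast succeeds, gt(0) rejects it
except ValueError as err:
    print(err)
```

With the cast done by the descriptor, a separate isint validator becomes redundant, which is the point of the #54 change.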

iquintero

There are many places where the alignment is inconsistent. I didn't want to list every one, but please go through the code and review them.

Some examples are in the docstrings where each line is aligned differently. In some cases you have:

param_name (str)

some others are:

param_name ...(long space..) (str)

The original comments have not been addressed yet either. They are all minor changes, but they add up. Once you address them, this should be ready to merge.

iquintero · 2018-02-21T00:43:29Z

src/sagemaker/amazon/amazon_estimator.py

+        """Initialize an AmazonAlgorithmEstimatorBase.
+
+        Args:
+            algortihm (str): Use one of the supported algorithms


iquintero · 2018-02-21T00:46:20Z

src/sagemaker/amazon/common.py

@@ -16,7 +16,7 @@

 import numpy as np
 from scipy.sparse import issparse
-
+import json


This still needs to be fixed: import json should go before the numpy import.

Also, please keep the imports in alphabetical order when you change them.

import io
import json
import struct
....

iquintero · 2018-02-21T00:46:58Z

src/sagemaker/amazon/common.py

@@ -35,8 +34,18 @@ def __call__(self, array):
        return buf


-class record_deserializer(object):
+class file_to_image_serializer(object):


Keep one naming convention: FileToImageSerializer.

I am using this because the others (e.g. numpy_to_record_serializer) follow the same convention.
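
For comparison, the CapWords spelling the reviewer asks for (PEP 8's convention for class names) might look like the sketch below. The behavior is an assumption: the PR does not show the body of file_to_image_serializer, so this hypothetical version simply reads an image file and returns its raw bytes for an application/x-image request.

```python
import os
import tempfile


class FileToImageSerializer(object):
    """Hypothetical serializer: read an image file and return its raw bytes.

    Assumes the endpoint accepts the encoded file bytes (e.g. JPEG or PNG)
    as-is with content type application/x-image; the PR's class may differ.
    """

    def __init__(self, content_type='application/x-image'):
        self.content_type = content_type

    def __call__(self, file_path):
        with open(file_path, 'rb') as f:
            return f.read()


# Usage with a throwaway file standing in for a real image.
with tempfile.NamedTemporaryFile(suffix='.jpg', delete=False) as tmp:
    tmp.write(b'\xff\xd8\xff\xe0fake-jpeg-bytes')
    path = tmp.name

serializer = FileToImageSerializer()
payload = serializer(path)
os.remove(path)
print(len(payload), serializer.content_type)
```

Only the class name changes between the two conventions; callers instantiate and call it the same way.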

iquintero · 2018-02-21T00:48:15Z

src/sagemaker/amazon/common.py


+class record_deserializer(object):


Same here.

RecordDeserializer

Again, I am keeping this convention to match the others; see response_deserializer.

iquintero · 2018-02-21T00:49:03Z

src/sagemaker/amazon/common.py

@@ -47,6 +56,14 @@ def __call__(self, stream, content_type):
            stream.close()


+class response_deserializer(object):


ResponseDeserializer
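
The PR adds import json to common.py, which suggests (but does not confirm) that response_deserializer parses a JSON response. A hypothetical CapWords version is sketched below, following the __call__(self, stream, content_type) signature and the stream.close() call visible in the diff context; the JSON list-of-probabilities payload is an assumption.

```python
import io
import json


class ResponseDeserializer(object):
    """Hypothetical deserializer for a JSON inference response.

    Mirrors the __call__(stream, content_type) shape shown in the diff and
    always closes the stream; the payload format is assumed, not confirmed.
    """

    def __init__(self, accept='application/json'):
        self.accept = accept

    def __call__(self, stream, content_type):
        try:
            return json.loads(stream.read().decode('utf-8'))
        finally:
            stream.close()


# Usage with an in-memory stream standing in for an HTTP response body.
body = io.BytesIO(json.dumps([0.1, 0.7, 0.2]).encode('utf-8'))
probabilities = ResponseDeserializer()(body, 'application/json')
print(probabilities.index(max(probabilities)))  # 1
```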

iquintero · 2018-02-21T00:56:01Z

tests/integ/test_image_classification.py

@@ -0,0 +1,65 @@
+# Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.


2017-2018

might as well :P

iquintero · 2018-02-21T01:00:56Z

tests/unit/test_amazon_estimator.py

@@ -63,6 +64,13 @@ def test_init(sagemaker_session):
    assert pca.num_components == 55


+def test_s3_init(sagemaker_session):


I think all the tests you've added here should go in their own file so the test suite stays organized. This existing file is meant to test the AmazonEstimator; please create a separate file for your new estimators. The content of the tests is fine, just split it into its own file.

The test I wrote covers the AmazonS3Estimator, which is in the same module as AmazonEstimator; that is why I think it belongs in this file. I have a separate test file for image classification.

iquintero · 2018-02-21T01:05:13Z

src/sagemaker/amazon/amazon_estimator.py

+            mini_batch_size (int or None): The size of each mini-batch to use when training. If None, a
+                    default value will be used.
+        """
+        default_mini_batch_size = 32


Why don't you make 32 the default value for mini_batch_size in the method signature?

def fit(self, s3set, mini_batch_size=32, distribution='ShardedByS3Key', **kwargs):

Then you don't need this whole block; you can just set it as
self.mini_batch_size = mini_batch_size

Two reasons: 1. It's a protocol used in the other algorithms. 2. We want this to be a parameter the user must supply. If I assume a default and training fails with a memory error, it gets treated as a customer error, which is wrong.
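
The three options discussed in this thread behave differently: a None sentinel with a fallback (as in the diff), a default in the signature (as suggested by the reviewer), and a required argument (matching the author's stated intent). The class names and the S3 prefix below are hypothetical placeholders; only the value 32 comes from the diff.

```python
DEFAULT_MINI_BATCH_SIZE = 32  # value taken from the diff


class NoneMeansDefault(object):
    """Pattern in the diff: None falls back to a built-in default."""

    def fit(self, s3set, mini_batch_size=None):
        if mini_batch_size is None:
            mini_batch_size = DEFAULT_MINI_BATCH_SIZE
        self.mini_batch_size = mini_batch_size


class DefaultInSignature(object):
    """Reviewer's suggestion: put the default directly in the signature."""

    def fit(self, s3set, mini_batch_size=DEFAULT_MINI_BATCH_SIZE):
        self.mini_batch_size = mini_batch_size


class RequiredArgument(object):
    """Author's stated intent: the caller must always choose a value."""

    def fit(self, s3set, mini_batch_size):
        self.mini_batch_size = mini_batch_size


a, b = NoneMeansDefault(), DefaultInSignature()
a.fit('s3://example-bucket/train')  # placeholder S3 prefix
b.fit('s3://example-bucket/train')
print(a.mini_batch_size, b.mini_batch_size)  # 32 32

# The two defaulting styles differ when None is passed explicitly.
a.fit('s3://example-bucket/train', mini_batch_size=None)
b.fit('s3://example-bucket/train', mini_batch_size=None)
print(a.mini_batch_size, b.mini_batch_size)  # 32 None

# Omitting a required argument fails at call time, before any training.
try:
    RequiredArgument().fit('s3://example-bucket/train')
except TypeError as err:
    print('TypeError:', err)
```

Only the required-argument form forces the user to pick a batch size, which is the behavior the author argues for.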

iquintero · 2018-02-21T01:15:37Z

src/sagemaker/amazon/image_classification.py

+    The implementation of :meth:`~sagemaker.predictor.RealTimePredictor.predict` in this
+    `RealTimePredictor` requires a `x-image` as input.
+
+    ``predict()`` returns """


I think this sentence is also incomplete. predict() returns <>... ?

iquintero · 2018-02-21T01:25:42Z

src/sagemaker/amazon/image_classification.py

+    checkpoint_frequency = hp('checkpoint_frequency', (ge(1),),
+                              'checkpoint_frequency should be an integer greater-than 1', int)
+    num_layers = hp('num_layers', (isin(18, 34, 50, 101, 152, 200, 20, 32, 44, 56, 110),),
+                    'num_layers should be in the set [18, 34, 50, 101, 152, 200, 20, 32, 44, 56, 110]', int)


Just a suggestion, but maybe putting these in ascending order would make them easier for users to reason about? Or is there a reason they are in a seemingly random order?

Again, I'd rather not, because this ordering is also used in the docs. There is a reason it is in this order, and users familiar with this algorithm will find this ordering comfortable.

laurenyu · 2018-08-27T18:41:11Z

Closing due to inactivity. Feel free to reopen (or maybe create a new PR, given all the merge conflicts) if work on this resumes.

Ragav Venkatesan and others added 10 commits January 16, 2018 19:18

Merge pull request #1 from aws/master

88bd056

sync

image classification algorithm api

ac7b854

image classification api

8b96f69

sync

aea77a1

sync

7167aec

estimator is done. waiting on tests.

3d985e7

formatting for flake

24353e2

merge

3d91eb7

conflicts

2de775a

ic-sdk push

a919bce

ragavvenkatesan added the type: feature request label Feb 6, 2018

ragavvenkatesan assigned vrkhare Feb 6, 2018

ragavvenkatesan requested review from mvsusp, orchidmajumder and owen-t February 6, 2018 19:01

orchidmajumder reviewed Feb 6, 2018

iquintero suggested changes Feb 6, 2018

orchidmajumder reviewed Feb 6, 2018

Ragav Venkatesan added 2 commits February 6, 2018 11:46

removed duplicate doc

5b9eec0

Merge branch 'master' into master

7f1389a

ragavvenkatesan commented Feb 6, 2018

iquintero suggested changes Feb 6, 2018

Ragav Venkatesan and others added 6 commits February 15, 2018 11:45

moving forward with the recent updates

ddd0e68

Merge branch 'master' of https://github.com/ragavvenkatesan/sagemaker…

39c6ba4

…-python-sdk

updating for sync

c61c7ef

Update __init__.py

2825073

style changes to code

85564ef

Merge branch 'master' of https://github.com/ragavvenkatesan/sagemaker…

b43e652

…-python-sdk

Ragav Venkatesan and others added 8 commits February 15, 2018 12:26

merge conflicts

c5ead9a

Merge branch 'master' into master

13cf73b

unit tests fixed

8e305fa

sync conflicts

db548c2

flake errors fixed

9c9469f

integ tests environment fix

8a4f3ea

flake fails fixed

8557394

Merge branch 'master' into master

80e0283

iquintero suggested changes Feb 21, 2018

Ragav Venkatesan and others added 5 commits March 1, 2018 11:33

Merge branch 'master' into master

066e8b7

answered all the review suggestions

5754cba

Merge branch 'master' of https://github.com/ragavvenkatesan/sagemaker…

7c80d16

…-python-sdk

Merge branch 'master' into master

e068276

Merge branch 'master' into master

baf9f6e

laurenyu closed this Aug 27, 2018

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018

Merge pull request aws#71 from awslabs/lda_topic_modeling

5a8d119

[LDA] Fix minor typos
