Add wrapper for FactorizationMachines algorithm. #38
Conversation
Require default_mini_batch_size during object creation for algorithms that require mini_batch_size. Use default_mini_batch_size instead of a hardcoded value.
Update documentation in the class. Update integration tests for these modifications.
Bump the version.
FM implementation is good - a few concerns about the default / required argument changes.
@@ -0,0 +1,29 @@
=========
Love this! Should have done this from the get-go.
predictor_type = hp('predictor_type', isin('binary_classifier', 'regressor'),
                    'Value "binary_classifier" or "regressor"')
epochs = hp('epochs', (gt(0), isint), "An integer greater than 0")
clip_gradient = hp('clip_gradient', isfloat, "A float value")
This forces an explicit float value, where an implicit int-to-float conversion would be fine.
If I have a FactorizationMachines object fm, then:
fm.clip_gradient = 55
will fail, because 55 is not a float - even though it can be represented as one. Consider using the isnumber check instead.
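To make the concern concrete, here is a minimal sketch, assuming istype() simply wraps isinstance() (its body is not shown in this diff): the isfloat check rejects an integer literal that could safely be treated as a float, while isnumber accepts it.

```python
import numbers

# Sketch of the validators, assuming istype() wraps isinstance().
def istype(expected):
    def validate(value):
        return isinstance(value, expected)
    return validate

isfloat = istype(float)
isnumber = istype(numbers.Number)

print(isfloat(55))     # False - 55 is an int, not a float instance
print(isfloat(55.0))   # True
print(isnumber(55))    # True - int is registered as a numbers.Number
```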
That's a good idea - will change
src/sagemaker/amazon/kmeans.py
Outdated
@@ -33,8 +33,8 @@ class KMeans(AmazonAlgorithmEstimatorBase):
epochs = hp('epochs', (gt(0), isint), 'An integer greater-than 0')
center_factor = hp('extra_center_factor', (gt(0), isint), 'An integer greater-than 0')

def __init__(self, role, train_instance_count, train_instance_type, k, init_method=None,
             max_iterations=None, tol=None, num_trials=None, local_init_method=None,
def __init__(self, role, train_instance_count, train_instance_type, k, default_mini_batch_size=5000,
Introducing another argument with a default value is a little dangerous. If someone is calling this function with arguments set positionally, then this will fail.
That said, I think we can take this risk - the chance of this problem occurring is low and we're doing a new release. I'd recommend including the constructor signature changes in the changelog.
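A quick sketch of the hazard, using hypothetical standalone functions rather than the real constructors: a caller that passed init_method positionally would have that value silently bound to the new default_mini_batch_size parameter, either failing later in validation or misbehaving.

```python
# Hypothetical simplified signatures illustrating the positional-call hazard
# of inserting a defaulted parameter into an existing signature.
def kmeans_old(role, train_instance_count, train_instance_type, k,
               init_method=None, max_iterations=None):
    return init_method

def kmeans_new(role, train_instance_count, train_instance_type, k,
               default_mini_batch_size=5000, init_method=None, max_iterations=None):
    return init_method

# A caller that used to pass init_method positionally:
print(kmeans_old('role', 1, 'ml.c4.xlarge', 10, 'kmeans++'))  # 'kmeans++'
print(kmeans_new('role', 1, 'ml.c4.xlarge', 10, 'kmeans++'))  # None - 'kmeans++' now binds
                                                              # to default_mini_batch_size
```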
This is a required parameter and must be specified by the user. An alternative to declaring it in the constructor would be enforcing it in fit(). Let me rewrite it in that form.
@@ -60,7 +60,7 @@ class LinearLearner(AmazonAlgorithmEstimatorBase):
unbias_label = hp('unbias_label', isbool, 'A boolean')
num_point_for_scalar = hp('num_point_for_scalar', (isint, gt(0)), 'An integer greater-than 0')

def __init__(self, role, train_instance_count, train_instance_type, predictor_type='binary_classifier',
def __init__(self, role, train_instance_count, train_instance_type, predictor_type,
Why make this a required argument? This will break clients that rely on the default.
This is a required argument and we shouldn't default it.
Since this is a minor release we shouldn't break existing calls, so I'll leave the default then.
src/sagemaker/amazon/pca.py
Outdated
@@ -31,7 +31,7 @@ class PCA(AmazonAlgorithmEstimatorBase):
extra_components = hp(name='extra_components', validate=lambda x: x >= 0 and isinstance(x, int),
                      validation_message="Value must be an integer greater than or equal to 0")

def __init__(self, role, train_instance_count, train_instance_type, num_components,
def __init__(self, role, train_instance_count, train_instance_type, num_components, default_mini_batch_size,
I'd like to avoid adding new required arguments to function signatures. Why can't this have a default value?
This is a required parameter that has no default. Instead of the constructor, let's enforce it in fit() for new algorithms; I'll leave the default logic here to avoid breaking existing calls.
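A rough sketch of what fit()-time enforcement could look like; the base class below is a stand-in, not the SDK's actual AmazonAlgorithmEstimatorBase, and the method signatures are assumptions rather than the final API.

```python
# Illustrative sketch only: require mini_batch_size when fit() is called instead of
# adding a new required constructor argument, so existing constructor calls keep working.
class _FakeEstimatorBase(object):
    def fit(self, records, mini_batch_size, **kwargs):
        print("training with mini_batch_size=%s" % mini_batch_size)


class PCA(_FakeEstimatorBase):
    def fit(self, records, mini_batch_size=None, **kwargs):
        if mini_batch_size is None:
            raise ValueError("mini_batch_size must be provided when calling fit()")
        super(PCA, self).fit(records, mini_batch_size, **kwargs)


PCA().fit(records=[], mini_batch_size=500)  # ok
# PCA().fit(records=[])                     # raises ValueError
```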
src/sagemaker/amazon/validation.py
Outdated
@@ -46,3 +46,4 @@ def validate(value):
isint = istype(int)
isbool = istype(bool)
isnumber = istype(numbers.Number)  # noqa
isfloat = istype(float)
As discussed above, probably don't need this.