
Added: print out billable seconds after training completes #30


Merged
merged 5 commits into aws:master on Feb 21, 2018

Conversation

@djarpin djarpin commented Dec 24, 2017

The time it takes to run .fit() includes provisioning hardware, downloading the container, downloading data, running the algorithm, and saving outputs. Customers are not charged for provisioning hardware, which can be a substantial portion of the total time. To provide better transparency here, it would be nice to print billable seconds after the "===== Job Complete =====" message.

I'll provide additional context offline.
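
Below is a minimal sketch of the idea, assuming the job description returned by describe_training_job exposes TrainingStartTime and TrainingEndTime as datetimes and the instance count under ResourceConfig['InstanceCount']; the helper name is illustrative, not the actual SDK change.

import boto3

def print_billable_seconds(job_name, sagemaker_client=None):
    # Fetch the training job description (timestamps come back as datetimes).
    client = sagemaker_client or boto3.client('sagemaker')
    description = client.describe_training_job(TrainingJobName=job_name)

    instance_count = description['ResourceConfig']['InstanceCount']
    train_seconds = (description['TrainingEndTime'] -
                     description['TrainingStartTime']).total_seconds()

    # Billing is per instance, so scale by the instance count.
    print('Billable seconds: {}'.format(int(train_seconds * instance_count)))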

@yijiezh yijiezh requested review from owen-t and removed request for owen-t January 26, 2018 23:44
owen-t
owen-t previously approved these changes Feb 2, 2018
@laurenyu laurenyu commented Feb 2, 2018

two of the unit tests failed with this error:

        if wait:
            self._check_job_status(job_name, description)
            if dot:
                print()
            print('===== Job Complete =====')
            # Customers are not billed for hardware provisioning, so billable time is less than total time
>           billable_time = (description['TrainingEndTime'] - description['TrainingStartTime']) * instance_count
E           KeyError: 'TrainingEndTime'
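
A defensive variant of the snippet above, assuming the description may lack the timestamps (as in the stubbed job descriptions used by the unit tests); this is a sketch, not necessarily the change that was merged.

        if wait:
            self._check_job_status(job_name, description)
            if dot:
                print()
            print('===== Job Complete =====')
            # Customers are not billed for hardware provisioning, so billable
            # time is less than total time. Only report it when both
            # timestamps are present in the description.
            start = description.get('TrainingStartTime')
            end = description.get('TrainingEndTime')
            if start and end:
                billable_time = (end - start) * instance_count
                print('Billable seconds:', int(billable_time.total_seconds()))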

@laurenyu laurenyu merged commit 06249d4 into aws:master Feb 21, 2018
jalabort added a commit to hudl/sagemaker-python-sdk that referenced this pull request Mar 1, 2018
* Add data_type to hyperparameters (aws#54)

When we describe a training job, the data type of the hyperparameters is
lost because we use a dict[str, str]. This adds a new field to
Hyperparameter so that we can convert the data types at runtime.

Instead of validating with isinstance(), we cast the hyperparameter value
to the type it is meant to be. This enforces a "strongly typed" value and
makes it easier to deal with values deserialized from the API's string
responses.

* Add wrapper for LDA. (aws#56)

Update CHANGELOG and bump the version number.

* Add support for async fit() (aws#59)

When calling fit(wait=False), it returns immediately. The training
job carries on even if the process exits. By using attach() and
providing the training job name, the estimator can be retrieved (see
the usage sketch after this list).

_prepare_init_params_from_job_description() is now a classmethod instead
of a static method. Each class is responsible for implementing its own
logic to convert a training job description into arguments that can be
passed to its own __init__().

* Fix Estimator role expansion (aws#68)

Instead of manually constructing the role ARN, use the IAM boto client
to do it. This properly expands service-roles and regular roles.

* Add FM and LDA to the documentation. (aws#66)

* Fix description of an argument of sagemaker.session.train (aws#69)

* Fix description of an argument of sagemaker.session.train

'input_config' should be an array which has channel objects.

* Add a link to the botocore docs

* Use 'list' instead of 'array' in the description

* Add ntm algorithm with doc, unit tests, integ tests (aws#73)

* JSON serializer: predictor.predict accepts dictionaries (aws#62)

Add support for serializing Python dictionaries to JSON.
Add prediction with a dictionary to the TF iris integ test.

* Fixing timeouts for PCA async integration test. (aws#78)

Execute the tf_cifar test without logs to eliminate the delay in detecting that the job has finished.

* Fixes in LinearLearner and unit tests addition. (aws#77)

* Print out billable seconds after training completes (aws#30)

* Added: print out billable seconds after training completes

* Fixed: test_session.py to pass unit tests

* Fixed: removed offending tzlocal()

* Use sagemaker_timestamp when creating endpoint names in integration tests. (aws#81)

* Support TensorFlow-1.5.0 and MXNet-1.0.0  (aws#82)

* Update .gitignore to ignore pytest_cache.

* Support TensorFlow-1.5.0 and MXNet-1.0.0

* Update and refactor tests. Add tests for fw_utils.

* Fix typo.

* Update changelog for 1.1.0 (aws#85)
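
For the async fit() item above (aws#59), the usage sketch below shows the intended flow; the MXNet estimator arguments, bucket, and job name are placeholders, and the 2018-era parameter names train_instance_count/train_instance_type are assumed.

from sagemaker.mxnet import MXNet

# Placeholder estimator configuration; only fit(wait=False) and attach()
# are the calls under discussion here.
estimator = MXNet(entry_point='train.py',
                  role='SageMakerRole',
                  train_instance_count=1,
                  train_instance_type='ml.c4.xlarge')

# Returns as soon as the training job is created; the job keeps running
# even if this Python process exits.
estimator.fit('s3://my-bucket/training-data', wait=False)

# Later, possibly from another process: re-attach to the job by name.
# attach() blocks until the job finishes and returns an estimator that
# can then be deployed or inspected.
attached = MXNet.attach('my-training-job-name')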
laurenyu added a commit to laurenyu/sagemaker-python-sdk that referenced this pull request May 31, 2018
apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018
Moved: Directory structure to be more hierarchical
athewsey pushed a commit to athewsey/sagemaker-python-sdk that referenced this pull request May 28, 2021
Made test local processor to not depend on region setting