
increase endpoint creation timeout to 20 minutes #11

Merged: 1 commit, Dec 7, 2017

mvsusp (Contributor) commented Dec 7, 2017

No description provided.

iquintero (Contributor) commented:

On the message:
increased -> increase

so that it is in imperative mood 👍

laurenyu (Contributor) commented Dec 7, 2017

I thought previously we had decided that a timeout of 15 minutes should be plenty. Did something change?

jesterhazy (Contributor) commented:

@laurenyu it's a balance. we want to set the timeout so that tests that are going to fail anyway don't jam up the build queue, but we don't want to fail tests that will succeed because our timeouts are too low. i think @mvsusp has found that the current timeouts are causing too many artificial failures.

laurenyu (Contributor) commented Dec 7, 2017

Of course, I understand the tradeoff. I just want to make sure we're not increasing the timeout unnecessarily. For example, do all of the tests need increased timeouts? Was 20 minutes based on running the tests without a timeout, or is it just a small increase while we continue experimenting? I didn't see any explanation in this PR, so I wanted to ask whether there was a previous conversation I had missed.

mvsusp changed the title from "increased endpoint creation timeout to 20 minutes" to "increase endpoint creation timeout to 20 minutes" on Dec 7, 2017.
mvsusp (Contributor, Author) commented Dec 7, 2017

@laurenyu I increased only the timeouts for endpoint creation. My (small) sample of experiments showed that one possible issue with a 15-minute timeout is that the endpoint delete operation can run while the endpoint creation operation is still in progress. That results in a flaky error and an endpoint that is never deleted.

Now that we have statistics in the integration tests, we can check which tests are slower and refactor them as necessary.
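The failure mode mvsusp describes (a timeout that fires before endpoint creation finishes, so teardown tries to delete an endpoint that is still being created) can be illustrated with a small, self-contained simulation. This is a hypothetical sketch, not the SDK's integration-test code: `SimulatedEndpoint`, `run_integration_test`, `RunOutcome`, and the 17-minute creation time are illustrative assumptions, and the rule that deleting a still-creating endpoint fails is modelled on the behavior reported in this thread rather than taken from any documented API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulatedEndpoint:
    """Hypothetical stand-in for a hosted endpoint; not a SageMaker SDK class."""

    creation_minutes: float  # assumed time until the endpoint is InService
    deleted: bool = False

    def status(self, now: float) -> str:
        return "InService" if now >= self.creation_minutes else "Creating"

    def delete(self, now: float) -> None:
        # Assumption, modelled on the failure described in this thread:
        # a delete issued while the endpoint is still Creating fails.
        if self.status(now) == "Creating":
            raise RuntimeError("delete failed: endpoint is still Creating")
        self.deleted = True


@dataclass
class RunOutcome:
    timed_out: bool
    delete_error: Optional[str]
    leaked: bool  # True if the endpoint was never deleted


def run_integration_test(creation_minutes: float, timeout_minutes: float) -> RunOutcome:
    """Simulate one test: wait for the endpoint, then always attempt teardown."""
    endpoint = SimulatedEndpoint(creation_minutes)
    timed_out = timeout_minutes < creation_minutes
    # Teardown runs either when the endpoint is ready or when the timeout fires,
    # whichever comes first.
    teardown_at = min(creation_minutes, timeout_minutes)
    delete_error = None
    try:
        endpoint.delete(teardown_at)
    except RuntimeError as exc:
        delete_error = str(exc)
    return RunOutcome(timed_out, delete_error, leaked=not endpoint.deleted)


if __name__ == "__main__":
    # Hypothetical 17-minute endpoint creation, checked against both timeouts.
    for timeout in (15, 20):
        print(timeout, run_integration_test(creation_minutes=17, timeout_minutes=timeout))
```

Under these assumptions, the 15-minute timeout produces both a failed delete and a leaked endpoint, while the 20-minute timeout lets creation finish and teardown succeed. The same model also shows jesterhazy's tradeoff: any timeout at or above the creation time avoids the leak, but a larger value only delays how long a genuinely stuck test holds the build queue.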

laurenyu (Contributor) commented Dec 7, 2017

cool, thanks for the explanation!

mvsusp merged commit 191be47 into master on Dec 7, 2017.
mvsusp deleted the mvs-increased-endpoint-timeout branch on Dec 7, 2017 at 17:55.
laurenyu added a commit to laurenyu/sagemaker-python-sdk that referenced this pull request May 31, 2018
aws#11 updated master to reflect the public SDK. This change brings this branch up to date.
apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018
prithuraj mentioned this pull request Feb 23, 2019
athewsey pushed a commit to athewsey/sagemaker-python-sdk that referenced this pull request May 21, 2021: FrameworkProcessor.get_run_args()

4 participants