Skip to content

fix: local mode deletion of temp files on job end #3017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 22, 2023

Conversation

jmahlik
Copy link
Contributor

@jmahlik jmahlik commented Mar 22, 2022

Issue #, if available:
closes #2527

This is a clone of #2644 but has allow commits from maintainers on so should be easier to merge.

Description of changes:

Add try except around file cleanup in sagemaker.local.utils.copy_directory_structure. This was causing local mode to fail when docker (running as root) writes files to a mounted dir and the user does not have permission to remove the files.

The same pattern is followed here.

Testing done:

Add test for failed removal, behaves as expected.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backword compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2022

Codecov Report

Merging #3017 (aa5d3b2) into master (11ff7c0) will decrease coverage by 0.81%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master    #3017      +/-   ##
==========================================
- Coverage   89.55%   88.75%   -0.81%     
==========================================
  Files         960      226     -734     
  Lines       88796    21967   -66829     
==========================================
- Hits        79523    19497   -60026     
+ Misses       9273     2470    -6803     
Impacted Files Coverage Δ
src/sagemaker/local/utils.py 93.33% <80.00%> (ø)
...thon3.8/site-packages/sagemaker/hyperparameters.py
...n3.7/site-packages/sagemaker/lineage/_api_types.py
...n3.9/site-packages/sagemaker/sklearn/processing.py
...agemaker/cli/compatibility/v2/modifiers/airflow.py
...hon3.9/site-packages/sagemaker/xgboost/__init__.py
...3.9/site-packages/sagemaker/wrangler/processing.py
.../lib/python3.10/site-packages/sagemaker/clarify.py
...s/sagemaker/workflow/pipeline_experiment_config.py
...te-packages/sagemaker/workflow/step_collections.py
... and 1177 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 2ea19d8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 2ea19d8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 2ea19d8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@vikas0203 vikas0203 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 2ea19d8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 2ea19d8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@vikas0203 vikas0203 requested review from a team, navinsoni and vikas0203 and removed request for a team November 23, 2022 20:43
@navinsoni navinsoni changed the base branch from dev to master December 2, 2022 21:12
Copy link
Contributor

@navinsoni navinsoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmahlik Thanks for your contribution. Can you please update your PR and resolve conflicts.

@jmahlik
Copy link
Contributor Author

jmahlik commented Dec 6, 2022

@jmahlik Thanks for your contribution. Can you please update your PR and resolve conflicts.

My hard disk failed recently so I have to set up a new vm. It might take a bit. Open to you editing or cherry picking this in the meantime if you'd like? It should be open. Been trying to get this merged since September 2021.

@jmahlik jmahlik force-pushed the fix-local-mode-temp-files branch from 2ea19d8 to 66fdd51 Compare December 14, 2022 04:19
@jmahlik jmahlik requested a review from a team as a code owner December 14, 2022 04:19
@jmahlik
Copy link
Contributor Author

jmahlik commented Dec 14, 2022

@jmahlik Thanks for your contribution. Can you please update your PR and resolve conflicts.

@navinsoni rebased

@jmahlik jmahlik requested review from navinsoni and vikas0203 and removed request for vikas0203 and navinsoni December 14, 2022 04:20
Copy link
Contributor

@navinsoni navinsoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@navinsoni
Copy link
Contributor

@jmahlik I have approved the PR. I will merge it as soon as all tests pass.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 6a3cea9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@navinsoni navinsoni force-pushed the fix-local-mode-temp-files branch from fdeb5db to aa5d3b2 Compare January 4, 2023 01:51
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: aa5d3b2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: aa5d3b2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: aa5d3b2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jmahlik
Copy link
Contributor Author

jmahlik commented Jan 11, 2023

@navinsoni looks like some unrelated tests are falling on this one. Let me know if you'd like me to rebase or anything.

Copy link
Collaborator

@claytonparnell claytonparnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run slow-tests

@claytonparnell
Copy link
Collaborator

@jmahlik - looks like just some intermittent failures that we're working on resolving, should pass on retry. I'll get those passing so we can merge this in. Thanks again for contributing!

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@trajanikant trajanikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run notebook-tests, slow-tests

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: aa5d3b2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: aa5d3b2
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jmahlik
Copy link
Contributor Author

jmahlik commented Feb 17, 2023

I can rebase if needed. Whenever the flakey tests start passing, it'd be nice to fix this. It is kind of annoying supporting 200+ sagemaker users who keep running in to this. I've been trying since 2021. Sorry to say but I'll close this and refer them to support if we can't figure this out. Kind of hard to contribute here, maybe we can do a tad better. Seems to have started when the jumpstart tests went in.

@claytonparnell claytonparnell merged commit dcca7b7 into aws:master Feb 22, 2023
@claytonparnell
Copy link
Collaborator

Apologies @jmahlik it has been merged. We've been trying to prioritize fixing the PR tests, will be resolved soon.

JoseJuan98 pushed a commit to JoseJuan98/sagemaker-python-sdk that referenced this pull request Mar 4, 2023
nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this pull request Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

local mode permission denied when file is written from running job to mounted volume
8 participants