Dependencies are not completely specified #509

Closed
zjost opened this issue Nov 21, 2018 · 5 comments
@zjost

zjost commented Nov 21, 2018

System Information

Docker: FROM python:3

Describe the problem

I only ran pip install sagemaker, but setup.py does not list all of the dependencies the package needs.

Minimal repro / logs

from sagemaker.tensorflow import TensorFlow
File "/usr/local/lib/python3.7/site-packages/sagemaker/tensorflow/__init__.py", line 23, in <module>
import tensorflow # noqa: E402, F401
ModuleNotFoundError: No module named 'tensorflow'

It seems I need to install TensorFlow locally to use this SDK, but it's not in the install_requires dependency list.

Ultimately, I'm trying to use this SDK to invoke a TensorFlow training job, but it's causing me a lot of pain due to the number of large dependencies. It's not trivial (and maybe not possible) to deploy NumPy, Pandas, and TensorFlow all to AWS Lambda just to call ".fit" on a local model.py.

I would just use Boto3, but I'm not sure how to send my local "model.py" file to the TensorFlow container.

@yangaws
Contributor

yangaws commented Nov 27, 2018

Hi @zjost ,

Thanks for catching the unnecessary dependency. I opened PR #511 to remove the unnecessary import of tensorflow.

Could you provide more details about what you are trying to do with the SageMaker Python SDK, what you have tried, and what problems you ran into? We usually don't recommend using boto3 directly (unless it's necessary), since it's more complicated than using the Python SDK.

Thanks,
Yang

@zjost
Author

zjost commented Nov 28, 2018

I am trying to automate the retraining of my SageMaker TensorFlow model. I wanted to kick off this training job with a Lambda function, but the SageMaker Python SDK has too many dependencies: NumPy, Pandas, and then the TensorFlow import that ultimately resulted in this issue.

I decided to try Boto3, but I didn't want to essentially re-create the complexity of uploading the entrypoint artifact and configuring all the environment variables that enable the SageMaker TensorFlow container to work. As far as I know, there's not an easy way to use the regular SageMaker APIs to handle cases where you have a local entrypoint file.

Ultimately, I ended up building a Docker container that used TensorFlow's container as a base, which I then use in ECS/Fargate. This seems totally overkill just to kick off a SageMaker training job.
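
For readers weighing the Boto3 route described above, here is a minimal sketch of the manual steps: bundle the local entry point into a tarball, then build the arguments for a create_training_job call. It rests on assumptions not confirmed in this thread: that the framework container reads the JSON-encoded hyperparameters sagemaker_program and sagemaker_submit_directory, and that the instance type, volume size, and channel name shown are suitable placeholders. The function names are hypothetical, and the TensorFlow container may require further hyperparameters that are omitted here.

```python
import json
import tarfile
from pathlib import Path


def package_entry_point(entry_point, out_path="sourcedir.tar.gz"):
    """Bundle a local entry-point script into a gzipped tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        # Store the script at the archive root so it can be located by file name.
        tar.add(entry_point, arcname=Path(entry_point).name)
    return out_path


def build_training_request(job_name, image_uri, role_arn, source_s3_uri,
                           entry_point, train_s3_uri, output_s3_uri):
    """Build keyword arguments for the SageMaker create_training_job API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "HyperParameters": {
            # Assumption: the framework container reads these JSON-encoded keys.
            "sagemaker_program": json.dumps(Path(entry_point).name),
            "sagemaker_submit_directory": json.dumps(source_s3_uri),
        },
        "InputDataConfig": [{
            "ChannelName": "training",  # placeholder channel name
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Placeholder resources; adjust for the actual workload.
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Inside a Lambda handler (boto3 is available in the Lambda Python runtime):
#   boto3.client("s3").upload_file(tar_path, bucket, key)
#   boto3.client("sagemaker").create_training_job(**request)
```

Only the standard library is needed to build the request, so a function written this way would not have to ship NumPy or Pandas in its deployment package.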

@yangaws
Contributor

yangaws commented Dec 6, 2018

@zjost

How do you package the SageMaker Python SDK for use in a Lambda function? You mentioned pandas, which is actually listed under extras_require in setup.py, so I'm wondering whether the dependencies in install_requires alone are already too big or are actually fine to use.

The dependencies in extras_require are used for unit and integration testing, which I assume you don't need in your Lambda function.
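
One way to check which dependencies are unconditional and which are only pulled in by an extra is to read the installed package's metadata. The sketch below assumes Python 3.8 or later (for importlib.metadata) and that the package is already installed; the helper names are hypothetical.

```python
from importlib import metadata


def split_requirements(requirement_strings):
    """Separate unconditional requirements from those gated behind an extra."""
    required, extras = [], []
    for req in requirement_strings:
        # Requirements that belong to an extra carry an `extra == "..."` marker
        # after the semicolon.
        _, _, marker = req.partition(";")
        (extras if "extra ==" in marker else required).append(req)
    return required, extras


def installed_requirements(package):
    """Return the declared requirements of an installed package, or [] if absent."""
    try:
        return metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    required, extras = split_requirements(installed_requirements("sagemaker"))
    print("always installed:", required)
    print("only with extras:", extras)
```

The first list corresponds to install_requires and is what a plain pip install brings in; the second only applies when an extra is requested explicitly.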

@zjost
Author

zjost commented Dec 11, 2018

@yangaws Let me re-test now that TensorFlow import is removed. I was following the AWS docs for preparing the deploy package by pip installing to a local directory. Let me verify whether or not Pandas is installed. I know some packages have issues and can't be easily installed on Lambda--do you have integration testing around Lambda deployments? If not, does it make sense to add it?

@jesterhazy
Contributor

@zjost even without pandas, the dependency set for sagemaker still includes numpy and scipy, so it's going to be a challenge to fit it into a Lambda package.

There is a new Lambda feature called "Layers" that can solve this problem for you. The idea would be to create a layer that includes the sagemaker sdk and all its dependencies, and then configure your function to reference it.

For more info, see the Lambda Layers docs. The blog post that announced Layers also includes a numpy/scipy example that should be easy to extend to include sagemaker.
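
As a rough illustration of the layer idea, the sketch below zips a directory of already-installed packages under the python/ prefix that Lambda uses for Python layers. It assumes the packages were installed into a local directory beforehand (for example with pip's target option) in an environment compatible with the Lambda runtime; the function name and paths are hypothetical.

```python
import zipfile
from pathlib import Path


def build_layer_zip(site_packages_dir, zip_path="layer.zip"):
    """Zip installed packages under the `python/` prefix used by Python layers."""
    root = Path(site_packages_dir)
    # Keep zip_path outside site_packages_dir so the archive does not include itself.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = (Path("python") / path.relative_to(root)).as_posix()
                zf.write(path, arcname=arcname)
    return zip_path
```

The resulting archive could then be published as a layer (for example with the AWS CLI's publish-layer-version command) and referenced from the function's configuration.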

Closing this issue. Please reopen if you still have trouble.
