Skip to content

path is removed from entrypoint when using airflow operator #770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
filthysocks opened this issue Apr 29, 2019 · 4 comments
Closed

path is removed from entrypoint when using airflow operator #770

filthysocks opened this issue Apr 29, 2019 · 4 comments

Comments

@filthysocks
Copy link

Please fill out the form below.

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans):TensorFlow, Airflow
  • Framework Version: Airflow 1.10.1, tensorflow 1.13.1
  • Python Version: 3.7.1
  • CPU or GPU: CPU
  • Python SDK Version:1.18.11
  • Are you using a custom image:No

Describe the problem

I want to use the Airflow operator 'SageMakerTrainingOperator'. When i define a path in the entypoint e.g. /my/path/to/entrypoint.py it will be converted to entrypoint.py.
This happens in the sagemaker/workkflow/airflow.py file in the function 'prepare_framework' (line 37)
script = os.path.basename(estimator.entry_point)
According to the documentary it should be possible to define entrypoints with relative or absolute paths.

Minimal repro / logs

entry_path='/my/path/train.py'
tf_estimator = TensorFlow(
    entry_point=entry_path,
    role=role,
    train_instance_type='ml.m5.large',
    train_instance_count=1,
    framework_version='1.13.1',
    py_version='py3',
    model_dir='s3://some_dir'
)

inputs = {}

train_config = training_config(estimator=tf_estimator, inputs=inputs)
print(train_config['HyperParameters']['sagemaker_program']) #train.py
assert entry_path == train_config['HyperParameters']['sagemaker_program'] # won't work
@yangaws
Copy link
Contributor

yangaws commented Apr 30, 2019

Hi @filthysocks ,

Yep we should allow entry_point to be an absolute path here. That's a bug in the codes. I will try making a fix.

For now could you try to use the arguments entry_point with source_dir? Using your example, you can do:

entry_path='/my/path/train.py'
tf_estimator = TensorFlow(
    entry_point='train.py',
    source_dir='/my/path',
    role=role,
    train_instance_type='ml.m5.large',
    train_instance_count=1,
    framework_version='1.13.1',
    py_version='py3',
    model_dir='s3://some_dir'
)

@filthysocks
Copy link
Author

yes, that is what i'm doing right now, thanks for fixing it.
Note that it cannot handle paths relative to the source_dir either.

@ChoiByungWook
Copy link
Contributor

PR that will fix this issue: #796

@icywang86rui
Copy link
Contributor

PR is merged - #965
This should be fixed after Monday's release. Feel free to reopen if this is still an issue after Monday.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants