
ModelMonitor class doesn't cleanout monitor_schedule_name if create_monitor_schedule() fails. #1624


Closed
jmgray24 opened this issue Jun 24, 2020 · 5 comments
Labels
component: model monitor Relates to SageMaker Model Monitor type: bug

Comments

@jmgray24

Describe the bug
When creating a Model Monitor and attaching a schedule with create_monitoring_schedule(), if the schedule fails to create because of a ValidationException, the schedule is never created, but the ModelMonitor object keeps the values it set, such as the schedule name.

This leads to a dead end: you can't delete the schedule with delete_monitoring_schedule(), because it doesn't exist, but you also can't create a new one, because the object is already initialized.
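The failure mode can be illustrated with a small, self-contained model. This is a hypothetical simplification, not the SDK's code: `FakeScheduleClient` and `BuggyMonitor` are invented for illustration, and the fake client's cron check merely stands in for the service's ValidationException. The one detail taken from this issue is the ordering suggested by the traceback further down, where `monitoring_schedule_name` appears to be assigned right after the "already used" check and before the schedule exists on the service side.

```python
class FakeScheduleClient:
    """Hypothetical stand-in for the SageMaker API (not boto3)."""

    def __init__(self):
        self.schedules = {}

    def create(self, name, cron):
        # Crude stand-in for the service-side ValidationException.
        if not cron.startswith("cron("):
            raise RuntimeError("ValidationException: invalid schedule expression")
        self.schedules[name] = cron

    def delete(self, name):
        if name not in self.schedules:
            raise KeyError("ResourceNotFound: " + str(name))
        del self.schedules[name]


class BuggyMonitor:
    """Hypothetical model of the state handling described in this issue."""

    def __init__(self, client):
        self.client = client
        self.monitoring_schedule_name = None

    def create_monitoring_schedule(self, name, cron):
        # The attribute alone decides whether a schedule "already exists".
        if self.monitoring_schedule_name is not None:
            raise ValueError("Object already used to create a schedule; delete it first.")
        # The name is recorded before the remote call...
        self.monitoring_schedule_name = name
        # ...so if this raises, the object keeps a name the service never accepted.
        self.client.create(name, cron)

    def delete_monitoring_schedule(self):
        # Fails for a schedule that was never created, and the name is not cleared.
        self.client.delete(self.monitoring_schedule_name)
        self.monitoring_schedule_name = None


monitor = BuggyMonitor(FakeScheduleClient())
try:
    monitor.create_monitoring_schedule("my-monitoring-schedule", "Bad Cron")
except RuntimeError as err:
    print("create failed:", err)
print("stale name:", monitor.monitoring_schedule_name)
```

From this state, a second `create_monitoring_schedule()` call hits the guard and raises `ValueError`, while `delete_monitoring_schedule()` raises because the service has no such schedule, which mirrors the two errors in the reproduction steps.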

To reproduce
Create a Model Monitor:

from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker import get_execution_role

role = get_execution_role()
my_monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

my_monitor.suggest_baseline(
    baseline_dataset='s3://grayjh/player_data/player_data.csv',
    dataset_format=DatasetFormat.csv(header=True),
)

Create a bad schedule:

from sagemaker.model_monitor import CronExpressionGenerator

my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule',
    endpoint_input='mlops-bia-xgboost-2019-09-23-18-44-06-Prod',
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression="Bad Cron",
)

It fails because of the invalid cron expression:

ClientError: An error occurred (ValidationException) when calling the CreateMonitoringSchedule operation: InvalidParameter: 1 validation error(s) found.
- format cron(0 \d+(/12)? *|? * *|? *), Bad Cron, CreateMonitoringScheduleInput.MonitoringScheduleConfig.ScheduleConfig.ScheduleExpression.
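Because the service rejects the expression only after the SDK has already stored the schedule name, a rough client-side check can catch obviously malformed strings such as "Bad Cron" before create_monitoring_schedule() is called. `looks_like_cron` below is a hypothetical helper, not part of the SDK; it only checks for the `cron(...)` wrapper around six space-separated fields, as in an expression like `cron(0 * ? * * *)`, and does not reproduce the service's full validation rules.

```python
def looks_like_cron(expression: str) -> bool:
    """Return True if `expression` has the shape cron(<six space-separated fields>).

    Rough, hypothetical pre-flight check only: it rejects obviously malformed
    strings but does not enforce which values the service actually accepts.
    """
    expression = expression.strip()
    if not (expression.startswith("cron(") and expression.endswith(")")):
        return False
    fields = expression[len("cron("):-1].split()
    return len(fields) == 6


for candidate in ["Bad Cron", "cron(0 * ? * * *)", "cron(0 * ? *)"]:
    print(f"{candidate!r}: {looks_like_cron(candidate)}")
```

A string that passes this check can still be rejected by the service, so it complements, rather than replaces, handling the ClientError.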

Try to create a new, valid monitoring schedule:

my_monitor.create_monitoring_schedule(
    monitor_schedule_name='my-monitoring-schedule1',
    endpoint_input='mlops-bia-xgboost-2019-09-23-18-44-06-Prod',
    statistics=my_monitor.baseline_statistics(),
    constraints=my_monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

Creation fails with an error saying that a schedule already exists:

It seems that this object was already used to create an Amazon Model Monitoring Schedule. To create another, first delete the existing one using my_monitor.delete_monitoring_schedule().
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-63036cc6383a> in <module>()
      4     statistics=my_monitor.baseline_statistics(),
      5     constraints=my_monitor.suggested_constraints(),
----> 6     schedule_cron_expression=CronExpressionGenerator.hourly(),
      7 )

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/model_monitor/model_monitoring.py in create_monitoring_schedule(self, endpoint_input, record_preprocessor_script, post_analytics_processor_script, output_s3_uri, constraints, statistics, monitor_schedule_name, schedule_cron_expression, enable_cloudwatch_metrics)
   1213             )
   1214             print(message)
-> 1215             raise ValueError(message)
   1216 
   1217         self.monitoring_schedule_name = self._generate_monitoring_schedule_name(

ValueError: It seems that this object was already used to create an Amazon Model Monitoring Schedule. To create another, first delete the existing one using my_monitor.delete_monitoring_schedule().

Try to delete that schedule:
my_monitor.delete_monitoring_schedule()

This also fails:
ResourceNotFound: An error occurred (ResourceNotFound) when calling the DeleteMonitoringSchedule operation: Monitoring Schedule arn:aws:sagemaker:us-east-1:210829804582:monitoring-schedule/my-monitoring-schedule1 not found
The workaround is to manually reset the schedule name to None:
my_monitor.monitoring_schedule_name = None
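The same workaround can be applied automatically by wrapping the call. `create_schedule_or_reset` is a hypothetical helper, not an SDK function; it assumes only that the monitor object exposes `create_monitoring_schedule()` and a `monitoring_schedule_name` attribute, as `my_monitor` does in this issue. It restores only that attribute, so any other values the SDK may have set during the failed call are left as they are.

```python
def create_schedule_or_reset(monitor, **kwargs):
    """Call monitor.create_monitoring_schedule(**kwargs).

    If the call raises, restore monitor.monitoring_schedule_name to the value it
    had before the call (None for a fresh monitor) and re-raise the error, so a
    corrected call can then be made on the same object.
    """
    previous_name = monitor.monitoring_schedule_name
    try:
        return monitor.create_monitoring_schedule(**kwargs)
    except Exception:
        # Undo the stale name left behind by the failed creation attempt.
        monitor.monitoring_schedule_name = previous_name
        raise


# Usage with the objects from the reproduction steps (not run here):
# create_schedule_or_reset(
#     my_monitor,
#     monitor_schedule_name="my-monitoring-schedule",
#     endpoint_input="mlops-bia-xgboost-2019-09-23-18-44-06-Prod",
#     statistics=my_monitor.baseline_statistics(),
#     constraints=my_monitor.suggested_constraints(),
#     schedule_cron_expression="Bad Cron",
# )
```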

Expected behavior
I would expect that if create_monitoring_schedule() fails, the object's attributes remain None, so that a new schedule can be created without resetting them manually.
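One way to get this behavior is to record the name only after the service has accepted the schedule. The sketch below is hypothetical and is not the SDK's actual fix: `FakeScheduleClient` and `FixedMonitor` are invented for illustration, and the fake client's cron check stands in for the service's ValidationException.

```python
class FakeScheduleClient:
    """Hypothetical stand-in for the SageMaker API (not boto3)."""

    def __init__(self):
        self.schedules = {}

    def create(self, name, cron):
        # Crude stand-in for the service-side ValidationException.
        if not cron.startswith("cron("):
            raise RuntimeError("ValidationException: invalid schedule expression")
        self.schedules[name] = cron


class FixedMonitor:
    """Hypothetical monitor that only records state after a successful call."""

    def __init__(self, client):
        self.client = client
        self.monitoring_schedule_name = None

    def create_monitoring_schedule(self, name, cron):
        if self.monitoring_schedule_name is not None:
            raise ValueError("Object already used to create a schedule; delete it first.")
        # Make the remote call first; if it raises, no attribute has changed.
        self.client.create(name, cron)
        # Commit local state only once the service has accepted the schedule.
        self.monitoring_schedule_name = name


monitor = FixedMonitor(FakeScheduleClient())
try:
    monitor.create_monitoring_schedule("my-monitoring-schedule", "Bad Cron")
except RuntimeError:
    pass
# The failed attempt left no stale name, so a corrected call succeeds.
monitor.create_monitoring_schedule("my-monitoring-schedule1", "cron(0 * ? * * *)")
print(monitor.monitoring_schedule_name)
```

An alternative with the same effect is to keep the current ordering and reset the attributes in an `except` block before re-raising; moving the assignment after the call avoids having to undo anything.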

Screenshots or logs
Will provide an example notebook with logs and repro steps.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 1.65.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: Python 3 (tested using the default conda_python3 kernel on a SageMaker notebook with an updated sagemaker-python-sdk)
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context

@jmgray24
Author

@jmgray24 jmgray24 changed the title "ModelMonitor Sclass doesn't cleanout monitor_schedule_name if it fails to create." to "ModelMonitor class doesn't cleanout monitor_schedule_name if create_monitor_schedule() fails." Jun 24, 2020
@G-ecs

G-ecs commented Jun 25, 2020

As a workaround, you can set

my_monitor.monitoring_schedule_name = None

It seems that the class uses this attribute to check if an Amazon Model Monitoring Schedule has been created.

@jmgray24
Author

Thanks @G-ecs.

@laurenyu Feel free to assign this to me. I can take a look at fixing it.

@G-ecs

G-ecs commented Jun 30, 2020

@jmgray24 Didn't realise you were part of the dev team... I'll read more carefully next time ;-)

@ajaykarpur
Contributor

> Feel free to assign this to me. I can take a look at fixing it.

Sounds great, thank you! Please let us know if you run into any issues.
