-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Internal server error when using OpenNMT-tf #1057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hello @anasir-ureed Please share with us a minimal code example, allowing us to reproduce the issue. Thanks for using SageMaker |
Hello @mvsusp Thanks for your reply This is the script I am using in the entry point. "train.sh"
In the config file,
In the notebook please use
In training_inputs_location, the directory structure is as following:
Best wishes, |
Thanks @anasir-ureed. Where do I find the data files? |
Hello @mvsusp , Any data files I think can be used to reproduce the issue. It happened with me using several data sources. I have tested also using toy-ende data that can be downloaded if you follow the following link. http://opennmt.net/OpenNMT-tf/quickstart.html You will need to change the config file to match the data files names.
also please change train.sh script pip install OpenNMT-tf to :
A new major version just was released. Thanks, |
Hello @mvsusp , Any updates? Best wishes, |
Hi @AbdallahNasir, sorry for the delay. @mvsusp has left the team and we will continue looking into the issue. |
Best wishes to him. Thank you @ChuyangDeng, appreciate your efforts. |
Reference: 0411850995 @AbdallahNasir would it be possible for you to open an AWS support ticket? We would like to get some information to identify your training job that failed, so we can get the stack trace that is hidden behind the "internal server error". Feel free to reference this Github issue in your AWS support ticket. In the meanwhile, can you please provide the name for the training job that failed and region? Thanks! |
Hello @ChoiByungWook , The training job name is error-exp---2019-01-10--08-54-45-225648 in us-east-1. Our account name alias is "ureed". Thanks for your assistance. |
Hello @AbdallahNasir, I have passed this information to the respective team. Please make sure you file an AWS support ticket, as to ensure your issue doesn't get lost. Thanks! |
Dear @ChoiByungWook , It looks like I cannot create support tickets, as we do not have purchased support yet. Any chance we can do this from here? Best wishes, |
Hello @AbdallahNasir, I've reached out to the corresponding team and it looks like we are in need of your AWS account ID as well, however I would not post that on Github. Can you please message me through the AWS forums? https://forums.aws.amazon.com/forum.jspa?forumID=285 After logging in, please click "Private Messages" -> "Compose Message" -> and send to DanielCAtAWS. Feel free to also create a forum post in the AWS forums referencing this Github issue. Apologies for the back and forth. |
Hello @ChoiByungWook , Thanks for your reply. I have sent you the message on the form. Thanks for your support. Best wishes, |
Hello @AbdallahNasir, I have forwarded the ID to the corresponding team. Awaiting response from them. Thank you for your patience. |
Hi @ChoiByungWook, Thank you so much dear :) |
Hello @AbdallahNasir, The corresponding team has looked into the issue, and it seems to have been an issue on the platform side. A fix was recently deployed, can you please retry your training job? Thank you! |
Hello @ChoiByungWook, I have started another training job, called error-exp---2019-17-10--06-51-04-788983. It failed, producing the same error message. Thank you, |
Hello @ChoiByungWook, I hope you are having a great day. Any updates dear? Best wishes, |
Closing for triage. If you read this message and are also seeing this error message. Please open a new issue or file an AWS support ticket as suggested above. Thank you. |
…- Galactus (aws#1212) Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Raymond Liu <[email protected]> Co-authored-by: John Barboza <[email protected]> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]> Co-authored-by: Mike Schneider <[email protected]> Co-authored-by: Bhupendra Singh <[email protected]> Co-authored-by: ci <ci> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Zuoyuan Huang <[email protected]> Co-authored-by: evakravi <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: SSRraymond <[email protected]> Co-authored-by: Ruilian Gao <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: mariumof <[email protected]> Co-authored-by: matherit <[email protected]> Co-authored-by: amzn-choeric <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Miyoung <[email protected]> Co-authored-by: Sarah Castillo <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: stacicho <[email protected]> Co-authored-by: martinRenou <[email protected]> Co-authored-by: jiapinw <[email protected]> Co-authored-by: Akash Goel <[email protected]> Co-authored-by: Joseph Zhang <[email protected]> Co-authored-by: Harsha Reddy <[email protected]> Co-authored-by: Haixin Wang <[email protected]> Co-authored-by: Kalyani Nikure <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: Gili Nachum <[email protected]> Co-authored-by: Jose Pena <[email protected]> Co-authored-by: cansun <[email protected]> Co-authored-by: AWS-pratab <[email protected]> Co-authored-by: shenlongtang <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: chrivtho-github <[email protected]> Co-authored-by: Justin <[email protected]> Co-authored-by: Duc Trung Le <[email protected]> Co-authored-by: HappyAmazonian <[email protected]> Co-authored-by: cj-zhang <[email protected]> Co-authored-by: Matthew <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: Rohith Nadimpally <[email protected]> Co-authored-by: rohithn1 <[email protected]> Co-authored-by: Victor Zhu <[email protected]> Co-authored-by: jbarz1 <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Barboza <[email protected]> Co-authored-by: ruiliann666 <[email protected]> fixes (aws#963) fix: skip tensorflow local mode notebook test (aws#4060) Fix TorchTensorSer/Deser (aws#969) fix (aws#971) fix local container mode (aws#972) Fix auto detect (aws#979) Fix routing fn (aws#981) fix: tags for jumpstart model package models (aws#4061) fix: pipeline variable kms key (aws#4065) fix: jumpstart cache using sagemaker session s3 client (aws#4051) fix: gated models unsupported region (aws#4069) fix local container serialization (aws#989) fix custom serialiazation with local container. Also remove a lot of unused code (aws#994) Fix custom serialization for local container mode (aws#1000) fix pytorch version (aws#1001) Fix unit test (aws#990) Fix unit tests (aws#1018) Fix happy hf test (aws#1026) fix logic setup (aws#1034) fixes (aws#1045) Fix flake error in init (aws#1050) fix (aws#1053) fix: pipeline upsert failed to pass parallelism_config to update (aws#4066) fix: temporarily skip kmeans notebook (aws#4092) fixes (aws#1051) Fix missing absolute import error (aws#1057) Fix flake8 error in unit test (aws#1058) fixes (aws#1056) Fix flake8 error in integ test (aws#1060) Fix black format error in test_pickle_dependencies (aws#1062) Fix docstyle error under serve (aws#1065) Fix docstyle error in builder failure (aws#1066) fix black and flake8 formatting (aws#1069) Fix format error (aws#1070) Fix integ test (aws#1074) fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (aws#4072) fix: log message when sdk defaults not applied (aws#4104) fix: handle bad jumpstart default session (aws#4109) Fix the version information, whl and flake8 (aws#1085) Fix JSON serializer error (aws#1088) Fix unit test (aws#1091) fix format (aws#1103) Fix local mode predictor (aws#1107) Fix DJLPredictor (aws#1108) Fix modelbuilder unit tests (aws#1118) fixes (aws#1136) fixes (aws#1165) fixes (aws#1166) fix: auto ml integ tests and add flaky test markers (aws#4136) fix model data for JumpStartModel (aws#4135) fix: transform step unit test (aws#4151) fix: Update pipeline.py and selective_execution_config.py with small fixes (aws#1099) fix: Fixed bug in _create_training_details (aws#4141) fix: use correct line endings and s3 uris on windows (aws#4118) fix: js tagging s3 prefix (aws#4167) fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (aws#4181) fix: import error in unsupported js regions (aws#4188) fix: update local mode schema (aws#4185) fix: fix flaky Inference Recommender integration tests (aws#4156) fix: clone distribution in validate_distribution (aws#4205) Fix hyperlinks in feature_processor.scheduler parameter descriptions (aws#4208) Fix master merge formatting (aws#1186) Fix master unit tests (aws#1203) Fix djl unit tests (aws#1204) Fix merge conflicts (aws#1217) fix: fix URL links (aws#4217) fix: bump urllib3 version (aws#4223) fix: relax upper bound on urllib in local mode requirements (aws#4219) fixes (aws#1224) fix formatting (aws#1233) fix byoc unit tests (aws#1235) fix byoc unit tests (aws#1236)
Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Raymond Liu <[email protected]> Co-authored-by: John Barboza <[email protected]> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]> Co-authored-by: Mike Schneider <[email protected]> Co-authored-by: Bhupendra Singh <[email protected]> Co-authored-by: ci <ci> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Zuoyuan Huang <[email protected]> Co-authored-by: evakravi <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: SSRraymond <[email protected]> Co-authored-by: Ruilian Gao <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: mariumof <[email protected]> Co-authored-by: matherit <[email protected]> Co-authored-by: amzn-choeric <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Miyoung <[email protected]> Co-authored-by: Sarah Castillo <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: stacicho <[email protected]> Co-authored-by: martinRenou <[email protected]> Co-authored-by: jiapinw <[email protected]> Co-authored-by: Akash Goel <[email protected]> Co-authored-by: Joseph Zhang <[email protected]> Co-authored-by: Harsha Reddy <[email protected]> Co-authored-by: Haixin Wang <[email protected]> Co-authored-by: Kalyani Nikure <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: Gili Nachum <[email protected]> Co-authored-by: Jose Pena <[email protected]> Co-authored-by: cansun <[email protected]> Co-authored-by: AWS-pratab <[email protected]> Co-authored-by: shenlongtang <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: chrivtho-github <[email protected]> Co-authored-by: Justin <[email protected]> Co-authored-by: Duc Trung Le <[email protected]> Co-authored-by: HappyAmazonian <[email protected]> Co-authored-by: cj-zhang <[email protected]> Co-authored-by: Matthew <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: Rohith Nadimpally <[email protected]> Co-authored-by: rohithn1 <[email protected]> Co-authored-by: Victor Zhu <[email protected]> Co-authored-by: jbarz1 <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Barboza <[email protected]> Co-authored-by: ruiliann666 <[email protected]> fixes (#963) fix: skip tensorflow local mode notebook test (#4060) Fix TorchTensorSer/Deser (#969) fix (#971) fix local container mode (#972) Fix auto detect (#979) Fix routing fn (#981) fix: tags for jumpstart model package models (#4061) fix: pipeline variable kms key (#4065) fix: jumpstart cache using sagemaker session s3 client (#4051) fix: gated models unsupported region (#4069) fix local container serialization (#989) fix custom serialiazation with local container. Also remove a lot of unused code (#994) Fix custom serialization for local container mode (#1000) fix pytorch version (#1001) Fix unit test (#990) Fix unit tests (#1018) Fix happy hf test (#1026) fix logic setup (#1034) fixes (#1045) Fix flake error in init (#1050) fix (#1053) fix: pipeline upsert failed to pass parallelism_config to update (#4066) fix: temporarily skip kmeans notebook (#4092) fixes (#1051) Fix missing absolute import error (#1057) Fix flake8 error in unit test (#1058) fixes (#1056) Fix flake8 error in integ test (#1060) Fix black format error in test_pickle_dependencies (#1062) Fix docstyle error under serve (#1065) Fix docstyle error in builder failure (#1066) fix black and flake8 formatting (#1069) Fix format error (#1070) Fix integ test (#1074) fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (#4072) fix: log message when sdk defaults not applied (#4104) fix: handle bad jumpstart default session (#4109) Fix the version information, whl and flake8 (#1085) Fix JSON serializer error (#1088) Fix unit test (#1091) fix format (#1103) Fix local mode predictor (#1107) Fix DJLPredictor (#1108) Fix modelbuilder unit tests (#1118) fixes (#1136) fixes (#1165) fixes (#1166) fix: auto ml integ tests and add flaky test markers (#4136) fix model data for JumpStartModel (#4135) fix: transform step unit test (#4151) fix: Update pipeline.py and selective_execution_config.py with small fixes (#1099) fix: Fixed bug in _create_training_details (#4141) fix: use correct line endings and s3 uris on windows (#4118) fix: js tagging s3 prefix (#4167) fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (#4181) fix: import error in unsupported js regions (#4188) fix: update local mode schema (#4185) fix: fix flaky Inference Recommender integration tests (#4156) fix: clone distribution in validate_distribution (#4205) Fix hyperlinks in feature_processor.scheduler parameter descriptions (#4208) Fix master merge formatting (#1186) Fix master unit tests (#1203) Fix djl unit tests (#1204) Fix merge conflicts (#1217) fix: fix URL links (#4217) fix: bump urllib3 version (#4223) fix: relax upper bound on urllib in local mode requirements (#4219) fixes (#1224) fix formatting (#1233) fix byoc unit tests (#1235) fix byoc unit tests (#1236)
Co-authored-by: Raymond Liu <[email protected]> Co-authored-by: Ruilian Gao <[email protected]> Co-authored-by: John Barboza <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Zuoyuan Huang <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]> Co-authored-by: Mike Schneider <[email protected]> Co-authored-by: Bhupendra Singh <[email protected]> Co-authored-by: ci <ci> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: evakravi <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: mariumof <[email protected]> Co-authored-by: matherit <[email protected]> Co-authored-by: amzn-choeric <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Miyoung <[email protected]> Co-authored-by: Sarah Castillo <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: stacicho <[email protected]> Co-authored-by: martinRenou <[email protected]> Co-authored-by: jiapinw <[email protected]> Co-authored-by: Akash Goel <[email protected]> Co-authored-by: Joseph Zhang <[email protected]> Co-authored-by: Harsha Reddy <[email protected]> Co-authored-by: Haixin Wang <[email protected]> Co-authored-by: Kalyani Nikure <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: Gili Nachum <[email protected]> Co-authored-by: Jose Pena <[email protected]> Co-authored-by: cansun <[email protected]> Co-authored-by: AWS-pratab <[email protected]> Co-authored-by: shenlongtang <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: chrivtho-github <[email protected]> Co-authored-by: Justin <[email protected]> Co-authored-by: Duc Trung Le <[email protected]> Co-authored-by: HappyAmazonian <[email protected]> Co-authored-by: cj-zhang <[email protected]> Co-authored-by: Matthew <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: Rohith Nadimpally <[email protected]> Co-authored-by: rohithn1 <[email protected]> Co-authored-by: Victor Zhu <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: SSRraymond <[email protected]> Co-authored-by: jbarz1 <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Barboza <[email protected]> Co-authored-by: ruiliann666 <[email protected]> Co-authored-by: Rohan Gujarathi <[email protected]> Co-authored-by: svia3 <[email protected]> Co-authored-by: Zhankui Lu <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Edward Sun <[email protected]> Co-authored-by: Stephen Via <[email protected]> Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: Stacia Choe <[email protected]> Co-authored-by: Edward Sun <[email protected]> Co-authored-by: Edward Sun <[email protected]> Co-authored-by: Rohan Gujarathi <[email protected]> Co-authored-by: JohnaAtAWS <[email protected]> Co-authored-by: Vera Yu <[email protected]> Co-authored-by: bhaoz <[email protected]> Co-authored-by: Qing Lan <[email protected]> Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: Sirut Buasai <[email protected]> Co-authored-by: wayneyao <[email protected]> Co-authored-by: Jacky Lee <[email protected]> Co-authored-by: haNa-meister <[email protected]> Co-authored-by: Shailav <[email protected]> Fix unit tests (#1018) Fix happy hf test (#1026) fix logic setup (#1034) fixes (#1045) Fix flake error in init (#1050) fix (#1053) fix: skip tensorflow local mode notebook test (#4060) fix: tags for jumpstart model package models (#4061) fix: pipeline variable kms key (#4065) fix: jumpstart cache using sagemaker session s3 client (#4051) fix: gated models unsupported region (#4069) fix: pipeline upsert failed to pass parallelism_config to update (#4066) fix: temporarily skip kmeans notebook (#4092) fixes (#1051) Fix missing absolute import error (#1057) Fix flake8 error in unit test (#1058) fixes (#1056) Fix flake8 error in integ test (#1060) Fix black format error in test_pickle_dependencies (#1062) Fix docstyle error under serve (#1065) Fix docstyle error in builder failure (#1066) fix black and flake8 formatting (#1069) Fix format error (#1070) Fix integ test (#1074) fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (#4072) fix: log message when sdk defaults not applied (#4104) fix: handle bad jumpstart default session (#4109) Fix the version information, whl and flake8 (#1085) Fix JSON serializer error (#1088) Fix unit test (#1091) fix format (#1103) Fix local mode predictor (#1107) Fix DJLPredictor (#1108) Fix modelbuilder unit tests (#1118) fixes (#1136) fixes (#1165) fixes (#1166) fix: auto ml integ tests and add flaky test markers (#4136) fix model data for JumpStartModel (#4135) fix: transform step unit test (#4151) fix: Update pipeline.py and selective_execution_config.py with small fixes (#1099) fix: Fixed bug in _create_training_details (#4141) fix: use correct line endings and s3 uris on windows (#4118) fix: js tagging s3 prefix (#4167) fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (#4181) fix: import error in unsupported js regions (#4188) fix: update local mode schema (#4185) fix: fix flaky Inference Recommender integration tests (#4156) fix: clone distribution in validate_distribution (#4205) Fix hyperlinks in feature_processor.scheduler parameter descriptions (#4208) Fix master merge formatting (#1186) Fix master unit tests (#1203) Fix djl unit tests (#1204) Fix merge conflicts (#1217) fix: fix URL links (#4217) fix: bump urllib3 version (#4223) fix: relax upper bound on urllib in local mode requirements (#4219) fixes (#1224) fix formatting (#1233) fix byoc unit tests (#1235) fix byoc unit tests (#1236) Fixed Modelpackage's deploy calling model's deploy (#1155) fix: jumpstart unit-test (#1265) fixes (#963) Fix TorchTensorSer/Deser (#969) fix (#971) fix local container mode (#972) Fix auto detect (#979) Fix routing fn (#981) fix local container serialization (#989) fix custom serialiazation with local container. Also remove a lot of unused code (#994) Fix custom serialization for local container mode (#1000) fix pytorch version (#1001) Fix unit test (#990) fix: Multiple bug fixes including removing unsupported feature. (#1105) Fix some problems with pipeline compilation (#1125) fix: Refactor JsonGet s3 URI and add serialize_output_to_json flag (#1164) fix: invoke_function circular import (#1262) fix: pylint (#1264) fix: Add logging for docker build failures (#1267) Fix session bug when provided in ModelBuilder (#1288) fixes (#1313) fix: Gated content bucket env var override (#1280) fix: Change the library used in pytorch test causing cloudpickle version conflict (#1287) fix: HMAC signing for ModelBuilder Triton python backend (#1282) fix: do not delete temp folder generated by sdist (#1291) fix: Do not require model_server if provided image_uri is a 1p image. (#1303) fix: check image type vs instance type (#1307) fix: unit test (#1315) fix: Fixed model builder's register unable to deploy (#1323) fix: missing `self._framework` in `InferenceSpec` path (#1325) fix: enable xgboost integ test in our own pipeline (#1326) fix: skip py310 (#1328) fix: Update autodetect dlc logic (#1329) Fix secret key in the Model object (#1334) fix: improve error message (#1333) Fix unit testing (#1340) fix: Typing and formatting (#1341) fix: WaiterError on failed pipeline execution. results() (#1337) Fix tox identified errors (#1344) Fix issue when the user runs in Python 3.11 (#1345) fixes (#1346) fix: use copy instead of move in bootstrap script (#1339) Resolve keynote3 conflicts (#1351) Resolve keynote3 conflicts v2 (#1353) Fix conflicts (#1354) Fix conflicts v3 (#1355) fix: get whl from local to run integ tests (#1357) fix: enable triton pt tests (#1358) fix: integ test (#1362) Fix Python 3.11 issue with dataclass decorator (#1345) fix: remote function include_local_workdir default value (#1342) fix: error message (#1373) fixes (#1372) fix: Remvoe PickleSerializer (#1378)
Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Raymond Liu <[email protected]> Co-authored-by: John Barboza <[email protected]> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]> Co-authored-by: Mike Schneider <[email protected]> Co-authored-by: Bhupendra Singh <[email protected]> Co-authored-by: ci <ci> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Zuoyuan Huang <[email protected]> Co-authored-by: evakravi <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: SSRraymond <[email protected]> Co-authored-by: Ruilian Gao <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: mariumof <[email protected]> Co-authored-by: matherit <[email protected]> Co-authored-by: amzn-choeric <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Miyoung <[email protected]> Co-authored-by: Sarah Castillo <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: stacicho <[email protected]> Co-authored-by: martinRenou <[email protected]> Co-authored-by: jiapinw <[email protected]> Co-authored-by: Akash Goel <[email protected]> Co-authored-by: Joseph Zhang <[email protected]> Co-authored-by: Harsha Reddy <[email protected]> Co-authored-by: Haixin Wang <[email protected]> Co-authored-by: Kalyani Nikure <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: Gili Nachum <[email protected]> Co-authored-by: Jose Pena <[email protected]> Co-authored-by: cansun <[email protected]> Co-authored-by: AWS-pratab <[email protected]> Co-authored-by: shenlongtang <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: chrivtho-github <[email protected]> Co-authored-by: Justin <[email protected]> Co-authored-by: Duc Trung Le <[email protected]> Co-authored-by: HappyAmazonian <[email protected]> Co-authored-by: cj-zhang <[email protected]> Co-authored-by: Matthew <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: Rohith Nadimpally <[email protected]> Co-authored-by: rohithn1 <[email protected]> Co-authored-by: Victor Zhu <[email protected]> Co-authored-by: jbarz1 <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Barboza <[email protected]> Co-authored-by: ruiliann666 <[email protected]> fixes (#963) fix: skip tensorflow local mode notebook test (#4060) Fix TorchTensorSer/Deser (#969) fix (#971) fix local container mode (#972) Fix auto detect (#979) Fix routing fn (#981) fix: tags for jumpstart model package models (#4061) fix: pipeline variable kms key (#4065) fix: jumpstart cache using sagemaker session s3 client (#4051) fix: gated models unsupported region (#4069) fix local container serialization (#989) fix custom serialiazation with local container. Also remove a lot of unused code (#994) Fix custom serialization for local container mode (#1000) fix pytorch version (#1001) Fix unit test (#990) Fix unit tests (#1018) Fix happy hf test (#1026) fix logic setup (#1034) fixes (#1045) Fix flake error in init (#1050) fix (#1053) fix: pipeline upsert failed to pass parallelism_config to update (#4066) fix: temporarily skip kmeans notebook (#4092) fixes (#1051) Fix missing absolute import error (#1057) Fix flake8 error in unit test (#1058) fixes (#1056) Fix flake8 error in integ test (#1060) Fix black format error in test_pickle_dependencies (#1062) Fix docstyle error under serve (#1065) Fix docstyle error in builder failure (#1066) fix black and flake8 formatting (#1069) Fix format error (#1070) Fix integ test (#1074) fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (#4072) fix: log message when sdk defaults not applied (#4104) fix: handle bad jumpstart default session (#4109) Fix the version information, whl and flake8 (#1085) Fix JSON serializer error (#1088) Fix unit test (#1091) fix format (#1103) Fix local mode predictor (#1107) Fix DJLPredictor (#1108) Fix modelbuilder unit tests (#1118) fixes (#1136) fixes (#1165) fixes (#1166) fix: auto ml integ tests and add flaky test markers (#4136) fix model data for JumpStartModel (#4135) fix: transform step unit test (#4151) fix: Update pipeline.py and selective_execution_config.py with small fixes (#1099) fix: Fixed bug in _create_training_details (#4141) fix: use correct line endings and s3 uris on windows (#4118) fix: js tagging s3 prefix (#4167) fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (#4181) fix: import error in unsupported js regions (#4188) fix: update local mode schema (#4185) fix: fix flaky Inference Recommender integration tests (#4156) fix: clone distribution in validate_distribution (#4205) Fix hyperlinks in feature_processor.scheduler parameter descriptions (#4208) Fix master merge formatting (#1186) Fix master unit tests (#1203) Fix djl unit tests (#1204) Fix merge conflicts (#1217) fix: fix URL links (#4217) fix: bump urllib3 version (#4223) fix: relax upper bound on urllib in local mode requirements (#4219) fixes (#1224) fix formatting (#1233) fix byoc unit tests (#1235) fix byoc unit tests (#1236)
Co-authored-by: Raymond Liu <[email protected]> Co-authored-by: Ruilian Gao <[email protected]> Co-authored-by: John Barboza <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Zuoyuan Huang <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]> Co-authored-by: Mike Schneider <[email protected]> Co-authored-by: Bhupendra Singh <[email protected]> Co-authored-by: ci <ci> Co-authored-by: Malav Shastri <[email protected]> Co-authored-by: evakravi <[email protected]> Co-authored-by: Keshav Chandak <[email protected]> Co-authored-by: Alexander Pivovarov <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: mariumof <[email protected]> Co-authored-by: matherit <[email protected]> Co-authored-by: amzn-choeric <[email protected]> Co-authored-by: Ao Guo <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Sally Seok <[email protected]> Co-authored-by: Manu Seth <[email protected]> Co-authored-by: Miyoung <[email protected]> Co-authored-by: Sarah Castillo <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: EC2 Default User <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: stacicho <[email protected]> Co-authored-by: martinRenou <[email protected]> Co-authored-by: jiapinw <[email protected]> Co-authored-by: Akash Goel <[email protected]> Co-authored-by: Joseph Zhang <[email protected]> Co-authored-by: Harsha Reddy <[email protected]> Co-authored-by: Haixin Wang <[email protected]> Co-authored-by: Kalyani Nikure <[email protected]> Co-authored-by: Xin Wang <[email protected]> Co-authored-by: Gili Nachum <[email protected]> Co-authored-by: Jose Pena <[email protected]> Co-authored-by: cansun <[email protected]> Co-authored-by: AWS-pratab <[email protected]> Co-authored-by: shenlongtang <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: chrivtho-github <[email protected]> Co-authored-by: Justin <[email protected]> Co-authored-by: Duc Trung Le <[email protected]> Co-authored-by: HappyAmazonian <[email protected]> Co-authored-by: cj-zhang <[email protected]> Co-authored-by: Matthew <[email protected]> Co-authored-by: Zach Kimberg <[email protected]> Co-authored-by: Rohith Nadimpally <[email protected]> Co-authored-by: rohithn1 <[email protected]> Co-authored-by: Victor Zhu <[email protected]> Co-authored-by: Gary Wang <[email protected]> Co-authored-by: SSRraymond <[email protected]> Co-authored-by: jbarz1 <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Mohan Gandhi <[email protected]> Co-authored-by: Barboza <[email protected]> Co-authored-by: ruiliann666 <[email protected]> Co-authored-by: Rohan Gujarathi <[email protected]> Co-authored-by: svia3 <[email protected]> Co-authored-by: Zhankui Lu <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Edward Sun <[email protected]> Co-authored-by: Stephen Via <[email protected]> Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: Stacia Choe <[email protected]> Co-authored-by: Edward Sun <[email protected]> Co-authored-by: Edward Sun <[email protected]> Co-authored-by: Rohan Gujarathi <[email protected]> Co-authored-by: JohnaAtAWS <[email protected]> Co-authored-by: Vera Yu <[email protected]> Co-authored-by: bhaoz <[email protected]> Co-authored-by: Qing Lan <[email protected]> Co-authored-by: Namrata Madan <[email protected]> Co-authored-by: Sirut Buasai <[email protected]> Co-authored-by: wayneyao <[email protected]> Co-authored-by: Jacky Lee <[email protected]> Co-authored-by: haNa-meister <[email protected]> Co-authored-by: Shailav <[email protected]> Fix unit tests (#1018) Fix happy hf test (#1026) fix logic setup (#1034) fixes (#1045) Fix flake error in init (#1050) fix (#1053) fix: skip tensorflow local mode notebook test (#4060) fix: tags for jumpstart model package models (#4061) fix: pipeline variable kms key (#4065) fix: jumpstart cache using sagemaker session s3 client (#4051) fix: gated models unsupported region (#4069) fix: pipeline upsert failed to pass parallelism_config to update (#4066) fix: temporarily skip kmeans notebook (#4092) fixes (#1051) Fix missing absolute import error (#1057) Fix flake8 error in unit test (#1058) fixes (#1056) Fix flake8 error in integ test (#1060) Fix black format error in test_pickle_dependencies (#1062) Fix docstyle error under serve (#1065) Fix docstyle error in builder failure (#1066) fix black and flake8 formatting (#1069) Fix format error (#1070) Fix integ test (#1074) fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (#4072) fix: log message when sdk defaults not applied (#4104) fix: handle bad jumpstart default session (#4109) Fix the version information, whl and flake8 (#1085) Fix JSON serializer error (#1088) Fix unit test (#1091) fix format (#1103) Fix local mode predictor (#1107) Fix DJLPredictor (#1108) Fix modelbuilder unit tests (#1118) fixes (#1136) fixes (#1165) fixes (#1166) fix: auto ml integ tests and add flaky test markers (#4136) fix model data for JumpStartModel (#4135) fix: transform step unit test (#4151) fix: Update pipeline.py and selective_execution_config.py with small fixes (#1099) fix: Fixed bug in _create_training_details (#4141) fix: use correct line endings and s3 uris on windows (#4118) fix: js tagging s3 prefix (#4167) fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (#4181) fix: import error in unsupported js regions (#4188) fix: update local mode schema (#4185) fix: fix flaky Inference Recommender integration tests (#4156) fix: clone distribution in validate_distribution (#4205) Fix hyperlinks in feature_processor.scheduler parameter descriptions (#4208) Fix master merge formatting (#1186) Fix master unit tests (#1203) Fix djl unit tests (#1204) Fix merge conflicts (#1217) fix: fix URL links (#4217) fix: bump urllib3 version (#4223) fix: relax upper bound on urllib in local mode requirements (#4219) fixes (#1224) fix formatting (#1233) fix byoc unit tests (#1235) fix byoc unit tests (#1236) Fixed Modelpackage's deploy calling model's deploy (#1155) fix: jumpstart unit-test (#1265) fixes (#963) Fix TorchTensorSer/Deser (#969) fix (#971) fix local container mode (#972) Fix auto detect (#979) Fix routing fn (#981) fix local container serialization (#989) fix custom serialiazation with local container. Also remove a lot of unused code (#994) Fix custom serialization for local container mode (#1000) fix pytorch version (#1001) Fix unit test (#990) fix: Multiple bug fixes including removing unsupported feature. (#1105) Fix some problems with pipeline compilation (#1125) fix: Refactor JsonGet s3 URI and add serialize_output_to_json flag (#1164) fix: invoke_function circular import (#1262) fix: pylint (#1264) fix: Add logging for docker build failures (#1267) Fix session bug when provided in ModelBuilder (#1288) fixes (#1313) fix: Gated content bucket env var override (#1280) fix: Change the library used in pytorch test causing cloudpickle version conflict (#1287) fix: HMAC signing for ModelBuilder Triton python backend (#1282) fix: do not delete temp folder generated by sdist (#1291) fix: Do not require model_server if provided image_uri is a 1p image. (#1303) fix: check image type vs instance type (#1307) fix: unit test (#1315) fix: Fixed model builder's register unable to deploy (#1323) fix: missing `self._framework` in `InferenceSpec` path (#1325) fix: enable xgboost integ test in our own pipeline (#1326) fix: skip py310 (#1328) fix: Update autodetect dlc logic (#1329) Fix secret key in the Model object (#1334) fix: improve error message (#1333) Fix unit testing (#1340) fix: Typing and formatting (#1341) fix: WaiterError on failed pipeline execution. results() (#1337) Fix tox identified errors (#1344) Fix issue when the user runs in Python 3.11 (#1345) fixes (#1346) fix: use copy instead of move in bootstrap script (#1339) Resolve keynote3 conflicts (#1351) Resolve keynote3 conflicts v2 (#1353) Fix conflicts (#1354) Fix conflicts v3 (#1355) fix: get whl from local to run integ tests (#1357) fix: enable triton pt tests (#1358) fix: integ test (#1362) Fix Python 3.11 issue with dataclass decorator (#1345) fix: remote function include_local_workdir default value (#1342) fix: error message (#1373) fixes (#1372) fix: Remvoe PickleSerializer (#1378)
Reference: 0411850995
System Information
Describe the problem
I am using OpenNMT-tf for training. When the training reaches the export step, the exported model is not exported properly, and a strange error named "Internal server error" appears.
Minimal repro / logs
Unfortunately, there is nothing in the logs telling anything about the issue!
The shell script contains
Please advise,
The text was updated successfully, but these errors were encountered: