Sagemaker test docs update for framework upgrade (#11206)

* increased train_runtime for model parallelism

* added documentation for framework upgrade
Philipp Schmid 2021-04-13 01:08:33 +02:00 committed by GitHub
parent 74d7c24d8d
commit f243a5ec0d
2 changed files with 4 additions and 6 deletions

@@ -66,8 +66,7 @@ images:
```
-2. In the PR comment describe what test, we ran and with which package versions. Here you can copy the table from [Current Tests](#current-tests).
-TODO: Add a screenshot of PR + Text template to make it easy to open.
+2. In the PR comment describe what test we ran and with which framework versions. Here you can copy the table from [Current Tests](#current-tests). You can take a look at this [PR](https://github.com/aws/deep-learning-containers/pull/1016), which information are needed.
## Test Case 2: Releasing a New AWS Framework DLC
@@ -92,7 +91,6 @@ AWS_PROFILE=<enter-your-profile> make test-sagemaker
```
These tests take around 10-15 minutes to finish. Preferably make a screenshot of the successfully ran tests.
### After successful Tests:
After we have successfully run tests for the new framework version we need to create a PR at the [Deep Learning Container Repository](https://github.com/aws/deep-learning-containers).
@@ -136,7 +134,7 @@ images:
docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /,
*CUDA_VERSION, /Dockerfile., *DEVICE_TYPE ]
```
-2. In the PR comment describe what test we ran and with which framework versions. Here you can copy the table from [Current Tests](#current-tests). You can take a look at this [PR](https://github.com/aws/deep-learning-containers/pull/1016), which information are needed.
+2. In the PR comment describe what test we ran and with which framework versions. Here you can copy the table from [Current Tests](#current-tests). You can take a look at this [PR](https://github.com/aws/deep-learning-containers/pull/1025), which information are needed.
## Current Tests

@@ -28,14 +28,14 @@ if is_sagemaker_available():
"script": "run_glue_model_parallelism.py",
"model_name_or_path": "roberta-large",
"instance_type": "ml.p3dn.24xlarge",
-"results": {"train_runtime": 1500, "eval_accuracy": 0.3, "eval_loss": 1.2},
+"results": {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2},
},
{
"framework": "pytorch",
"script": "run_glue.py",
"model_name_or_path": "roberta-large",
"instance_type": "ml.p3dn.24xlarge",
-"results": {"train_runtime": 1500, "eval_accuracy": 0.3, "eval_loss": 1.2},
+"results": {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2},
},
},
]
)
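
The `results` entries in this hunk read as pass/fail thresholds for each test configuration. A minimal sketch of how such thresholds might be checked follows. The helper name `within_thresholds`, the observed values, and the comparison directions (runtime and loss as upper bounds, accuracy as a lower bound) are assumptions for illustration and are not taken from the test file in this commit.

```python
# Hypothetical helper: the name and the comparison directions are assumptions,
# not code from the test file changed in this commit.
def within_thresholds(expected, observed):
    """Return True when the observed metrics satisfy the expected bounds."""
    return (
        observed["train_runtime"] <= expected["train_runtime"]  # assumed upper bound
        and observed["eval_accuracy"] >= expected["eval_accuracy"]  # assumed lower bound
        and observed["eval_loss"] <= expected["eval_loss"]  # assumed upper bound
    )


# Threshold values as they appear after this commit.
expected = {"train_runtime": 1600, "eval_accuracy": 0.3, "eval_loss": 1.2}

# Made-up metrics for a run that takes 1550 time units.
observed = {"train_runtime": 1550, "eval_accuracy": 0.9, "eval_loss": 0.4}

passes_new_bound = within_thresholds(expected, observed)
passes_old_bound = within_thresholds({**expected, "train_runtime": 1500}, observed)
print(passes_new_bound, passes_old_bound)
```

Under these assumptions, a run that takes slightly longer than 1500 would fail the old bound and pass the new one, which matches the commit's stated intent of increasing `train_runtime`.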