Sentieon pipelines for AWS HealthOmics
Sentieon supports bioinformatic workflows running on AWS HealthOmics. You can use the files in this repository to run Sentieon pipelines as private workflows on AWS HealthOmics, or as a starting point for developing customized pipelines that use the Sentieon software.
The Sentieon software is a commercial package, and a license is required to run it. Users can operate a Sentieon license server inside their Amazon VPC following the instructions in Sentieon's AWS Deployment Guide. Once a Sentieon license server is running in your VPC, you can use VPC networking to connect to the Sentieon license server in your HealthOmics workflows.
- Docker cli or another container implementation (Podman, etc.)
- AWS CLI v2
To set up the Sentieon license server, please refer to the AWS Deployment Guide.
The following files are in the container directory:
- sentieon_omics.dockerfile: A dockerfile that can be used to create a Sentieon container image for AWS HealthOmics
To build the container image for the latest version of Sentieon, run:
cd ./container
docker build --platform linux/amd64 --build-arg SENTIEON_VERSION=202503.03 -t sentieon:omics-1 -f sentieon_omics.dockerfile .
Create a private repository in AWS ECR
aws ecr create-repository --repository-name sentieon
Login to the registry
aws ecr get-login-password --region <region-name> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region-name>.amazonaws.com
Tag the custom Sentieon container and push the container image to the repository
docker tag sentieon:omics-1 <account-id>.dkr.ecr.<region-name>.amazonaws.com/sentieon:omics-1
docker push <account-id>.dkr.ecr.<region-name>.amazonaws.com/sentieon:omics-1
Grant the HealthOmics service permission to interact with the repository using the policy in the assets directory
aws ecr set-repository-policy --repository-name sentieon --policy-text file://assets/omics-ecr-repository-policy.json
Step 4: create a security group to enable your workflow to connect to the Sentieon license server and other AWS resources
Please see the documentation on VPC-connected workflow to get started. The Sentieon software will need to connect to the Sentieon license server using TCP on the specified port.
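Before launching a workflow, it can help to confirm that the license server port is reachable over TCP from a host in the same subnets and security group. The following is a minimal sketch, not part of this repository: the function name `license_port_open` is hypothetical, and it assumes `bash` and the GNU coreutils `timeout` command are installed on the host.

```shell
# Return 0 if a TCP connection to host $1 on port $2 opens within 5 seconds.
# Assumes bash (for its /dev/tcp pseudo-device) and GNU coreutils timeout.
license_port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For example, `license_port_open <license-server-host> <port> || echo "license server unreachable" >&2`. A failure here usually points at the security group rules or subnet routing rather than at the Sentieon software.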
Create a VPC configuration for your HealthOmics workflow:
aws omics create-configuration \
    --name <configuration_name> \
    --run-configurations '{
        "vpcConfig": {
            "securityGroupIds": <security_groups>,
            "subnetIds": <subnet_ids>
        }
    }' \
    --region <region>
We are now ready to create Sentieon workflows on AWS HealthOmics. Running the following command at the start of the workflow will configure the environment for the Sentieon software:
export SENTIEON_LICENSE=<SENTIEON_LICENSE>
Where <SENTIEON_LICENSE> is the IP address or FQDN and port of your Sentieon license server.
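A malformed license value is an easy mistake to make when filling in workflow parameters. The sketch below checks that a value has the shape `<host>:<port>`. It assumes the host and port are joined by a single colon, which this README does not state explicitly, and the function name `valid_sentieon_license` is hypothetical.

```shell
# Return 0 if $1 looks like <host>:<port> with a numeric port in 1-65535.
# Assumption: host and port are separated by a colon.
valid_sentieon_license() {
    value="$1"
    # Require at least one colon.
    case "$value" in
        *:*) ;;
        *) return 1 ;;
    esac
    host="${value%:*}"    # everything before the last colon
    port="${value##*:}"   # everything after the last colon
    [ -n "$host" ] || return 1
    # Reject an empty port or one containing non-digit characters.
    case "$port" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}
```

A workflow task could call `valid_sentieon_license "$SENTIEON_LICENSE" || exit 1` right after the export, so that a bad parameter fails fast instead of surfacing later as a license error.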
Example workflows can be found in the examples directory and complete workflow implementations can be found in the workflows directory.
(cd examples/wdl && zip test_sentieon.wdl.zip test_sentieon.wdl)
aws omics create-workflow \
    --name test-sentieon-wdl \
    --engine WDL \
    --definition-zip fileb://examples/wdl/test_sentieon.wdl.zip \
    --parameter-template file://examples/parameters.template.json
(cd examples/nextflow && zip -r ${OLDPWD}/test_sentieon.nextflow.zip .)
aws omics create-workflow \
    --name test-sentieon-nextflow \
    --engine NEXTFLOW \
    --main test_sentieon.nf \
    --definition-zip fileb://test_sentieon.nextflow.zip \
    --parameter-template file://examples/parameters.template.json
The create-workflow command will output some information including the workflow-id.
To run the example workflow, modify the examples/test.parameters.json file replacing <sentieon-license>, <account-id>, and <region-name> to match your environment. Then run the following, using the workflow-id from the create-workflow command and the role-name for your AWS HealthOmics service role:
aws omics start-run \
    --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
    --workflow-id <workflow_id> \
    --name "test $(date +%Y%m%d-%H%M%S)" \
    --output-uri <s3-uri> \
    --parameters file://examples/test.parameters.json \
    --networking-mode VPC \
    --configuration-name <configuration_name>
After ~20 min, verify that the test workflow completes successfully:
aws omics get-run --id <run-id>
You should see a response like:
{
  "arn": "arn:aws:omics:<region>:<account-id>:run/<run-id>",
  "creationTime": "2023-04-24T17:14:38.880864+00:00",
  "digest": "sha256:<sha256>",
  "id": "<run-id>",
  "name": "test 20230424-101437",
  "outputUri": "<s3-output-uri>",
  "parameters": {
    "sentieon_license": "<sentieon_license>",
    "sentieon_docker": "<account-id>.dkr.ecr.<region>.amazonaws.com/sentieon:omics"
  },
  "resourceDigests": {
    "<account-id>.dkr.ecr.<region>.amazonaws.com/sentieon:omics": "sha256:<sha256>"
  },
  "roleArn": "arn:aws:iam::<account-id>:role/<role-name>",
  "startTime": "2023-04-24T17:25:06.021000+00:00",
  "startedBy": "arn:aws:iam::<account-id>:<user>",
  "status": "COMPLETED",
  "stopTime": "2023-04-24T17:39:32.138095+00:00",
  "tags": {},
  "workflowId": "<workflow-id>",
  "workflowType": "PRIVATE"
}
The example workflow has only one task called SentieonLicense. Locate this task:
aws omics list-run-tasks --id <run-id>
You should see a response like:
{
  "items": [
    {
      "cpus": 1,
      "creationTime": "2023-04-24T17:25:40.323030+00:00",
      "memory": 4,
      "name": "SentieonLicence",
      "startTime": "2023-04-24T17:29:42.881000+00:00",
      "status": "COMPLETED",
      "stopTime": "2023-04-24T17:30:24.442000+00:00",
      "taskId": "<task-id>"
    }
  ]
}
Get the log-stream for the task:
aws logs get-log-events --log-group-name /aws/omics/WorkflowLog --log-stream-name run/<run-id>/task/<task-id> --output text
Note that get-log-events is paginated, and may not return the full log stream for workflows with verbose logs.
If license verification is successful, you should see event lines like:
EVENTS 1682357406252 sentieon licclnt ping && echo "Ping is OK" 1682357399707
EVENTS 1682357406252 + sentieon licclnt ping 1682357399707
EVENTS 1682357406252 + echo 'Ping is OK' 1682357400013
EVENTS 1682357406252 sentieon licclnt query Haplotyper 1682357400013
EVENTS 1682357406252 + sentieon licclnt query Haplotyper 1682357400015
EVENTS 1682357406252 Ping is OK 1682357400015
EVENTS 1682357406252 499968 1682357400539
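The check for a successful ping can be scripted rather than read by eye. The sketch below scans `get-log-events --output text` output on standard input for an event whose message is exactly `Ping is OK`. It skips the echoed script line and the `+ echo` trace line, which contain the same phrase. The function name `license_ping_ok` is hypothetical, and the pattern assumes the `EVENTS <number> <message> <number>` layout shown above.

```shell
# Read get-log-events text output on stdin; return 0 if any event message
# is exactly "Ping is OK". Fields may be separated by tabs or spaces.
license_ping_ok() {
    grep -Eq '^EVENTS[[:space:]]+[0-9]+[[:space:]]+Ping is OK[[:space:]]+[0-9]+$'
}
```

To use it, pipe the get-log-events command above into `license_ping_ok` and check the exit status. Because get-log-events is paginated, a missing match on a verbose log does not by itself prove that the ping failed.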
Congratulations! You've successfully run a test workflow with the Sentieon software on AWS HealthOmics. Feel free to update/extend the example workflow to implement your own custom pipelines with the Sentieon software.
Alternatively, you can find full pipeline implementations in the workflows directory that you can modify or implement as private workflows.