Sentieon/sentieon-amazon-omics

Sentieon-Amazon-Omics

Sentieon pipelines for AWS HealthOmics

Introduction

Sentieon supports bioinformatics workflows running on AWS HealthOmics. You can use the files in this repository to run Sentieon pipelines as private workflows on AWS HealthOmics, or as a starting point for developing customized pipelines that use the Sentieon software.

The Sentieon software is a commercial package and requires a license to run. You can operate a Sentieon license server inside your Amazon VPC by following the instructions in Sentieon's AWS Deployment Guide. Once the license server is running in your VPC, your HealthOmics workflows can reach it through VPC networking.

Running Sentieon pipelines as private workflows

Requirements

  • Docker cli or another container implementation (Podman, etc.)
  • AWS CLI v2

Step 1: start a Sentieon license server inside your AWS VPC

Please refer to the AWS Deployment Guide.

Step 2: build the Sentieon container image

The container directory contains the following file:

  • sentieon_omics.dockerfile: A Dockerfile that can be used to create a Sentieon container image for AWS HealthOmics

To build the container image, set SENTIEON_VERSION to the Sentieon release you want (202503.03 in this example) and run:

cd ./container
docker build --platform linux/amd64 --build-arg SENTIEON_VERSION=202503.03 -t sentieon:omics-1 -f sentieon_omics.dockerfile .

Step 3: push the container image to an Amazon ECR private repository

Create a private repository in Amazon ECR:

aws ecr create-repository --repository-name sentieon

Log in to the registry:

aws ecr get-login-password --region <region-name> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region-name>.amazonaws.com

Tag the custom Sentieon container image and push it to the repository:

docker tag sentieon:omics-1 <account-id>.dkr.ecr.<region-name>.amazonaws.com/sentieon:omics-1
docker push <account-id>.dkr.ecr.<region-name>.amazonaws.com/sentieon:omics-1
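
The image URI used in the tag and push commands follows a fixed pattern, so it can be composed once and reused. The following is a minimal sketch; `ecr_image_uri` is a hypothetical helper name, and the account ID in the example is a made-up placeholder.

```shell
# Compose the private ECR image URI from its parts.
# Arguments: <account-id> <region-name> <repository> <tag>
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Example with placeholder values (123456789012 is not a real account):
ecr_image_uri 123456789012 us-east-1 sentieon omics-1
```

If you do not know your account ID, `aws sts get-caller-identity --query Account --output text` prints it for the credentials currently in use.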

Grant the HealthOmics service permission to interact with the repository, using the policy in the assets directory:

aws ecr set-repository-policy --repository-name sentieon --policy-text file://assets/omics-ecr-repository-policy.json

Step 4: create a security group to enable your workflow to connect to the Sentieon license server and other AWS resources

See the AWS HealthOmics documentation on VPC-connected workflows to get started. The Sentieon software needs to reach the Sentieon license server over TCP on the port the license server is configured to use.
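
As one possible way to open that TCP path, the sketch below adds an ingress rule to the license server's security group that admits traffic from the workflow's security group. It assumes the license server and the workflow use two separate security groups, which this README does not state; `run` and `allow_license_traffic` are hypothetical helper names. The helper only prints the command unless `DRY_RUN=0` is set, so you can review it first.

```shell
# Run a command, or only print it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Allow the workflow security group to reach the license server over TCP.
# Arguments: <license-server-sg-id> <workflow-sg-id> <license-port>
allow_license_traffic() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port "$3" \
    --source-group "$2"
}

# Usage (IDs and port are placeholders for your own values):
#   allow_license_traffic <license-server-sg-id> <workflow-sg-id> <port>
#   DRY_RUN=0 allow_license_traffic <license-server-sg-id> <workflow-sg-id> <port>
```

The workflow's security group also needs outbound access to the license server; a default security group allows all outbound traffic, but a customized one may not.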

Step 5: create a HealthOmics VPC configuration

Create a VPC configuration for your HealthOmics workflow:

aws omics create-configuration \
  --name <configuration_name> \
  --run-configurations '{
    "vpcConfig": {
      "securityGroupIds": <security_groups>,
      "subnetIds": <subnet_ids>
    }
  }' \
  --region <region>
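
The `<security_groups>` and `<subnet_ids>` placeholders sit where JSON values are expected. Assuming they are JSON arrays of ID strings (an inference from the key names, not something this README states), the document could be generated from comma-separated lists as sketched below. `json_array` and `vpc_run_configuration` are hypothetical helper names, and the IDs are made up.

```shell
# Turn a comma-separated list into a JSON array of strings.
json_array() {
  local IFS=','
  local out="" item
  for item in $1; do          # unquoted on purpose: split on commas
    out="${out:+$out, }\"$item\""
  done
  printf '[%s]' "$out"
}

# Print the run configuration document for the given lists.
# Arguments: <comma-separated security group IDs> <comma-separated subnet IDs>
vpc_run_configuration() {
  printf '{"vpcConfig": {"securityGroupIds": %s, "subnetIds": %s}}\n' \
    "$(json_array "$1")" "$(json_array "$2")"
}

# Example with made-up IDs:
vpc_run_configuration "sg-0123,sg-4567" "subnet-89ab"
# prints {"vpcConfig": {"securityGroupIds": ["sg-0123", "sg-4567"], "subnetIds": ["subnet-89ab"]}}
```

The printed string can be passed as the value of the `--run-configurations` option in the command above.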

Step 6: create an example workflow on AWS HealthOmics

We are now ready to create Sentieon workflows on AWS HealthOmics. Running the following command at the start of the workflow will configure the environment for the Sentieon software:

export SENTIEON_LICENSE=<SENTIEON_LICENSE>

Here, <SENTIEON_LICENSE> is the IP address or FQDN of your Sentieon license server together with its port, written as <host>:<port>.
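
A malformed license address is an easy mistake to make, so a workflow task could check the value before calling the Sentieon tools. The following is a minimal sketch that assumes the address has the form `<host>:<port>`; `is_license_address` is a hypothetical helper, not part of the Sentieon software, and it checks only the shape of the string, not whether the server is reachable.

```shell
# Hypothetical helper: succeed only if the value looks like <host>:<port>.
is_license_address() {
  local addr="$1"
  local host="${addr%:*}"    # text before the last colon
  local port="${addr##*:}"   # text after the last colon
  [ "$host" != "$addr" ] || return 1   # no colon present
  [ -n "$host" ] || return 1           # empty host
  case "$port" in
    ''|*[!0-9]*) return 1 ;;           # port must be all digits
  esac
  [ "$port" -ge 1 ] && [ "$port" -le 65535 ]
}

# Usage inside a task script:
#   is_license_address "$SENTIEON_LICENSE" || { echo "bad SENTIEON_LICENSE" >&2; exit 1; }
```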

Example workflows can be found in the examples directory and complete workflow implementations can be found in the workflows directory.

WDL

(cd examples/wdl && zip test_sentieon.wdl.zip test_sentieon.wdl)

aws omics create-workflow \
    --name test-sentieon-wdl \
    --engine WDL \
    --definition-zip fileb://examples/wdl/test_sentieon.wdl.zip \
    --parameter-template file://examples/parameters.template.json

Nextflow

(cd examples/nextflow && zip -r "${OLDPWD}/test_sentieon.nextflow.zip" .)

aws omics create-workflow \
    --name test-sentieon-nextflow \
    --engine NEXTFLOW \
    --main test_sentieon.nf \
    --definition-zip fileb://test_sentieon.nextflow.zip \
    --parameter-template file://examples/parameters.template.json

The create-workflow command returns details of the new workflow, including the workflow-id that you need in the next step.
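
To avoid copying the workflow-id by hand, the AWS CLI's global `--query` and `--output` options can reduce the response to a single value. The sketch below wraps the WDL command from above; `create_wdl_workflow` is a hypothetical function name, and it assumes the response stores the workflow-id in a top-level `id` field.

```shell
# Create the WDL example workflow and print only its workflow ID.
# Assumes the create-workflow response has a top-level "id" field.
create_wdl_workflow() {
  aws omics create-workflow \
    --name test-sentieon-wdl \
    --engine WDL \
    --definition-zip fileb://examples/wdl/test_sentieon.wdl.zip \
    --parameter-template file://examples/parameters.template.json \
    --query 'id' \
    --output text
}

# Usage:
#   workflow_id="$(create_wdl_workflow)"
#   echo "Created workflow ${workflow_id}"
```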

Step 7: run the example workflow

To run the example workflow, edit the examples/test.parameters.json file, replacing the <sentieon-license>, <account-id>, and <region-name> placeholders with the values for your environment. Then run the following command, using the workflow-id from the create-workflow command and the role-name of your AWS HealthOmics service role:

aws omics start-run \
    --role-arn "arn:aws:iam::<account-id>:role/<role-name>" \
    --workflow-id <workflow_id> \
    --name "test $(date +%Y%m%d-%H%M%S)" \
    --output-uri <s3-uri> \
    --parameters file://examples/test.parameters.json \
    --networking-mode VPC \
    --configuration-name <configuration_name>
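
The placeholder substitution described before the start-run command could also be scripted rather than done in an editor. The sketch below is one way to do it with sed; `fill_placeholders` is a hypothetical helper, and it assumes the replacement values contain no `|` or `&` characters, which sed would treat specially here.

```shell
# Replace the three placeholders in text read from stdin.
# Arguments: <sentieon-license> <account-id> <region-name>
fill_placeholders() {
  sed \
    -e "s|<sentieon-license>|$1|g" \
    -e "s|<account-id>|$2|g" \
    -e "s|<region-name>|$3|g"
}

# Usage (my.parameters.json is a made-up output name):
#   fill_placeholders <host>:<port> <account-id> <region-name> \
#     < examples/test.parameters.json > my.parameters.json
```

Writing to a separate file leaves the template in the repository untouched; pass the new file to `--parameters` instead.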

After about 20 minutes, verify that the test workflow completed successfully:

aws omics get-run --id <run-id>

You should see a response like:

{
    "arn": "arn:aws:omics:<region>:<account-id>:run/<run-id>",
    "creationTime": "2023-04-24T17:14:38.880864+00:00",
    "digest": "sha256:<sha256>",
    "id": "<run-id>",
    "name": "test 20230424-101437",
    "outputUri": "<s3-output-uri>",
    "parameters": {
        "sentieon_license": "<sentieon_license>",
        "sentieon_docker": "<account-id>.dkr.ecr.<region>.amazonaws.com/sentieon:omics"
    },
    "resourceDigests": {
        "<account-id>.dkr.ecr.<region>.amazonaws.com/sentieon:omics": "sha256:<sha256>"
    },
    "roleArn": "arn:aws:iam::<account-id>:role/<role-name>",
    "startTime": "2023-04-24T17:25:06.021000+00:00",
    "startedBy": "arn:aws:iam::<account-id>:<user>",
    "status": "COMPLETED",
    "stopTime": "2023-04-24T17:39:32.138095+00:00",
    "tags": {},
    "workflowId": "<workflow-id>",
    "workflowType": "PRIVATE"
}
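
Rather than checking back by hand, the `status` field shown above can be polled until the run finishes. The following is a minimal sketch; `wait_for_run` is a hypothetical helper, and the failure statuses it looks for (FAILED, CANCELLED, DELETED) are assumptions about the HealthOmics API, since this README shows only COMPLETED.

```shell
# Poll a run until it reaches a terminal status, then print that status.
# Returns 0 for COMPLETED and 1 for any other terminal status.
# Arguments: <run-id> [poll-interval-seconds]
wait_for_run() {
  local run_id="$1" interval="${2:-60}" status
  while true; do
    status="$(aws omics get-run --id "$run_id" --query 'status' --output text)" || return 1
    case "$status" in
      COMPLETED) echo "$status"; return 0 ;;
      FAILED|CANCELLED|DELETED) echo "$status"; return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage:
#   wait_for_run <run-id> && echo "run finished successfully"
```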

The example workflow has a single task, which appears as SentieonLicence in the response below. Locate this task:

aws omics list-run-tasks --id <run-id>

You should see a response like:

{
    "items": [
        {
            "cpus": 1,
            "creationTime": "2023-04-24T17:25:40.323030+00:00",
            "memory": 4,
            "name": "SentieonLicence",
            "startTime": "2023-04-24T17:29:42.881000+00:00",
            "status": "COMPLETED",
            "stopTime": "2023-04-24T17:30:24.442000+00:00",
            "taskId": "<task-id>"
        }
    ]
}
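
For runs with many tasks, a JMESPath filter in `--query` can pick out one task by name. The sketch below is illustrative; `find_task_id` is a hypothetical helper, and the task name is passed in as an argument and should match the `name` field exactly as it appears in your response.

```shell
# Print the taskId of the first task with the given name in a run.
# Arguments: <run-id> <task-name>
find_task_id() {
  aws omics list-run-tasks --id "$1" \
    --query "items[?name=='$2'].taskId | [0]" \
    --output text
}

# Usage:
#   task_id="$(find_task_id <run-id> <task-name>)"
```

With `--output text`, the AWS CLI prints `None` when no task matches, so check for that value before using the result.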

Get the log-stream for the task:

aws logs get-log-events --log-group-name /aws/omics/WorkflowLog --log-stream-name run/<run-id>/task/<task-id> --output text

Note that get-log-events is paginated and may not return the full log stream in a single call for workflows with verbose logs.

If license verification is successful, you should see event lines like:

EVENTS  1682357406252   sentieon licclnt ping && echo "Ping is OK"      1682357399707
EVENTS  1682357406252   + sentieon licclnt ping 1682357399707
EVENTS  1682357406252   + echo 'Ping is OK'     1682357400013
EVENTS  1682357406252   sentieon licclnt query Haplotyper       1682357400013
EVENTS  1682357406252   + sentieon licclnt query Haplotyper     1682357400015
EVENTS  1682357406252   Ping is OK      1682357400015
EVENTS  1682357406252   499968  1682357400539
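
A plain search for "Ping is OK" would also match the echoed command lines in this output, so it would succeed even if the ping itself failed. The sketch below checks that some event's message is exactly that text. `license_ping_ok` is a hypothetical helper, and it assumes the `--output text` fields are tab-separated with the message in the third column, as in the listing above.

```shell
# Succeed only if some event's message field is exactly "Ping is OK".
# Reads `aws logs get-log-events ... --output text` output on stdin.
license_ping_ok() {
  awk -F'\t' '
    $1 == "EVENTS" && $3 == "Ping is OK" { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   aws logs get-log-events --log-group-name /aws/omics/WorkflowLog \
#     --log-stream-name run/<run-id>/task/<task-id> --output text | license_ping_ok
```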

Next steps

Congratulations! You've successfully run a test workflow with the Sentieon software on AWS HealthOmics. Feel free to update or extend the example workflow to implement your own custom pipelines with the Sentieon software.

Alternatively, you can find full pipeline implementations in the workflows directory that you can modify or implement as private workflows.
