Skip to content

Latest commit

 

History

History
53 lines (41 loc) · 1.74 KB

File metadata and controls

53 lines (41 loc) · 1.74 KB

Step 2: Alignment (FASTQ to BAM)

What This Does

Aligns raw sequencing reads against the GRCh38 human reference genome. Produces a sorted, indexed BAM file.

Why

Alignment maps each 150bp sequencing read to its position in the human genome. Required for all downstream variant calling.

Tools

  • minimap2 — fast aligner (preferred for WGS)
  • samtools — sort + index the alignment

Docker Images

  • quay.io/biocontainers/minimap2:2.28--he4a0461_0 (minimap2 aligner)
  • staphb/samtools:1.20 (samtools sort + index)

Prerequisites

  • GRCh38 reference genome (Homo_sapiens_assembly38.fasta)
  • minimap2 index (.mmi file, ~7GB, generated once)
  • Paired-end FASTQ files

Commands

SAMPLE=your_sample
GENOME_DIR=/path/to/your/data
REF=${GENOME_DIR}/reference/Homo_sapiens_assembly38.fasta

# Step 1: Create minimap2 index (one-time, ~30 min)
minimap2 -d ${GENOME_DIR}/reference/GRCh38.mmi $REF

# Step 2: Align + sort (1-2 hours for 30X WGS)
minimap2 -a -x sr -t 16 \
  ${GENOME_DIR}/reference/GRCh38.mmi \
  ${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R1.fastq.gz \
  ${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R2.fastq.gz \
| samtools sort -@ 8 -o ${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam

# Step 3: Index BAM
samtools index ${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam

# Output: ~30-40GB BAM + ~9MB BAI index

Resource Requirements

  • CPU: 16+ cores recommended
  • RAM: 16GB+ (minimap2 loads full index into memory)
  • Disk: ~30-40GB per sample (BAM file)
  • Time: 1-2 hours for 30X WGS

Notes

  • Use -x sr for Illumina short reads (short-read preset)
  • Alternative: bwa-mem2 is equally valid but minimap2 is faster (see scripts/02a-alignment-bwamem2.sh)
  • The BAM index (.bai) must always accompany the BAM file