Aligns raw sequencing reads against the GRCh38 human reference genome. Produces a sorted, indexed BAM file.
Alignment maps each 150 bp sequencing read to its most likely position in the human reference genome. This step is required for all downstream variant calling.
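Here is what "maps each read to its position" looks like in practice. With `-a`, minimap2 writes SAM records. Column 3 (RNAME) holds the chromosome and column 4 (POS) holds the 1-based leftmost position of each alignment. The sketch below is an illustration, not part of the pipeline. `read_positions` is a hypothetical helper that prints those fields for primary, mapped reads. It skips header, unmapped, secondary and supplementary records by testing the standard SAM FLAG bits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print where each primary, mapped read landed, from SAM text on stdin.
# SAM columns used: 1=QNAME, 2=FLAG, 3=RNAME, 4=POS (1-based), 5=MAPQ.
read_positions() {
  awk -F '\t' '
    /^@/ { next }                          # skip SAM header lines
    {
      flag = $2 + 0
      unmapped  = int(flag / 4)    % 2     # bit 0x4: read unmapped
      secondary = int(flag / 256)  % 2     # bit 0x100: secondary alignment
      suppl     = int(flag / 2048) % 2     # bit 0x800: supplementary alignment
      if (!unmapped && !secondary && !suppl)
        print $1 "\t" $3 "\t" $4 "\t" $5   # read name, chromosome, position, MAPQ
    }'
}
```

For example, `samtools view ${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam | head -n 5 | read_positions` shows the first few placements. Without `-h`, `samtools view` omits the header.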
- minimap2 — fast aligner (preferred for WGS)
- samtools — sort + index the alignment
- `quay.io/biocontainers/minimap2:2.28--he4a0461_0` (minimap2 aligner)
- `staphb/samtools:1.20` (samtools sort + index)
- GRCh38 reference genome (`Homo_sapiens_assembly38.fasta`)
- minimap2 index (`.mmi` file, ~7GB, generated once)
- Paired-end FASTQ files
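The alignment runs for hours, so it can be worth confirming up front that every input above is present. This is a minimal sketch that assumes the directory layout used in the commands of this step. `check_inputs` is a hypothetical helper. The `.mmi` check only passes once the index has been built in Step 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early if any input for this step is missing or empty.
# Paths mirror the layout used in the alignment commands (an assumption, adjust as needed).
check_inputs() {
  local genome_dir=$1 sample=$2 missing=0 f
  for f in \
    "${genome_dir}/reference/Homo_sapiens_assembly38.fasta" \
    "${genome_dir}/reference/GRCh38.mmi" \
    "${genome_dir}/${sample}/fastq/${sample}_R1.fastq.gz" \
    "${genome_dir}/${sample}/fastq/${sample}_R2.fastq.gz"; do
    if [ ! -s "$f" ]; then          # -s: exists and has non-zero size
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"                 # 0 = all present, 1 = something missing
}
```

Typical use at the top of a script: `check_inputs "$GENOME_DIR" "$SAMPLE" || exit 1`.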
```shell
SAMPLE=your_sample
GENOME_DIR=/path/to/your/data
REF=${GENOME_DIR}/reference/Homo_sapiens_assembly38.fasta
MMI=${GENOME_DIR}/reference/GRCh38.mmi
OUT_DIR=${GENOME_DIR}/${SAMPLE}/aligned

# Make a minimap2 failure fail the whole pipe, not only samtools sort
set -o pipefail

# Step 1: Create minimap2 index (one-time, ~30 min)
# -x sr is needed here too: k-mer and window sizes are fixed when the index is built
minimap2 -x sr -d "$MMI" "$REF"

# Step 2: Align + sort (1-2 hours for 30X WGS)
mkdir -p "$OUT_DIR"   # samtools sort does not create missing directories
minimap2 -a -x sr -t 16 \
  "$MMI" \
  "${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R1.fastq.gz" \
  "${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R2.fastq.gz" \
  | samtools sort -@ 8 -o "${OUT_DIR}/${SAMPLE}_sorted.bam" -

# Step 3: Index BAM (writes ${SAMPLE}_sorted.bam.bai)
samtools index "${OUT_DIR}/${SAMPLE}_sorted.bam"
# Output: ~30-40GB BAM + ~9MB BAI index
```
- CPU: 16+ cores recommended
- RAM: 16GB+ (minimap2 loads full index into memory)
- Disk: ~30-40GB per sample (BAM file)
- Time: 1-2 hours for 30X WGS
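You can check the disk figure above before starting. This is a minimal sketch that assumes a POSIX-compliant `df`. `has_free_space` is a hypothetical helper. The 40 GB threshold simply reuses the upper per-sample estimate above, so you may want extra headroom for temporary files written while sorting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding $1 has at least $2 GB free.
has_free_space() {
  local dir=$1 need_gb=$2 avail_kb
  # -P: POSIX single-line output; -k: 1K blocks. Row 2, column 4 = available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}
```

For example: `has_free_space "$GENOME_DIR" 40 || { echo "not enough disk space" >&2; exit 1; }`.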
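- Use `-x sr` (short-read preset) for Illumina short reads. A prebuilt `.mmi` index keeps the k-mer and window sizes it was built with, so build the index with `-x sr` as well.
- Alternative: `bwa-mem2` is equally valid, but minimap2 is faster (see `scripts/02a-alignment-bwamem2.sh`)
- The BAM index (`.bai`) must always accompany the BAM file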
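Downstream tools typically look for the index alongside the BAM. An index left over from an earlier run can be out of date if the BAM was later rewritten. This is a minimal sketch of a guard that assumes the default `<name>.bam.bai` naming produced by `samtools index`. `bam_is_indexed` is a hypothetical helper. It accepts the pair only if both files are non-empty and the BAM is not newer than its index.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that "$1" has a non-empty, up-to-date "$1.bai" next to it.
# Assumes samtools index default naming (sample.bam -> sample.bam.bai).
bam_is_indexed() {
  local bam=$1 bai="$1.bai"
  # Reject a missing/empty BAM or BAI, and a BAI older than the BAM (stale index).
  [ -s "$bam" ] && [ -s "$bai" ] && ! [ "$bam" -nt "$bai" ]
}
```

Before variant calling, for example: `bam_is_indexed "${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam" || samtools index "${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam"`.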