turingmotors/hrf-with-cosmos-reason2

Hierarchical Reasoning Framework for Dashcam Incident Analysis with Cosmos-Reason2-8B

A Physical AI system that automatically analyzes dashcam videos — detecting incidents, classifying severity, and explaining causes — through a multi-stage hierarchical reasoning pipeline. All inference is performed by a single model (Cosmos-Reason2-8B) with no fine-tuning.

Features & Details.

Reference (original approach): https://arxiv.org/abs/2510.12190

  • What this repo does: MP4 → frames → captions → incident frame detection → 3-stage reasoning → CSV
  • What you need: Cosmos-Reason2-8B weights + Kaggle 2COOOL dataset

Quick Start

Option A: Web App (single video analysis)

Analyze a single dashcam video through a browser interface. See 100_app/README.md for details.

Option B: Batch Pipeline (dataset-scale processing)

  1. Install FFmpeg + uv
  2. Run vllm_cosmos_reason2/setup.sh
  3. Place model weights under /data/models/nvidia/Cosmos-Reason2-8B
  4. Download dataset
  5. Run stages in order: 001 → server → 002 → 003 → 004
  6. Compare submissions with 005_2coool-studio

Requirements

| Category | Details |
| --- | --- |
| OS | Ubuntu 22.04.4 LTS |
| GPU (Recommended) | NVIDIA H100 80GB HBM3 × 8 |
| CUDA / Driver | CUDA 12.8 / NVIDIA driver 535.x |
| Key dependencies | vLLM (Cosmos-Reason2-8B), FFmpeg (ffmpeg, ffprobe) |

Note: vllm_cosmos_reason2 provisions its own virtual environment via uv.

Setup

1) Install prerequisites

# FFmpeg (Ubuntu)
sudo apt-get update && sudo apt-get install -y ffmpeg
# uv (Ubuntu)
curl -LsSf https://astral.sh/uv/install.sh | sh

2) Create the project environment

cd vllm_cosmos_reason2
bash setup.sh

3) Download and place model weights

Default model path expected by scripts:

Cosmos-Reason2-8B: /data/models/nvidia/Cosmos-Reason2-8B

Place model weights at the above path, or update run_server*.sh to point to your local path (or create a symlink under /data/models).
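
The symlink option can be sketched in Python as follows. This is a minimal illustration, not part of the repository: `link_model` is a hypothetical helper, and the location of your downloaded weights is whatever you pass in. Only the default target path comes from the text above.

```python
import os
from pathlib import Path

# Default location the scripts expect (from this README).
DEFAULT_MODEL_DIR = Path("/data/models/nvidia/Cosmos-Reason2-8B")


def link_model(downloaded_dir, expected_dir=DEFAULT_MODEL_DIR):
    """Symlink expected_dir -> downloaded_dir unless expected_dir already exists.

    Hypothetical helper: downloaded_dir is wherever you stored the weights.
    """
    downloaded_dir = Path(downloaded_dir).resolve()
    expected_dir = Path(expected_dir)
    if not downloaded_dir.is_dir():
        raise FileNotFoundError(f"model directory not found: {downloaded_dir}")
    if expected_dir.exists():
        # Already in place, either as a real directory or an earlier symlink.
        return expected_dir
    expected_dir.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(downloaded_dir, expected_dir, target_is_directory=True)
    return expected_dir
```

Creating anything under /data usually requires write permission on that directory, so you may need to run this with elevated privileges or point `expected_dir` elsewhere and update run_server*.sh instead.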

4) Download the dataset (2COOOL)

Download from the official Kaggle page: https://www.kaggle.com/competitions/2coool/data

Running the Pipeline

0) Activate environment

source ./vllm_cosmos_reason2/.venv/bin/activate

1) Video → Frames

Convert videos into frames. If you have gaze heatmaps, also prepare vertically stacked MP4s and their frame PNGs.

cd 001_video2frames

# MP4 → PNG (dashcam videos and/or heatmaps depending on --input-dirs)
python ./src/mp4_to_png.py \
    --input-dirs <Competition Data> \
    --output-root gdrive_png

# vertically stack heatmaps + videos into a single MP4 per video_id
python ./src/vstack_mp4_pairs_ffprobe.py \
    --left-dir /data/dataset/2coool/gdrive/heatmaps/ \
    --right-dir /data/dataset/2coool/gdrive/videos/ \
    --out-dir mp4_vstack

# MP4(vstack) → PNG
python ./src/mp4_to_png.py \
    --input-dirs mp4_vstack \
    --output-root mp4_vstack_png

Expected directory layout:

001_video2frames/
|-gdrive_png/
  |-videos/<video_id>/000001.png ...
  |-heatmaps/<video_id>/000001.png ...
|-mp4_vstack/<video_id>.mp4
|-mp4_vstack_png/<video_id>/000001.png ...
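
To confirm that extraction produced the layout above, a short check like the following may help. It is a sketch that assumes only what the listing shows: one sub-directory per video_id containing PNG frames with six-digit names starting at 000001. The helper names `count_frames` and `frame_path` are hypothetical and are not part of the repository.

```python
from pathlib import Path


def count_frames(png_root):
    """Return {video_id: number of PNG frames} for a <root>/<video_id>/*.png layout."""
    root = Path(png_root)
    return {
        d.name: sum(1 for _ in d.glob("*.png"))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def frame_path(png_root, video_id, index):
    """Build a frame path using the six-digit names shown in the layout."""
    return Path(png_root) / str(video_id) / f"{index:06d}.png"
```

For example, `count_frames("mp4_vstack_png")` gives a per-video frame count that you can compare against `count_frames("gdrive_png/videos")` to spot videos whose stacked version is missing or truncated.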

2) Start vLLM Server (8 GPUs)

In a separate terminal:

cd ./vllm_cosmos_reason2
bash ./run_server_8gpu.sh   # 8 GPUs (ports 8000-8007)
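
Before starting the next stage, it can be useful to confirm that all eight servers are up. The sketch below assumes that run_server_8gpu.sh launches one vLLM OpenAI-compatible server per port on the local machine, and it polls the `/health` route that such servers expose. The host name and the helper names are assumptions for illustration.

```python
import urllib.error
import urllib.request


def endpoint_urls(host="localhost", base_port=8000, num_servers=8):
    """Base URLs for ports 8000-8007 (host is an assumption)."""
    return [f"http://{host}:{base_port + i}" for i in range(num_servers)]


def is_healthy(base_url, timeout=2.0):
    """True if GET <base_url>/health answers 200; False on any connection error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `[url for url in endpoint_urls() if not is_healthy(url)]` returns the servers that are not ready yet. Model loading can take a while, so an empty list is the signal to proceed.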

3) Stage 2: Frame Captioning

A caption is generated for every 10th frame.

cd 002_frame_captioning
bash ./run_cosmos_frame_captioning_parallel.sh   # uses 8 GPUs
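
The 10-frame sampling can be pictured with the sketch below. It is not the repository's implementation: `frames_to_caption` is a hypothetical helper, and starting from the first frame (rather than the 10th) is an assumption.

```python
def frames_to_caption(frame_names, stride=10):
    """Pick every `stride`-th frame name, starting from the first (assumed)."""
    ordered = sorted(frame_names)  # six-digit names sort in frame order
    return ordered[::stride]
```

With 25 frames named 000001.png through 000025.png, this selects 000001.png, 000011.png, and 000021.png, so the captioning cost is roughly one tenth of the frame count.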

4) Stage 3: Incident/Hazard Frame Detection

Analyze the generated captions and identify incident or hazard frames.

cd 003_frame_detection
bash ./run_cosmos_frame_detection_parallel.sh   # uses 8 GPUs

5) Stage 4: Incident/Hazard Description (3-stage reasoning)

Runs Count → Text → Reconcile over frames around the detected incident frame.

cd 004_description
bash ./run_cosmos_description_from_csv_parallel.sh   # uses 8 GPUs
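
The idea of reasoning over "frames around the detected incident frame" can be sketched as a simple window selection. Everything here is an assumption for illustration: the helper name, the number of steps on each side, and the stride are hypothetical, and the actual window used by the scripts may differ.

```python
def frames_around(incident_frame, last_frame, steps_before=3, steps_after=3, stride=10):
    """Frame numbers centred on incident_frame, clipped to [1, last_frame].

    The window size and stride are illustrative values, not the repo's settings.
    """
    candidates = (
        incident_frame + k * stride
        for k in range(-steps_before, steps_after + 1)
    )
    return [f for f in candidates if 1 <= f <= last_frame]
```

Building the window from offsets relative to the incident frame guarantees that the incident frame itself is always included, even when the window is clipped at the start or end of the video.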

6) Stage 5: Blind A/B Test (optional)

Compare multiple submission.csv candidates with: 005_2coool-studio

Output

Final CSV is created at: 004_description/results/run_cosmos_reason2_parallel/submit_filled.csv

Columns:

| Column | Type | Description |
| --- | --- | --- |
| video | int | Video ID |
| Incident window start frame | int | Frame number where the incident begins |
| Incident Detection | int (-1, 0, 1) | -1 = no incident, 0 = hazard, 1 = accident |
| Crash Severity | string | Severity label |
| Ego-car involved | int (0, 1) | 0 = not involved, 1 = involved |
| Label | string | Incident type label |
| Number of Bicyclists/Scooters | int | Count of involved cyclists/scooters |
| Number of animals involved | int | Count of involved animals |
| Number of pedestrians involved | int | Count of involved pedestrians |
| Number of vehicles involved (excluding ego-car) | int | Count of other vehicles involved |
| Caption Before Incident | string | Scene description before the incident |
| Reason of Incident | string | Cause-and-effect explanation |
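
A quick sanity check of the final CSV against the value ranges listed above might look like the following sketch. It assumes that the CSV header uses exactly the column names shown and that the count columns are non-negative integers. `validate_rows` is a hypothetical helper, not part of the repository.

```python
import csv
import io

# Allowed values taken from the column descriptions above.
ALLOWED = {
    "Incident Detection": {"-1", "0", "1"},
    "Ego-car involved": {"0", "1"},
}
COUNT_COLUMNS = [
    "Number of Bicyclists/Scooters",
    "Number of animals involved",
    "Number of pedestrians involved",
    "Number of vehicles involved (excluding ego-car)",
]


def validate_rows(csv_file):
    """Yield (row_number, message) for each value outside the documented ranges."""
    for n, row in enumerate(csv.DictReader(csv_file), start=2):  # row 1 is the header
        for col, allowed in ALLOWED.items():
            value = (row.get(col) or "").strip()
            if value not in allowed:
                yield n, f"{col}: unexpected value {value!r}"
        for col in COUNT_COLUMNS:
            value = (row.get(col) or "").strip()
            if not value.isdigit():  # assumption: counts are non-negative integers
                yield n, f"{col}: expected a non-negative integer, got {value!r}"
```

To use it, open the file with `open(path, newline="")` and iterate over `validate_rows(f)`. An empty result means every row satisfies these checks; it says nothing about whether the labels themselves are correct.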

Adapting to Other Dashcam Datasets

This pipeline can be applied to other dashcam video datasets for incident analysis. The input consists of MP4 dashcam videos, optionally accompanied by gaze heatmap videos.

Note: The current configuration expects gaze heatmap videos (vertically stacked with dashcam frames) in the Frame Captioning stage. To run without heatmaps, modify the prompt in 002_frame_captioning/configs/default.yaml to remove heatmap-specific instructions and skip the vstack_mp4_pairs_ffprobe.py step.

What you usually change (prompts)

All VLM prompts live in YAML configs:

| Stage | Config file |
| --- | --- |
| Frame Captioning | 002_frame_captioning/configs/default.yaml |
| Frame Detection | 003_frame_detection/configs/default.yaml |
| Incident Description | 004_description/configs/default.yaml |

Typical workflow:

  1. Copy config: cp default.yaml my_domain.yaml
  2. Edit prompts (incident taxonomy, severity rules, counting rules)
  3. Run with --config configs/my_domain.yaml

When you need code changes (schema / keys)

If you change the output schema (column names, constraints, required fields), update:

| File | What to change |
| --- | --- |
| 002_frame_captioning/src/cosmos_frame_captioning_vllm.py | preferred_keys in export_text_json_to_csv() |
| 004_description/src/cosmos_multi_image_infer_vllm_sc.py | JSON_KEYS, COUNT_KEYS, TEXT_KEYS, detect_contradictions() |
| 004_description/src/aggregate_ans_jsons_to_csv.py | HEADER, NUMERIC_KEYS |
| 004_description/src/postprocess_fill_nulls.py | HEADER, FALLBACK_FIELDS |
