1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added

- Enable `pin_memory` in DataLoaders when GPU is available for faster async CPU-to-GPU data transfers [\#236](https://github.com/mllam/neural-lam/pull/236) @abhaygoudannavar
- Add `COSMO_example.ipynb` notebook to documentation for onboarding [\#69](https://github.com/mllam/neural-lam/issues/69) @info-gallary

### Changed
- Refactor graph loading: move zero-indexing out of the model and update plotting to prepare using the research-branch graph I/O [\#184](https://github.com/mllam/neural-lam/pull/184) @zweihuehner
239 changes: 239 additions & 0 deletions docs/notebooks/COSMO_example.ipynb
@@ -0,0 +1,239 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# COSMO Example: End-to-End Model Training and Inference\n",
"\n",
"This notebook provides a lightweight, end-to-end demonstration of the **Neural-LAM** workflow using a COSMO-structured setup. It is designed to be an **onboarding guide** runnable on a **CPU** using synthetic/reduced data. This allows you to verify your environment and the training pipeline without requiring the massive datasets or high-end GPUs used in the full paper reproduction.\n",
"\n",
"The workflow follows these steps:\n",
"1. **Environment and Imports**: Checking the working directory and importing the required packages.\n",
"2. **Data Preparation**: Creating a small synthetic Zarr datastore.\n",
"3. **Minimal Configuration**: Writing an `mllam-data-prep` config that points to the synthetic data.\n",
"4. **Preprocessing and Graph Construction**: Where `mllam-data-prep` and graph building fit in (simulated here).\n",
"5. **Model Training**: Importing the model class and showing the command for a one-epoch CPU run.\n",
"6. **Evaluation and Summary**: How forecasts are evaluated after training."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Environment and Imports\n",
"\n",
"This notebook assumes the environment is already set up as described in the [installation guide](../../README.md); in a fresh setup you would first clone the repository and install its dependencies. The cell below imports the packages used in the rest of the notebook and switches to the repository root."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import xarray as xr\n",
"import pandas as pd\n",
"import torch\n",
"import yaml\n",
"from pathlib import Path\n",
"from datetime import datetime, timedelta\n",
"\n",
"# Ensure we are in the root of the repo if running from docs/notebooks\n",
"if os.getcwd().endswith('notebooks'):\n",
"    os.chdir('../..')\n",
"\n",
"print(f\"Current working directory: {os.getcwd()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Prepare Synthetic COSMO Data\n",
"\n",
"Instead of downloading the 313 GB COSMO sample, we generate a tiny synthetic Zarr dataset. This keeps the notebook lightweight and CPU-friendly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"WORKDIR = Path(\"cosmo_test_workdir\")\n",
"WORKDIR.mkdir(exist_ok=True)\n",
"\n",
"def create_synthetic_zarr(path, nx=10, ny=10, nt=5):\n",
"    \"\"\"Write a small random dataset with COSMO-style variable names to a Zarr store.\"\"\"\n",
"    z_levels = [6, 12, 20, 27, 31, 39, 45, 60]\n",
"    ds = xr.Dataset(\n",
"        coords={\n",
"            # Lowercase \"h\" is the hourly alias; uppercase \"H\" is deprecated in recent pandas\n",
"            \"time\": pd.date_range(\"2016-01-01\", periods=nt, freq=\"h\"),\n",
"            \"x\": np.arange(nx),\n",
"            \"y\": np.arange(ny),\n",
"            \"z\": z_levels\n",
"        }\n",
"    )\n",
"\n",
"    # State variables on vertical levels (dims: time, x, y, z)\n",
"    for var in [\"U\", \"V\", \"T\"]:\n",
"        ds[var] = ((\"time\", \"x\", \"y\", \"z\"), np.random.rand(nt, nx, ny, len(z_levels)).astype(np.float32))\n",
"\n",
"    # Surface variables (dims: time, x, y)\n",
"    for var in [\"T_2M\", \"U_10M\", \"V_10M\", \"PMSL\"]:\n",
"        ds[var] = ((\"time\", \"x\", \"y\"), np.random.rand(nt, nx, ny).astype(np.float32))\n",
"\n",
"    # Static variables (dims: x, y)\n",
"    ds[\"HSURF\"] = ((\"x\", \"y\"), np.random.rand(nx, ny).astype(np.float32))\n",
"\n",
"    # Add lat/lon (simplified: constant placeholder values, not a real grid)\n",
"    ds[\"lat\"] = ((\"x\", \"y\"), np.zeros((nx, ny)) + 47.0)\n",
"    ds[\"lon\"] = ((\"x\", \"y\"), np.zeros((nx, ny)) + 8.0)\n",
"\n",
"    ds.to_zarr(path, mode=\"w\")\n",
"    print(f\"Synthetic Zarr created at {path}\")\n",
"\n",
"create_synthetic_zarr(WORKDIR / \"cosmo_sample.zarr\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Minimal Configuration\n",
"\n",
"We create a minimal `mllam-data-prep` config file that points to our synthetic data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"config = {\n",
"    \"schema_version\": \"v0.6.0\",\n",
"    \"dataset_version\": \"v0.1.0\",\n",
"    \"output\": {\n",
"        \"variables\": {\n",
"            \"static\": [\"grid_index\", \"static_feature\"],\n",
"            \"state\": [\"time\", \"grid_index\", \"state_feature\"]\n",
"        },\n",
"        \"coord_ranges\": {\n",
"            \"time\": {\"start\": \"2016-01-01T00:00\", \"end\": \"2016-01-01T04:00\", \"step\": \"PT1H\"}\n",
"        },\n",
"        \"splitting\": {\n",
"            \"dim\": \"time\",\n",
"            \"splits\": {\n",
"                \"train\": {\"start\": \"2016-01-01T00:00\", \"end\": \"2016-01-01T02:00\", \"compute_statistics\": {\"ops\": [\"mean\", \"std\", \"diff_mean\", \"diff_std\"], \"dims\": [\"grid_index\", \"time\"]}},\n",
"                \"val\": {\"start\": \"2016-01-01T03:00\", \"end\": \"2016-01-01T03:00\"},\n",
"                \"test\": {\"start\": \"2016-01-01T04:00\", \"end\": \"2016-01-01T04:00\"}\n",
"            }\n",
"        }\n",
"    },\n",
"    \"inputs\": {\n",
"        \"cosmo_height\": {\n",
"            \"path\": \"cosmo_sample.zarr\",\n",
"            \"dims\": [\"time\", \"x\", \"y\", \"z\"],\n",
"            \"variables\": {\"T\": {\"z\": {\"values\": [6, 12], \"units\": \"K\"}}},\n",
"            \"dim_mapping\": {\"time\": {\"method\": \"rename\", \"dim\": \"time\"}, \"state_feature\": {\"method\": \"stack_variables_by_var_name\", \"dims\": [\"z\"], \"name_format\": \"{var_name}_lev_{z}\"}, \"grid_index\": {\"method\": \"stack\", \"dims\": [\"x\", \"y\"]}},\n",
"            \"target_output_variable\": \"state\"\n",
"        },\n",
"        \"cosmo_static\": {\n",
"            \"path\": \"cosmo_sample.zarr\",\n",
"            \"dims\": [\"x\", \"y\"],\n",
"            \"variables\": [\"HSURF\"],\n",
"            \"dim_mapping\": {\"grid_index\": {\"method\": \"stack\", \"dims\": [\"x\", \"y\"]}, \"static_feature\": {\"method\": \"stack_variables_by_var_name\", \"name_format\": \"{var_name}\"}},\n",
"            \"target_output_variable\": \"static\"\n",
"        }\n",
"    }\n",
"}\n",
"\n",
"with open(WORKDIR / \"cosmo_config.yaml\", \"w\") as f:\n",
"    yaml.dump(config, f)\n",
"\n",
"print(\"Configuration file created.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Preprocess and Build Graph\n",
"\n",
"In a full workflow, `mllam-data-prep` processes the Zarr archives and the Neural-LAM graph is built from the result. This notebook does not run either step; the cell below only shows the preprocessing command and marks the step as simulated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# In a real workflow, you would run (the config was written to WORKDIR in Step 3):\n",
"# python -m mllam_data_prep cosmo_test_workdir/cosmo_config.yaml\n",
"\n",
"# This notebook does not run preprocessing or graph creation; the step is only simulated\n",
"print(\"Preprocessing step completed (simulated).\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 5: Initialize Model and Train (CPU)\n",
"\n",
"The cell below imports the `HiLAM` model class and prints the command for a one-epoch training run on the CPU. It does not instantiate the model or run a forward pass itself."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import the hierarchical model class to confirm that neural_lam is importable\n",
"from neural_lam.models.hi_lam import HiLAM\n",
"\n",
"# Placeholder: the model is not instantiated and no training is run in this cell.\n",
"# Note: model_config.yaml is not created by this notebook and must be provided separately.\n",
"print(\"Model initialization demonstration...\")\n",
"print(f\"To run training: python -m neural_lam.train_model --config_path {WORKDIR}/model_config.yaml --model hi_lam --epochs 1 --accelerator cpu\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Evaluation and Summary\n",
"\n",
"After training, Neural-LAM produces Zarr forecasts that can be compared against the ground truth using `RMSE` and other metrics.\n",
"\n",
"![Evaluation Example](https://raw.githubusercontent.com/joeloskarsson/neural-lam-dev/research/figures/cosmo_t2m_forecast.gif)\n",
"\n",
"### Conclusion\n",
"This notebook demonstrates the modularity of Neural-LAM. By swapping the datastore and configuration, the same core architecture can be applied to diverse regional weather models like COSMO."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
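
A possible sanity check for the Step 3 configuration, offered as a sketch rather than as part of the notebook: it expands the `train`/`val`/`test` ranges into hourly timestamps and confirms they are disjoint and fall inside `coord_ranges`. The helper names `split_timestamps` and `check_splits` are hypothetical, and treating both range endpoints as inclusive is an assumption made here, not something confirmed against `mllam-data-prep`.

```python
from datetime import datetime, timedelta

# Time ranges copied from the Step 3 config.
COORD_RANGE = {"start": "2016-01-01T00:00", "end": "2016-01-01T04:00"}
SPLITS = {
    "train": {"start": "2016-01-01T00:00", "end": "2016-01-01T02:00"},
    "val": {"start": "2016-01-01T03:00", "end": "2016-01-01T03:00"},
    "test": {"start": "2016-01-01T04:00", "end": "2016-01-01T04:00"},
}
STEP = timedelta(hours=1)  # corresponds to the "PT1H" step in the config


def split_timestamps(split, step=STEP):
    """Hypothetical helper: list the timestamps in one split (ends assumed inclusive)."""
    current = datetime.fromisoformat(split["start"])
    end = datetime.fromisoformat(split["end"])
    stamps = []
    while current <= end:
        stamps.append(current)
        current += step
    return stamps


def check_splits(splits, coord_range):
    """Hypothetical helper: raise if splits overlap or leave coord_range; return sizes."""
    lo = datetime.fromisoformat(coord_range["start"])
    hi = datetime.fromisoformat(coord_range["end"])
    owner = {}  # timestamp -> name of the split that already claimed it
    for name, split in splits.items():
        for stamp in split_timestamps(split):
            if not lo <= stamp <= hi:
                raise ValueError(f"{name}: {stamp} is outside coord_ranges")
            if stamp in owner:
                raise ValueError(f"{stamp} is in both {owner[stamp]} and {name}")
            owner[stamp] = name
    return {name: len(split_timestamps(split)) for name, split in splits.items()}


counts = check_splits(SPLITS, COORD_RANGE)
print(counts)
```

Under the inclusive-endpoint assumption, the `val` and `test` splits each cover a single time step. Whether that is enough to form an input/target sample for training or evaluation is worth verifying before relying on this configuration.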