
Add OpenStreetMap SD-map prior (SDTagNet) as a vector map branch #89

Draft: immel-f wants to merge 1 commit into autowarefoundation:main from immel-f:sdtagnet_sd_map_encoder

immel-f commented Jun 24, 2026

Hi, as mentioned in issue #47, I tried to integrate the SDTagNet SD map encoder as a vector map branch. It would be awesome if vector map branches could make it into the model somehow! This is an early draft, and you are very welcome to propose or implement any changes 😄 I checked that the tests still pass as before (3 already failed before this change), and there is a test visualization of the parsed OSM map data (taken at a random position, which is why it is not on a road):

[Image: osm_map_vis]

I tried to make as few changes as possible to the original encoder, but to have one map cache that works for both branches, some changes had to be made there: the raw OSM XML maps are now cached and converted on load for the original branch. I did not test whether the original branch still works with these changes, since I didn't find a visualization where I could check that easily. If I missed a script for that, I can try it quickly.

Changes

Model (Model/model_components/map_encoder/)

  • osm_vector/osm_map_encoder.py (OSMVectorMapEncoder): ported from
    OSMMapEncoderPointLevel, stripped of mmdetection3d/mmcv and ablation-only
    code (kept only the canonical config: point-level tokens, NLP tag embeddings,
    ORF graph identifiers with fixed order, sine continuous positional encoding).
    Returns (tokens [B,N,C], key_padding_mask [B,N]). NLP model is loaded lazily
    from a local path or injected (for tests); constructed on CPU so the module is
    single-device until .to().
  • map_bev_fusion/osm_cross_attn_fusion.py (OSMCrossAttnFusion): BEV cells
    cross-attend to OSM tokens via F.scaled_dot_product_attention (mem-efficient
    backend, so the full 450×300 grid never materialises a QK matrix), honours the
    padding mask, has a learnable null token (no NaN when a sample has no OSM), and
    a zero-init per-channel gate (training starts identical to no-map).
  • Registered osm_vector / osm_cross_attn; wired into AutoE2E behind
    map_type (default path byte-identical; new map_encoder_kwargs passthrough).
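
The fusion behaviour described above (padding mask, null token, zero-initialised gate) can be sketched in plain Python for a single sample. This is only a reference for the semantics, not the PR's implementation: it skips the learned query/key/value projections, uses the tokens as both keys and values, and does not reproduce the memory-efficient F.scaled_dot_product_attention path. The function name cross_attend and all argument names are hypothetical.

```python
import math

def cross_attend(bev, osm_tokens, padding_mask, null_token, gate):
    """Fuse one sample: each BEV cell attends to valid OSM tokens plus a null token.

    bev:          list of query vectors (one per BEV cell), each of length C
    osm_tokens:   list of N token vectors, each of length C
    padding_mask: list of N bools, True where the token is padding
    null_token:   vector of length C that is always attendable
    gate:         per-channel gate of length C (all zeros at initialisation)
    """
    # Drop padded tokens and append the null token, so the key set is never empty.
    keys = [t for t, pad in zip(osm_tokens, padding_mask) if not pad] + [null_token]
    scale = 1.0 / math.sqrt(len(null_token))
    fused = []
    for q in bev:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended = [
            sum(w * k[c] for w, k in zip(weights, keys)) / total
            for c in range(len(q))
        ]
        # Gated residual: with gate == 0 the output equals the input BEV feature.
        fused.append([x + g * a for x, g, a in zip(q, gate, attended)])
    return fused

bev = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[0.5, 0.5], [9.0, 9.0]]
mask = [False, True]          # second token is padding
null = [0.0, 0.0]

out_zero_gate = cross_attend(bev, tokens, mask, null, gate=[0.0, 0.0])
out_no_osm = cross_attend(bev, [], [], null, gate=[1.0, 1.0])
out_masked = cross_attend(bev, tokens, mask, null, gate=[1.0, 1.0])
out_reference = cross_attend(bev, [[0.5, 0.5]], [False], null, gate=[1.0, 1.0])
```

With a zero gate the output reproduces the BEV input exactly, which is the "training starts identical to no-map" property. A sample without any OSM tokens still has the null token to attend to, so the softmax never normalises over an empty set.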

Data pipeline (Model/data_parsing/osm_sd_map/)

  • osm_parser.py — OSM XML parser + ego-centric patch extraction (ported;
    city-coord conversion replaced by generic wgs84_to_local.py, ego frame
    X=forward/Y=left/Z=up).
  • overpass.py — the exact tested SDTagNet Overpass query; trajectory-bbox
    fetch with a shared gzip cache and containment reuse (a request is served
    by any cached XML whose bbox covers it).
  • osm_tokenize.py — fixed-point way resampling + tag tokenisation → the
    ragged osm_map_data dict.
  • cache.py — per-episode tokenised .pt shards; one Overpass fetch per
    episode sized to the trajectory.
  • collate.py: collate_osm_batch for ragged OSM fields (standard keys still
    default-collated).
  • nlp_download.py — downloads + extracts the trained tag encoder from
    immel-f/SDTagNet (nlp_encoder/bert-144-osm-tags-embed-from_scratch.tar.gz).
  • visualize.py — SDTagNet-style SD-map figure + a real-data validation CLI
    (python -m data_parsing.osm_sd_map.visualize ... --run-encoder).
  • L2DDataset — flag-gated osm_cache_dir merges per-frame OSM into samples,
    plus episode_ego_poses() for building the cache.
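
As an illustration of the way-resampling step in osm_tokenize.py, here is a minimal sketch that reads "fixed-point" as "a fixed number of points per way, equally spaced by arc length". That reading is an assumption, as is the function name resample_polyline; the PR's actual resampling may differ (for example, fixed spacing instead of fixed count).

```python
import math

def resample_polyline(points, num_points):
    """Resample a 2D polyline to num_points points equally spaced by arc length."""
    if num_points < 2 or len(points) < 2:
        raise ValueError("need at least two input points and two output points")
    # Cumulative arc length at every input vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out = []
    seg = 0
    for i in range(num_points):
        target = total * i / (num_points - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0.0 else (target - cum[seg]) / seg_len
        t = min(max(t, 0.0), 1.0)  # guard against rounding just outside [0, 1]
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

# An L-shaped way of total length 7 m, resampled to 8 points (1 m apart).
way = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
resampled = resample_polyline(way, 8)
```

A fixed point count per way keeps the token count per way predictable, although the number of ways per sample still varies, which is where the ragged collate comes in.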

Shared cache (both branches)

Both the rendered-tile and vector branches now fetch by trajectory bounding
box + margin instead of a fixed centroid radius. This fixes a latent bug
where episodes longer than ~1.5 km silently lost OSM at the trajectory ends.
The renderer (map_rendering/cache.py) builds its graph from the same shared
raw-OSM XML via ox.graph_from_xml. The vector branch's fetch margin defaults
to the renderer's render radius (DEFAULT_FETCH_MARGIN_M=800) so they share one
download; pass fetch_margin_m=patch_reach_m(pc_range) for a minimal
vector-only fetch.
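
The trajectory-bbox fetch and the containment reuse of cached XML can be sketched as follows. This is a simplified illustration, not the code in overpass.py or cache.py: the helper names, the (south, west, north, east) tuple layout, the spherical-Earth margin conversion, and the cache keys are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation, fine for a fetch margin

def trajectory_bbox(latlons, margin_m):
    """Return (south, west, north, east) covering all poses plus a margin in metres."""
    lats = [lat for lat, _ in latlons]
    lons = [lon for _, lon in latlons]
    dlat = math.degrees(margin_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with cos(latitude); use the pose farthest
    # from the equator so the east/west margin is not underestimated.
    max_abs_lat = max(abs(min(lats)), abs(max(lats)))
    dlon = dlat / math.cos(math.radians(max_abs_lat))
    return (min(lats) - dlat, min(lons) - dlon, max(lats) + dlat, max(lons) + dlon)

def covers(cached, requested):
    """True if the cached bbox fully contains the requested bbox."""
    return (cached[0] <= requested[0] and cached[1] <= requested[1]
            and cached[2] >= requested[2] and cached[3] >= requested[3])

def find_cached(cache, requested):
    """Return the key of the first cached bbox that covers the request, else None."""
    for key, bbox in cache.items():
        if covers(bbox, requested):
            return key
    return None

poses = [(48.9930, 8.4037), (49.0100, 8.4300)]
wide = trajectory_bbox(poses, 800.0)    # renderer-sized margin
tight = trajectory_bbox(poses, 100.0)   # smaller vector-only margin

hit = find_cached({"episode_a": wide}, tight)   # wide download serves the tight request
miss = find_cached({"episode_a": tight}, wide)  # tight download cannot serve the wide one
```

This is why defaulting the vector branch to the renderer's margin lets both branches share one download: the larger box always covers the smaller request, but not the other way round.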

Dependencies

Added to requirements.txt (NLP path is required, not optional):
sentence-transformers, transformers, shapely, pyproj, requests,
huggingface_hub, matplotlib.

Misc

  • .gitignore: checkpoints/, cache/, *.osm.gz, osm_bbox_*.xml.gz,
    graph_*.pkl, osm_map_vis.png.

Testing

  • New offline suites (tests/test_osm_vector.py, tests/test_osm_sd_map_data.py):
    encoder shapes/mask/relations/grad, fusion (zero-gate / null-token / masking),
    registries, collate, AutoE2E osm_vector end-to-end, WGS84→ego transform,
    exact Overpass query string, parser+tokenise, trace-bbox + containment-reuse
    cache, and the visualizer (realistic scene). Heavy deps guarded with
    importorskip; a fake NLP module keeps the model tests checkpoint-free.
  • Full suite passes except 3 pre-existing TestBatchIndependence cases that
    fail only on CUDA (batch-size-dependent reduction nondeterminism in the
    default rasterized path; unrelated to this change — they pass on CPU).
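
To illustrate what a checkpoint-free fake NLP module might look like, here is a hypothetical stand-in that maps tag strings to deterministic vectors. The class name, the encode method (named after SentenceTransformer.encode, which also accepts a list of sentences), and the hashing scheme are assumptions; the fake used in the PR's tests and the interface the encoder expects are not shown in this description.

```python
import hashlib

class FakeTagEncoder:
    """Deterministic stand-in for a sentence encoder; needs no checkpoint or download."""

    def __init__(self, dim=8):
        self.dim = dim

    def encode(self, sentences):
        """Map each string to a fixed vector in [0, 1) derived from its SHA-256 hash."""
        vectors = []
        for text in sentences:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # SHA-256 yields 32 bytes; repeat them if more dimensions are requested.
            raw = (digest * (self.dim // len(digest) + 1))[: self.dim]
            vectors.append([b / 256.0 for b in raw])
        return vectors

encoder = FakeTagEncoder(dim=8)
tags = ["highway=residential", "maxspeed=30"]
embeddings = encoder.encode(tags)
```

Because the vectors depend only on the input string, shape and gradient tests stay reproducible across machines without touching the real tag encoder.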

Validating on real data

cd Model
python -m data_parsing.osm_sd_map.visualize --lat 48.9930 --lon 8.4037 --heading 0.0 --download-nlp --run-encoder --out osm_map_vis.png

Fetches real OSM, runs the full pipeline, renders the SD map, and runs the
encoder forward — for eyeballing geometry/ego-framing/tag decoding.

Notes / follow-ups

  • Confirm the L2D heading convention (vehicle[1]) against the visualization.
  • Confirm ox.graph_from_xml filtering parity with network_type="drive" on
    your osmnx version (cosmetic for the tile).
  • Architecture diagram in Model/README.md still predates this branch.
  • Possible follow-ups: a --from-l2d EPISODE FRAME flag for the visualizer;
    token-level fusion variants; NLP-encoder pretraining scripts.
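
Since the heading convention is still to be confirmed, the following sketch shows how a WGS84 point maps into the ego frame (X forward, Y left) under one explicit assumption: heading is measured counter-clockwise from east, in radians. It uses a local equirectangular approximation rather than the projection in wgs84_to_local.py, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # spherical approximation, adequate for patches of a few km

def wgs84_to_ego(lat, lon, ego_lat, ego_lon, ego_heading_rad):
    """Convert a WGS84 point to the ego frame (X forward, Y left), in metres.

    Assumes ego_heading_rad is counter-clockwise from east (ENU yaw).
    """
    # East/north offsets on a local tangent plane at the ego position.
    east = math.radians(lon - ego_lon) * EARTH_RADIUS_M * math.cos(math.radians(ego_lat))
    north = math.radians(lat - ego_lat) * EARTH_RADIUS_M
    # Rotate world offsets by -heading so the ego's forward axis becomes +X.
    cos_h, sin_h = math.cos(ego_heading_rad), math.sin(ego_heading_rad)
    x_forward = cos_h * east + sin_h * north
    y_left = -sin_h * east + cos_h * north
    return x_forward, y_left

ego_lat, ego_lon = 48.9930, 8.4037
# A point 0.001 degrees (about 111 m) north of the ego.
facing_east = wgs84_to_ego(ego_lat + 0.001, ego_lon, ego_lat, ego_lon, 0.0)
facing_north = wgs84_to_ego(ego_lat + 0.001, ego_lon, ego_lat, ego_lon, math.pi / 2)
```

With the ego facing east the point lands on the left (+Y); with the ego facing north it lands straight ahead (+X). If the dataset instead stores a compass heading (clockwise from north), it would need converting first, e.g. heading_enu = pi/2 - heading_compass; a map that appears rotated or mirrored in the visualization is the typical symptom of mixing the two.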

Signed-off-by: Fabian Immel <fabian.immel@kit.edu>
immel-f force-pushed the sdtagnet_sd_map_encoder branch from e5f2512 to e67b2a1 on June 24, 2026 00:18