Add OpenStreetMap SD-map prior (SDTagNet) as a vector map branch #89
Draft
immel-f wants to merge 1 commit into
Signed-off-by: Fabian Immel <fabian.immel@kit.edu>
e5f2512 to
e67b2a1
Hi, as mentioned in issue #47, I tried to integrate the SDTagNet SD map encoder as a vector map branch. It would be awesome if vector map branches could make it into the model somehow! This is an early draft, and you are very welcome to propose or implement any changes 😄 I checked that the tests still pass as before (3 already failed before my changes), and there is a test visualization of the parsed OSM map data (taken from a random position, which is why it is not on a road):
I tried to make as few changes as possible to the original encoder. However, to have a single map cache that works for both branches, some changes were necessary there: the raw OSM XML maps are now cached and converted at load time for the original branch. I have not tested whether the original branch still works with these changes, because I didn't find a visualization that would let me check that easily. If I missed a script for that, I can quickly try it.
Changes
Model (`Model/model_components/map_encoder/`)
- `osm_vector/osm_map_encoder.py`: `OSMVectorMapEncoder`, ported from `OSMMapEncoderPointLevel`, stripped of mmdetection3d/mmcv and ablation-only code (kept only the canonical config: point-level tokens, NLP tag embeddings, ORF graph identifiers with fixed order, sine continuous positional encoding). Returns `(tokens [B,N,C], key_padding_mask [B,N])`. The NLP model is loaded lazily from a local path or injected (for tests); it is constructed on CPU so the module is single-device until `.to()`.
- `map_bev_fusion/osm_cross_attn_fusion.py`: `OSMCrossAttnFusion`. BEV cells cross-attend to OSM tokens via `F.scaled_dot_product_attention` (mem-efficient backend, so the full 450×300 grid never materialises a QK matrix). It honours the padding mask, has a learnable null token (no NaN when a sample has no OSM), and a zero-init per-channel gate (training starts identical to no-map).
- `osm_vector` / `osm_cross_attn`; wired into `AutoE2E` behind `map_type` (default path byte-identical; new `map_encoder_kwargs` passthrough).

Data pipeline (`Model/data_parsing/osm_sd_map/`)
- `osm_parser.py`: OSM XML parser + ego-centric patch extraction (ported; city-coord conversion replaced by generic `wgs84_to_local.py`, ego frame X=forward/Y=left/Z=up).
- `overpass.py`: the exact tested SDTagNet Overpass query; trajectory-bbox fetch with a shared gzip cache and containment reuse (a request is served by any cached XML whose bbox covers it).
- `osm_tokenize.py`: fixed-point way resampling + tag tokenisation → the ragged `osm_map_data` dict.
- `cache.py`: per-episode tokenised `.pt` shards; one Overpass fetch per episode sized to the trajectory.
- `collate.py`: `collate_osm_batch` for ragged OSM fields (standard keys still default-collated).
- `nlp_download.py`: downloads + extracts the trained tag encoder from `immel-f/SDTagNet` (`nlp_encoder/bert-144-osm-tags-embed-from_scratch.tar.gz`).
- `visualize.py`: SDTagNet-style SD-map figure + a real-data validation CLI (`python -m data_parsing.osm_sd_map.visualize ... --run-encoder`).
- `L2DDataset`: flag-gated `osm_cache_dir` merges per-frame OSM into samples, plus `episode_ego_poses()` for building the cache.

Shared cache (both branches)
Both the rendered-tile and vector branches now fetch by trajectory bounding box + margin instead of a fixed centroid radius. This fixes a latent bug where episodes longer than ~1.5 km silently lost OSM at the trajectory ends. The renderer (`map_rendering/cache.py`) builds its graph from the same shared raw-OSM XML via `ox.graph_from_xml`. The vector branch's fetch margin defaults to the renderer's render radius (`DEFAULT_FETCH_MARGIN_M=800`) so they share one download; pass `fetch_margin_m=patch_reach_m(pc_range)` for a minimal vector-only fetch.
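To illustrate the fetch-by-trajectory-bbox and containment-reuse idea, here is a minimal standalone sketch. It is not the code in `overpass.py`: the names `BBox`, `trajectory_bbox` and `find_covering_bbox` are hypothetical, and the metres-to-degrees conversion is a simple spherical approximation assumed for illustration only.

```python
import math
from typing import Iterable, NamedTuple, Optional, Sequence, Tuple


class BBox(NamedTuple):
    """Hypothetical lat/lon bounding box (degrees)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, other: "BBox") -> bool:
        # True if `other` lies fully inside this box.
        return (self.south <= other.south and self.west <= other.west
                and self.north >= other.north and self.east >= other.east)


METERS_PER_DEG_LAT = 111_320.0  # rough spherical approximation (assumption)


def trajectory_bbox(latlons: Sequence[Tuple[float, float]],
                    margin_m: float = 800.0) -> BBox:
    """Bounding box of all (lat, lon) poses, grown by `margin_m` on each side."""
    lats = [lat for lat, _ in latlons]
    lons = [lon for _, lon in latlons]
    mid_lat = 0.5 * (min(lats) + max(lats))
    d_lat = margin_m / METERS_PER_DEG_LAT
    d_lon = margin_m / (METERS_PER_DEG_LAT * math.cos(math.radians(mid_lat)))
    return BBox(min(lats) - d_lat, min(lons) - d_lon,
                max(lats) + d_lat, max(lons) + d_lon)


def find_covering_bbox(request: BBox, cached: Iterable[BBox]) -> Optional[BBox]:
    """Return the first cached box that fully covers the request, if any."""
    for box in cached:
        if box.contains(request):
            return box
    return None


# Usage: a vector-only request with a small margin is served by the
# larger box that was already downloaded for the renderer.
trajectory = [(49.00, 8.40), (49.02, 8.43)]
renderer_box = trajectory_bbox(trajectory, margin_m=800.0)
vector_box = trajectory_bbox(trajectory, margin_m=100.0)
reused = find_covering_bbox(vector_box, [renderer_box])
```

Because the box follows the whole trajectory rather than a fixed radius around its centroid, the ends of a long episode stay inside the fetched area.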
Dependencies
Added to `requirements.txt` (NLP path is required, not optional): `sentence-transformers`, `transformers`, `shapely`, `pyproj`, `requests`, `huggingface_hub`, `matplotlib`.

Misc
- `.gitignore`: `checkpoints/`, `cache/`, `*.osm.gz`, `osm_bbox_*.xml.gz`, `graph_*.pkl`, `osm_map_vis.png`.

Testing
- `tests/test_osm_vector.py`, `tests/test_osm_sd_map_data.py`: encoder shapes/mask/relations/grad, fusion (zero-gate / null-token / masking), registries, collate, AutoE2E `osm_vector` end-to-end, WGS84→ego transform, exact Overpass query string, parser+tokenise, trace-bbox + containment-reuse cache, and the visualizer (realistic scene). Heavy deps are guarded with `importorskip`; a fake NLP module keeps the model tests checkpoint-free.
- `TestBatchIndependence` cases that fail only on CUDA (batch-size-dependent reduction nondeterminism in the default rasterized path; unrelated to this change, they pass on CPU).
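The zero-gate, null-token and masking properties checked by the fusion tests can be illustrated with a dependency-free sketch. This is plain Python for a single BEV cell, not the PR's `OSMCrossAttnFusion` (which uses `F.scaled_dot_product_attention`); the function names `masked_attention` and `fuse` are hypothetical.

```python
import math

NEG_INF = float("-inf")


def masked_attention(query, keys, values, padding_mask):
    """Single-query dot-product attention; padding_mask[i] is True for padded keys."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [
        NEG_INF if is_pad else scale * sum(q * k for q, k in zip(query, key))
        for key, is_pad in zip(keys, padding_mask)
    ]
    top = max(scores)
    # If every key is padded, top is -inf and (-inf) - (-inf) is NaN.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def fuse(bev_cell, osm_tokens, padding_mask, null_token, gate):
    """Residual fusion: bev + gate * attention(bev -> [null_token] + osm_tokens)."""
    # The null token is never masked, so the softmax always has a valid key.
    keys = [null_token] + list(osm_tokens)
    mask = [False] + list(padding_mask)
    attended = masked_attention(bev_cell, keys, keys, mask)
    return [b + g * a for b, g, a in zip(bev_cell, gate, attended)]


bev = [1.0, 2.0]
tokens = [[0.5, -1.0], [3.0, 0.0]]
null = [0.0, 0.0]

# Without a null token, a fully padded sample yields NaN.
no_null = masked_attention(bev, tokens, tokens, [True, True])

# With the null token the output stays finite, and a zero gate leaves
# the BEV feature untouched, as at the start of training.
at_init = fuse(bev, tokens, [True, True], null, gate=[0.0, 0.0])
```

The zero-initialised gate means the fused model starts from exactly the no-map features and only learns to use the OSM tokens as the gate moves away from zero.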
Validating on real data
Fetches real OSM, runs the full pipeline, renders the SD map, and runs the
encoder forward — for eyeballing geometry/ego-framing/tag decoding.
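When checking the ego-framing by eye, it helps to keep the X=forward/Y=left convention in mind. The sketch below is an assumption-laden illustration, not the PR's `wgs84_to_local.py`: it uses a local tangent-plane approximation instead of a proper projection, the function name `wgs84_to_ego` is hypothetical, and the yaw is assumed to be measured counter-clockwise from east.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis


def wgs84_to_ego(lat, lon, ego_lat, ego_lon, ego_yaw_rad):
    """Map a WGS84 point to the ego frame (X=forward, Y=left), in metres.

    Assumes a small patch around the ego vehicle, so a flat east/north
    approximation is good enough, and a yaw measured counter-clockwise
    from east (0 = facing east, pi/2 = facing north).
    """
    d_lat = math.radians(lat - ego_lat)
    d_lon = math.radians(lon - ego_lon)
    east = d_lon * EARTH_RADIUS_M * math.cos(math.radians(ego_lat))
    north = d_lat * EARTH_RADIUS_M
    cos_y, sin_y = math.cos(ego_yaw_rad), math.sin(ego_yaw_rad)
    # Rotate the east/north offset by -yaw into the ego frame.
    x_forward = cos_y * east + sin_y * north
    y_left = -sin_y * east + cos_y * north
    return x_forward, y_left


# Ego faces north: a point further north is ahead (x > 0),
# a point to the east is on the right (y < 0).
ahead = wgs84_to_ego(49.001, 8.400, 49.0, 8.4, math.pi / 2)
right = wgs84_to_ego(49.000, 8.401, 49.0, 8.4, math.pi / 2)
```

In the rendered figure, roads ahead of the vehicle should therefore appear along positive X and roads to its left along positive Y.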
Notes / follow-ups
- `vehicle[1]`) against the visualization.
- `ox.graph_from_xml` filtering parity with `network_type="drive"` on your osmnx version (cosmetic for the tile).
- `Model/README.md` still predates this branch.
- `--from-l2d EPISODE FRAME` flag for the visualizer; token-level fusion variants; NLP-encoder pretraining scripts.