Pre-validate asset media_type before xr.open_dataset to fail clearly on non-Zarr formats #68

@ghostiee-11

What happens

Opening a STAC Item whose assets are not Zarr / NetCDF / HDF5 / kerchunk-ref (e.g. a Cloud-Optimized GeoTIFF) crashes inside xarray's backend with an opaque AttributeError:

import pystac_client
import xarray as xr
import xpystac  # registers the 'stac' engine

client = pystac_client.Client.open('https://earth-search.aws.element84.com/v1')
item = next(client.search(collections=['naip'], max_items=1).items())

xr.open_dataset(item, engine='stac', chunks=None)
# AttributeError: 'NoneType' object has no attribute 'variables'

The NAIP item exposes:

  • image: image/tiff; application=geotiff; profile=cloud-optimized (COG)
  • metadata: application/xml

xpystac picks one, hands it to xarray, the COG backend returns None, and xarray's _protect_dataset_variables_inplace then dereferences dataset.variables on None.

Expected

xpystac should pre-validate the asset's media_type and fail with a clean message, e.g.:

ValueError: Asset 'image' has media_type 'image/tiff; application=geotiff; profile=cloud-optimized' which xpystac cannot open; use stackstac or odc-stac for COG-only collections.

Why it matters

As it stands, most of Earth Search (Landsat, Sentinel, NAIP) is unusable with xpystac, and users only find out why by digging through xarray's traceback. We worked around it in holoviz/lumen#1867 by tightening our own predicate (media_type is now authoritative, with STAC role hints used only as a fallback), but the friendly error belongs in xpystac.
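For anyone needing a caller-side guard in the meantime, here is a minimal sketch of that kind of predicate. It is not the code from holoviz/lumen#1867: AssetInfo, ARRAY_MEDIA_TYPES, ARRAY_ROLE_HINTS and looks_openable are hypothetical names, and both sets are assumptions rather than values taken from xpystac or Lumen. A declared media_type decides the outcome on its own; role hints are consulted only when no media_type is present.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssetInfo:
    """Hypothetical stand-in with the two attributes the predicate reads.

    pystac.Asset exposes attributes with the same names (media_type, roles).
    """
    media_type: Optional[str] = None
    roles: List[str] = field(default_factory=list)


# Assumed media types for array-like assets; adjust to what your stack opens.
ARRAY_MEDIA_TYPES = {
    "application/vnd+zarr",
    "application/vnd.zarr",
    "application/netcdf",
    "application/x-netcdf",
    "application/x-hdf5",
}

# Assumed role hints, consulted only when media_type is missing.
ARRAY_ROLE_HINTS = {"zarr", "references", "index"}


def looks_openable(asset):
    """Return True if the asset appears to be something xpystac could open."""
    if asset.media_type:
        # media_type is authoritative: drop parameters such as 'profile=...'
        # and compare only the lowercased type/subtype.
        base = asset.media_type.split(";", 1)[0].strip().lower()
        return base in ARRAY_MEDIA_TYPES
    # No media_type declared: fall back to STAC role hints.
    return any(role in ARRAY_ROLE_HINTS for role in (asset.roles or []))
```

With a pystac item, a caller could filter item.assets.values() with looks_openable before calling xr.open_dataset, and skip or report items where nothing passes.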

Suggested fix

In xpystac.core where the asset is selected, compare its media_type against the set xpystac actually dispatches on (Zarr, NetCDF, HDF5, kerchunk JSON ref). If it's not openable, raise a ValueError naming the observed media_type and pointing at stackstac / odc-stac for COG-only cases.
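A minimal sketch of such a check, for illustration only. OPENABLE_MEDIA_TYPES, base_media_type and ensure_openable are hypothetical names, and the set of media types is an assumption, not read from xpystac's source; the real dispatch logic may also look at roles or file extensions.

```python
# Assumed base media types for the formats listed above; not taken from xpystac.
OPENABLE_MEDIA_TYPES = frozenset({
    "application/vnd+zarr",
    "application/vnd.zarr",
    "application/netcdf",
    "application/x-netcdf",
    "application/x-hdf5",
    "application/json",  # kerchunk reference files are plain JSON
})


def base_media_type(media_type):
    """Return the lowercased type/subtype without parameters, or None if unset."""
    if not media_type:
        return None
    return media_type.split(";", 1)[0].strip().lower()


def ensure_openable(asset_key, media_type):
    """Raise ValueError when a declared media_type is outside the assumed set."""
    base = base_media_type(media_type)
    if base is None:
        # Nothing declared, so there is nothing to validate against here.
        return
    if base not in OPENABLE_MEDIA_TYPES:
        raise ValueError(
            f"Asset {asset_key!r} has media_type {media_type!r} which xpystac "
            "cannot open; use stackstac or odc-stac for COG-only collections."
        )
```

Calling ensure_openable("image", asset.media_type) just before handing the asset to xarray would turn the AttributeError from the NAIP example into the ValueError described under Expected. Assets with no media_type pass through unchanged in this sketch, which keeps existing behaviour for catalogs that omit it.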

Environment

  • xpystac 0.5.0
  • xarray (latest)
  • Python 3.13
