Add rebuild cache -- missing files removed from index etc. All with @debug logging of individual files found, @warn if not found, @error if cache malformed etc. Rebuild proceeds if kwarg `dr
Implemented:
- `CacheEntry.ttl` field + `DataCache(; default_ttl=)` for per-entry and cache-level TTL
- `isstale(cache, entry/label)` for stale detection (stale-while-revalidate friendly)
- `invalidate!(cache; stale, pattern, format, before, after, labeled, predicate, dry_run)` for bulk invalidation
- `CacheAssets.purge!(cache; max_age, max_idle, keep_count, max_size_bytes, stale, keep_labeled, dry_run, …filters…)` for LRU/size-limit purging
- `set_autopurge!(cache; max_age, max_idle, keep_count, max_size_bytes, keep_labeled)` for automatic post-write purging
- `format` filter added to `CacheAssets.ls`/`ls!`/`entries`
The implementation writes payload files and rewrites cache_index.toml, but I did not see a crash-safe transaction pattern, temporary-file-plus-rename flow, or locking. That leaves obvious failure modes:
- interrupted writes
- partial index updates
- two Julia processes writing to the same cache
- reads during index rewrite
For 1.0, I would want:
- atomic index write via temp file + rename
- atomic payload write via temp file + rename
- some locking story, at least per-cache lockfile
- explicit recovery behavior for damaged index/data mismatch
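A minimal sketch of the temp-file-plus-rename pattern, assuming a POSIX-like filesystem. The helper name `atomic_write` is hypothetical and not part of the package. As far as I can tell, `mv(src, dst; force=true)` removes the destination before renaming, so the sketch calls `Base.Filesystem.rename` directly; that function is internal in older Julia releases, so treat its use here as an assumption to verify against the targeted Julia version.

```julia
# Hypothetical helper: replace `path` so that readers see either the old file
# or the complete new file, never a partially written one.
function atomic_write(path::AbstractString, content::AbstractString)
    dir = dirname(abspath(path))
    # Create the temp file next to the target so the rename stays on one
    # filesystem; a cross-device rename cannot be atomic.
    tmp_path, io = mktemp(dir; cleanup=false)
    try
        write(io, content)
        flush(io)
        close(io)
        # Assumption: rename replaces an existing destination in one step.
        Base.Filesystem.rename(tmp_path, path)
    catch
        isopen(io) && close(io)
        rm(tmp_path; force=true)   # do not leave temp debris behind on failure
        rethrow()
    end
    return path
end
```

The same helper would cover both the index and the payload files. It does not fsync, so it protects against interrupted writes and concurrent readers, not against power loss.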
If this is going to slow performance, make it opt-in.
- High latency on cache access due to write-on-read access to track `last_access_time` (required for auto-expiration, LRU, etc.).
- Locking the cache for updates
- CONSIDER: `last_access_time` opt-in instead of opt-out?
You already track dates and can list/sort assets, which makes this a natural next step:
- max cache size
- max entry count
- prune LRU / oldest / unlabeled
- dry-run pruning
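To make the pruning policy concrete, here is a rough sketch of least-recently-used eviction under a count limit and a size limit, with a dry-run mode. The `Entry` struct, `prune_plan`, and `prune!` are hypothetical stand-ins and do not reflect the package's actual types or the signature of `CacheAssets.purge!`.

```julia
# Hypothetical, simplified view of a cache entry.
struct Entry
    key::String
    size_bytes::Int
    last_access::Float64   # seconds since epoch
    labeled::Bool
end

# Return the keys to evict, oldest access first, until both limits are met.
function prune_plan(entries::Vector{Entry};
                    max_count::Int = typemax(Int),
                    max_size_bytes::Int = typemax(Int),
                    keep_labeled::Bool = true)
    count = length(entries)
    bytes = sum(e -> e.size_bytes, entries; init = 0)
    evict = String[]
    for e in sort(entries; by = e -> e.last_access)   # least recently used first
        (count <= max_count && bytes <= max_size_bytes) && break
        keep_labeled && e.labeled && continue         # labeled entries are protected
        push!(evict, e.key)
        count -= 1
        bytes -= e.size_bytes
    end
    return evict
end

# Apply the plan, or only report it when `dry_run` is true.
function prune!(entries::Vector{Entry}; dry_run::Bool = false, kwargs...)
    evict = prune_plan(entries; kwargs...)
    dry_run || filter!(e -> e.key ∉ evict, entries)
    return evict
end
```

Note that protected labeled entries still count toward the limits, so a cache full of labeled entries can stay over its limit; whether that is the desired behavior is a design decision.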
Because this is file-backed and explicitly shareable across sessions and systems, users will assume some amount of multi-process safety. 1.0 should either support that or state exact limits very plainly.
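One possible shape for a per-cache lockfile, assuming Julia 1.9 or later where the `FileWatching` standard library ships the `Pidfile` submodule. The wrapper name `with_cache_lock` and the file name `cache.lock` are hypothetical.

```julia
using FileWatching.Pidfile: mkpidlock

# Hypothetical wrapper: run `f` while holding an exclusive lock on the cache
# directory. `stale_age` (seconds) lets a later process take over a lock left
# behind by a crashed one.
function with_cache_lock(f, cache_dir::AbstractString; stale_age::Real = 60)
    lockfile = String(joinpath(cache_dir, "cache.lock"))
    mkpidlock(lockfile; stale_age = stale_age) do
        f()
    end
end
```

Index rewrites and payload writes would then go inside `with_cache_lock(cache_dir) do ... end`. Pidfile locks are advisory and their behavior on network filesystems is an open question, which is exactly the kind of limit the documentation should state plainly.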
DataFrame -> CSV is a good default, but for 1.0 it would be useful to support or plan for:
- Arrow
- Parquet
- configurable serializer/backend by type
CSV is inspectable, but it is not always the best choice for fidelity, performance, or large tables.
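A sketch of what "configurable serializer/backend by type" could look like using multiple dispatch. All names here (`Backend`, `JLSBackend`, `TextBackend`, `register_backend!`, `backend_for`, `save`, `load`) are hypothetical; `TextBackend` is only a stand-in for an inspectable format such as CSV, and Arrow or Parquet backends would follow the same pattern with their respective packages.

```julia
using Serialization

abstract type Backend end
struct JLSBackend <: Backend end    # Julia's stdlib Serialization (not inspectable)
struct TextBackend <: Backend end   # stand-in for an inspectable text format

save(::JLSBackend, path, x) = serialize(path, x)
load(::JLSBackend, path) = deserialize(path)
save(::TextBackend, path, x::AbstractVector{<:AbstractString}) = write(path, join(x, '\n'))
load(::TextBackend, path) = readlines(path)

# Registry mapping a value type to the backend used to store it.
const BACKENDS = Dict{Type,Backend}()
register_backend!(T::Type, b::Backend) = (BACKENDS[T] = b)

# Pick the registered backend for `x`, falling back to Serialization.
# If several registered types match, which one wins is unspecified in this sketch.
function backend_for(x)
    for (T, b) in BACKENDS
        x isa T && return b
    end
    return JLSBackend()
end
```

A user would call something like `register_backend!(Vector{String}, TextBackend())` once, after which writes of that type go through the chosen backend.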
Highest-value missing tests for 1.0 are:
- cross-session key stability for `@filecache`
- concurrent writer behavior
- interrupted write / recovery behavior
- corrupted index handling
- serializer/version compatibility behavior
- Windows-path and filesystem edge cases
- large-cache performance regressions
The test environment lists Aqua and JET, but from the repo layout I did not see them integrated into CI execution. For 1.0, I would want those checks wired in if they pass cleanly.
In addition to:
- Ensuring code examples, API, and discussion are up to date with the code,
- Document syntax, structure, and markup errors are corrected
- Language, spelling, and grammatical errors are corrected
- All internal links and references are correct or correctly resolved
also:
The docs are fairly extensive, especially for usage patterns and integration, but 1.0 should add a very explicit “contract” section:
- what is guaranteed stable
- what is best-effort
- what is not portable
- what happens across Julia/package upgrades
- what happens under concurrent access
For a dict-like API, these would make the package feel more complete.
Since you already have CacheAssets.ls, natural QoL additions are:
- `rm(pattern=...)`
- relabel by predicate
- bulk invalidate / bulk move
- bulk export selected entries
Import is present, but end-user ergonomics would improve with:
- export cache to zip
- export selected assets only
- manifest file with metadata summary