sample_stats currently has mean_log_score and se_log_score, but neither is the mixture log predictive that PrO is actually about. the log-score wgf uses a loo empirical mixture in the drift ($\log q_{-j}(y_i)$ in compile_drift_for_logscore). so none of these quantities is $\log \int p(y \mid \theta)\, d\hat{Q}(\theta)$ evaluated on the full particle cloud.
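for reference, here is what the full-cloud quantity reduces to. this assumes $\hat{Q}$ puts equal weight $1/N$ on the particles $\theta_1, \dots, \theta_N$. if particles carry non-uniform weights, $1/N$ becomes those weights:

$$
\log \int p(y_i \mid \theta)\, d\hat{Q}(\theta)
= \log \frac{1}{N} \sum_{j=1}^{N} p(y_i \mid \theta_j)
= \operatorname{logsumexp}_{j}\, \log p(y_i \mid \theta_j) - \log N,
\qquad
\text{trace} = \sum_{i=1}^{n} \log \int p(y_i \mid \theta)\, d\hat{Q}(\theta).
$$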
we should add something like mixture_log_predictive (name open) at each retained draw. for each observation, take the log-sum-exp over particles of the elementwise logp and subtract $\log N$ (with $N$ particles, equally weighted), then sum over observations to get a scalar trace. that is the natural monitor for how predictive the empirical measure is at a given time step, and it is closer to the paper's $P_Q$ than mean_log_score.
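a minimal numpy sketch of that computation. it assumes the elementwise log-likelihoods are available as an array with particles on the second-to-last axis and observations on the last axis, e.g. `(n_draws, n_particles, n_obs)`, and that particles are equally weighted. the function name is just the placeholder from above, not an existing API:

```python
import numpy as np


def mixture_log_predictive(logp):
    """Full-cloud mixture log predictive (hypothetical helper).

    logp[..., j, i] = log p(y_i | theta_j); particles on axis -2, obs on axis -1.
    Assumes equal particle weights 1/N.
    Returns (per_obs, total): per_obs has shape (..., n_obs), total has shape (...).
    """
    logp = np.asarray(logp, dtype=float)
    n_particles = logp.shape[-2]
    # stable log-sum-exp over the particle axis: shift by the per-obs max
    m = logp.max(axis=-2, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)  # avoid inf - inf when a column is all -inf
    lse = np.log(np.exp(logp - m).sum(axis=-2)) + np.squeeze(m, axis=-2)
    # log (1/N) sum_j p(y_i | theta_j)
    per_obs = lse - np.log(n_particles)
    # summed over observations -> one scalar per retained draw
    return per_obs, per_obs.sum(axis=-1)
```

with a `(n_draws, n_particles, n_obs)` input, the second return value is directly the per-draw trace that would go into sample_stats.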
OR we could also expose the loo version to match what enters the drift, but those are different objects, so the issue should pick one primary definition. i'd default to the full mixture for output/monitoring, and add the loo version only if we explicitly want drift-aligned diagnostics.
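for comparison, a sketch of a loo variant. this is an assumption about the definition: it takes $q_{-j}(y_i) = \frac{1}{N-1} \sum_{k \ne j} p(y_i \mid \theta_k)$, which may not match exactly what compile_drift_for_logscore builds, and the helper name is hypothetical. it returns one value per (particle, observation) pair rather than one per observation, which is part of why the two don't line up as a single monitor:

```python
import numpy as np


def loo_mixture_log_predictive(logp):
    """Leave-one-particle-out mixture log predictive (hypothetical helper).

    logp[j, i] = log p(y_i | theta_j), shape (n_particles, n_obs).
    Assumes q_{-j}(y_i) = mean over k != j of p(y_i | theta_k).
    Returns loo[j, i] = log q_{-j}(y_i), shape (n_particles, n_obs).
    """
    logp = np.asarray(logp, dtype=float)
    n = logp.shape[0]
    if n < 2:
        raise ValueError("need at least two particles for a leave-one-out mixture")
    # tiled[j, k, i] = logp[k, i]; O(N^2 * n_obs) memory, fine for a diagnostic sketch
    tiled = np.broadcast_to(logp, (n,) + logp.shape).copy()
    idx = np.arange(n)
    tiled[idx, idx, :] = -np.inf  # drop particle j from its own mixture
    # stable log-sum-exp over k
    m = tiled.max(axis=1, keepdims=True)
    m = np.where(np.isfinite(m), m, 0.0)
    lse = np.log(np.exp(tiled - m).sum(axis=1)) + np.squeeze(m, axis=1)
    return lse - np.log(n - 1)
```

averaging this over $j$ does not, in general, recover the full-mixture value, so exposing both would mean documenting two separate quantities.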