Topic Hub
Evaluation, Benchmarks, and Verifier Loops
A curated evaluation reading path: verifier-first harnesses, ablation structure, benchmark receipts, and the evidence rules that keep comparisons from collapsing into anecdotes.
This hub starts with the evaluation contract itself, then moves into the ablation and comparison layer, and finishes with the benchmark receipts and profiler evidence that connect model claims to real runs.
evaluation
verifier
benchmarks
ablation
receipts
profiling
Curated set: 12 articles in reading order
Best if you want to separate trustworthy evidence from vague score reporting across the MegaCpp archive.