In many high-stakes situations, large language models are not worth the risk. Knowing which outputs to throw out might fix that. Large language models are famous for their ability to make things up—in ...