giuliomagnifico 15 hours ago

> We do find some evidence that LLMs—particularly those produced by Claude—are perhaps slightly more accurate than humans when estimating their overall performance. Similarly, LLMs may be slightly more accurate than humans when estimating item-level confidence. Interestingly, however, we find that LLMs are not consistently capable of updating their metacognitive judgments based on their experiences. We also find that, like humans, LLMs tend to be overconfident. We believe that these conclusions can help human users better understand the extent to which they should trust LLMs’ confidence judgments, and hope that these results spark new interest in studying the metacognitive capacities of LLMs.