Evaluating the Unseen Capabilities: How Many Theorems Do LLMs Know?
Best AI papers explained - A podcast by Enoch H. Kang

This paper examines a fundamental limitation in evaluating large language models (LLMs): current methods assess only their observable outputs, neglecting a potentially vast amount of unseen knowledge embedded within them. To address this, the authors introduce KnowSum, a statistical framework that estimates this hidden knowledge by extrapolating from the frequencies of observed outputs, drawing on ecological and linguistic methods for estimating unseen species or words. The paper demonstrates KnowSum's utility across knowledge estimation, information retrieval, and diversity measurement, revealing that LLMs often express only a fraction of their estimated total knowledge and that accounting for the unseen can significantly alter comparative rankings of different models. This research highlights the importance of evaluating the full internal capabilities of LLMs, not just their surface-level performance.
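The species-style extrapolation described above can be illustrated with the classic Chao1 estimator from ecology, which infers how many distinct items exist (seen plus unseen) from the frequencies of items that were observed. This is a sketch for intuition only; the specific estimator KnowSum uses, and the sampling setup, are assumptions here, not the paper's exact method:

```python
from collections import Counter

def chao1_estimate(observations):
    """Chao1 lower-bound estimate of the total number of distinct
    items (observed + unseen) from a sample with repeats."""
    counts = Counter(observations)
    s_obs = len(counts)  # distinct items actually observed
    f1 = sum(1 for c in counts.values() if c == 1)  # seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)  # seen exactly twice
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)
    # bias-corrected variant when no doubletons were observed
    return s_obs + f1 * (f1 - 1) / 2

# Hypothetical example: theorem names elicited from a model
# across repeated sampled responses
samples = ["pythagoras", "pythagoras", "fermat", "bayes", "bayes",
           "noether", "godel"]
print(chao1_estimate(samples))  # 5 observed, estimate 7.25 total
```

The intuition is that many singletons relative to doubletons signal a long tail of items not yet sampled, so the estimated total exceeds the observed count.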