A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models

Best AI papers explained - A podcast by Enoch H. Kang

This academic paper examines the faithfulness of chain-of-thought (CoT) reasoning in large language and vision-language models, asking how different types of biases affect model behavior and whether those biases are reflected in the models' CoTs. The research introduces a novel evaluation framework for analyzing bias articulation and identifies a phenomenon of "inconsistent reasoning," in which a model's initial reasoning steps are correct but its final answer shifts toward a bias. A key finding is that reinforcement learning (RL)-trained models articulate biases more often, particularly text-based ones, while subtle visual or implicit textual biases are less likely to be mentioned in the reasoning process. The study also investigates how pre-existing content biases and implicit cues in language models affect CoT faithfulness.