442 Episodes

  1. Tina: Tiny LoRA Reasoning Models

    Published: 25/4/2025
  2. Evaluating large language models in theory of mind tasks

    Published: 25/4/2025
  3. QUEST: Quality Sampling for Machine Translation

    Published: 24/4/2025
  4. Offline Preference Learning via Simulated Trajectory Feedback

    Published: 24/4/2025
  5. Reasoning Elicitation in Language Models via Counterfactual Feedback

    Published: 24/4/2025
  6. Eliciting Human Preferences with Language Models

    Published: 24/4/2025
  7. Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning

    Published: 24/4/2025
  8. γ-Bench: Evaluating LLMs in Multi-Agent Games

    Published: 24/4/2025
  9. DRAFT: Self-Driven LLM Tool Mastery via Documentation Refinement

    Published: 24/4/2025
  10. Optimal Prediction Sets for Enhanced Human-AI Accuracy

    Published: 24/4/2025
  11. Self-Correction via Reinforcement Learning for Language Models

    Published: 24/4/2025
  12. Tractable Multi-Agent Reinforcement Learning through Behavioral Economics

    Published: 24/4/2025
  13. Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement

    Published: 24/4/2025
  14. Iterative Nash Policy Optimization for Language Model Alignment

    Published: 24/4/2025
  15. SycEval: Benchmarking LLM Sycophancy in Mathematics and Medicine

    Published: 23/4/2025
  16. Stack AI: Democratizing Enterprise AI Development

    Published: 22/4/2025
  17. Evaluating Modern Recommender Systems: Challenges and Future Directions

    Published: 22/4/2025
  18. AI in the Enterprise: Seven Lessons from Frontier Companies by OpenAI

    Published: 22/4/2025
  19. Discussion: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Published: 21/4/2025
  20. AI Agent Protocols and Human Preference

    Published: 21/4/2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.