A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Best AI papers explained - A podcast by Enoch H. Kang

This paper proposes a local data attribution framework for online reinforcement learning (RL). Within each training round, the framework estimates the influence of individual training records to identify those that harm the agent's learning. Filtering out these harmful records is the basis of the proposed method, Iterative Influence-Based Filtering (IIF), which improves performance and sample efficiency on standard RL tasks and shows promise for reducing toxicity in Reinforcement Learning from Human Feedback (RLHF) for large language models. The paper also analyzes the characteristics of influential records and how different filtering levels affect learning.
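
To make the filtering idea concrete, here is a minimal sketch of per-round record filtering. It is not the paper's estimator: it assumes a simple first-order proxy in which a record's influence is the negated dot product between its training-loss gradient and the gradient of a target we want to increase. The names `influence_scores`, `keep_indices`, and `max_drop_fraction` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def influence_scores(
    per_record_grads: Sequence[Sequence[float]],
    target_grad: Sequence[float],
) -> List[float]:
    """First-order influence proxy (an assumption, not the paper's estimator).

    A gradient-descent step on record i moves the parameters along -grad_i,
    so the target changes by roughly -<grad_i, target_grad>. A negative
    score therefore marks a record whose update would lower the target.
    """
    return [
        -sum(g * t for g, t in zip(grad, target_grad))
        for grad in per_record_grads
    ]


def keep_indices(scores: Sequence[float], max_drop_fraction: float = 1.0) -> List[int]:
    """Drop negative-scoring records, most harmful first.

    max_drop_fraction (hypothetical knob) caps the share of the batch that
    may be removed, which stands in for the "filtering level".
    """
    cap = int(max_drop_fraction * len(scores))
    harmful = sorted(
        (i for i, s in enumerate(scores) if s < 0),
        key=lambda i: scores[i],
    )
    dropped = set(harmful[:cap])
    return [i for i in range(len(scores)) if i not in dropped]


# Toy data: three records with 2-dimensional gradients.
grads = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
target = [-1.0, 0.0]
scores = influence_scores(grads, target)
kept = keep_indices(scores)
```

In this toy batch only the second record has a negative score, so it is the one filtered out before the policy update. Lowering `max_drop_fraction` makes the filter more conservative.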