Learning Compositional Functions with Transformers from Easy-to-Hard Data
Best AI Papers Explained - A podcast by Enoch H. Kang

This paper presents a theoretical analysis of how transformers learn k-fold composition tasks, in which k permutations are applied in sequence. It argues that transformers can solve such tasks hierarchically, with each layer computing a progressively deeper composition, referred to as a "hop." The paper details a curriculum learning strategy (Algorithm 1) and a mixed training approach (Algorithm 2), and shows that both allow transformers to learn these tasks efficiently from easy-to-hard data. The analysis also establishes lower bounds on the learnability of these functions in the Statistical Query (SQ) framework, with proof sketches and supporting lemmas for the main learning guarantees.
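The k-fold composition task and its hierarchical "hop" structure can be illustrated with a short sketch. This is a minimal illustration, not the paper's construction: it assumes the task is to apply k permutations of {0, …, n-1} in sequence, and that "hop t" corresponds to the composition of the first t permutations, so each level extends the previous one by a single permutation.

```python
import random

def compose(p, q):
    """Compose permutations given as lists: (p o q)(i) = p[q[i]]."""
    return [p[q[i]] for i in range(len(q))]

def k_fold_target(perms, x):
    """The k-hop task: apply perms[0], then perms[1], ..., to input x."""
    for p in perms:
        x = p[x]
    return x

def hierarchical_compositions(perms):
    """Hop t = composition of the first t permutations (one more hop per level)."""
    n = len(perms[0])
    partial = list(range(n))  # identity permutation
    hops = []
    for p in perms:
        partial = compose(p, partial)  # extend the composition by one hop
        hops.append(partial)
    return hops

random.seed(0)
n, k = 5, 3
perms = [random.sample(range(n), n) for _ in range(k)]
hops = hierarchical_compositions(perms)
# The deepest hop agrees with the full k-fold composition on every input.
assert all(hops[-1][x] == k_fold_target(perms, x) for x in range(n))
```

Under this reading, a curriculum over easy-to-hard data corresponds to training on shallow hops before deep ones, so each level only needs to learn a one-hop extension of an already-learned composition.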