In-Context Learning as Semantic Clustering plus Fuzzy Copy
Abstract: We view in-context learning (ICL) as essentially a two-stage process. In the first stage, the earlier layers of Large Language Models (LLMs) perform semantic clustering, grouping the in-context demonstrations according to the semantic properties associated with their different labels. In the second, a fuzzy copy stage that begins in the intermediate layers and continues through the later layers, the model develops an increasingly accurate semantic grasp of the in-context demonstrations and of the relationship between the final query and these demonstrations. We provide evidence that these two phases of ICL can be respectively attributed to different components of the model. We also illustrate how the mechanisms governing these components could explain various phenomena reported in previous studies concerning the patterns in the layer-wise representations that models form for input sequences in the ICL setting.