From 9cd678aa70c90ac727e074f8f79d9d246f34a38a Mon Sep 17 00:00:00 2001
From: lukas-heiligenbrunner
Date: Mon, 30 Dec 2024 10:32:03 +0100
Subject: [PATCH] add more stuff to caml

---
 materialandmethods.typ | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/materialandmethods.typ b/materialandmethods.typ
index 62f1a54..80482f2 100644
--- a/materialandmethods.typ
+++ b/materialandmethods.typ
@@ -203,10 +203,10 @@ Todo
 CAML (Context aware meta learning) is one of the state-of-the-art methods for few-shot learning.
 It consists of three different components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder and a non-causal sequence model.
 
-*Architecture:* CAML first encodes the query and support set images using the fronzen pre-trained feature extractor as shown in @camlarchitecture.
+*Architecture:* CAML first encodes the query and support set images using the frozen pre-trained feature extractor as shown in @camlarchitecture.
 This step brings the images into a low dimensional space where similar images are encoded into similar embeddings.
 The class labels are encoded with the ELMES class encoder.
-Since the class of the query image is unknown in this stage we add a special learnable "unknown token" to the encoder.
+Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
 This embedding is learned during pre-training.
 Afterwards each image embedding is concatenated with the corresponding class embedding.
 
@@ -216,9 +216,16 @@ Afterwards each image embedding is concatenated with the corresponding class emb
 The encoder is a bijective mapping between the labels and set of vectors that are equal length and maximally equiangular.
 #todo[Describe what equiangular and bijective means]
 Similar to one-hot encoding but with some advantages.
+This encoder maximizes the algorithm's ability to distinguish between the different classes.
 
 *Non-causal sequence model:*
-#todo[Desc. what this is]
+The sequence created by the ELMES encoder is then fed into a non-causal sequence model.
+This might be, for instance, a transformer encoder.
+This step conditions the model on the whole input sequence, consisting of the query and support set embeddings.
+Visual features from the query and the support set can be compared with each other to extract information such as content or texture.
+This information can then be used to predict the class of the query image.
+From the output of the sequence model, the element at the same position as the query is selected.
+Afterwards it is passed through a simple MLP network that outputs the class prediction for the query image.
 
 *Large-Scale Pre-Training:*
 #todo[Desc. what this is]
@@ -229,7 +236,7 @@ Similar to one-hot encoding but with some advantages.
 *Results:*
 
 #figure(
-  image("rsc/caml_architecture.png", width: 80%),
+  image("rsc/caml_architecture.png", width: 100%),
   caption: [Architecture of CAML. #cite()],
 )
 
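
The frozen feature-extractor step described above can be sketched as follows. This is a minimal illustration only: the torchvision ResNet-50, the input resolution and the six-image batch are stand-ins chosen for the example, not the encoder or data used by CAML.

```python
# Sketch of the frozen feature-extractor step (assumption: a torchvision
# ResNet-50 as a stand-in for CAML's pre-trained image encoder).
import torch
from torchvision.models import resnet50, ResNet50_Weights

encoder = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()        # drop the classifier head, keep the features
encoder.eval().requires_grad_(False)    # frozen: the weights are never updated

# Placeholder batch: one query image plus a five-image support set.
images = torch.randn(6, 3, 224, 224)
with torch.no_grad():
    embeddings = encoder(images)        # (6, 2048); similar images -> similar embeddings
```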
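
To illustrate what an Equal Length and Maximally Equiangular Set looks like, the sketch below builds one via the regular-simplex construction. This construction and the helper name `simplex_class_codes` are assumptions made for the example; the CAML paper defines ELMES formally.

```python
# Minimal sketch of an ELMES-style class encoding via the regular-simplex
# construction (an assumption for illustration, not CAML's exact definition).
import torch

def simplex_class_codes(num_classes: int) -> torch.Tensor:
    """Return one vector per label, all of equal length and pairwise maximally
    equiangular; label i maps bijectively to row i."""
    one_hot = torch.eye(num_classes)
    centered = one_hot - 1.0 / num_classes                 # subtract the common centroid
    return centered / centered.norm(dim=1, keepdim=True)   # equal (unit) length

codes = simplex_class_codes(5)
gram = codes @ codes.T
# The diagonal is 1 and every off-diagonal cosine equals -1/(k-1):
# all pairwise angles are identical and as large as possible.
print(torch.round(gram, decimals=3))
```

Like one-hot encoding, the mapping from labels to vectors is bijective, but the vectors additionally keep every pair of classes equally and maximally separated.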
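
The non-causal sequence-model step can be sketched as follows, assuming PyTorch, an `nn.TransformerEncoder` as the sequence model, and hypothetical embedding sizes; this is not CAML's reference implementation.

```python
# Sketch of the non-causal sequence-model step (assumptions: PyTorch, a
# TransformerEncoder, and made-up dimensions; not CAML's reference code).
import torch
import torch.nn as nn

d_img, d_cls, num_classes = 512, 16, 5          # hypothetical sizes
d_model = d_img + d_cls

# Embeddings from the frozen image encoder (placeholders here).
query_img = torch.randn(1, d_img)
support_img = torch.randn(num_classes, d_img)   # one shot per class

# Class embeddings: fixed ELMES codes for the support set (placeholder values)
# and a learnable "unknown token" for the query, whose class is not known yet.
support_cls = torch.randn(num_classes, d_cls)
unknown_token = nn.Parameter(torch.randn(1, d_cls))

# Concatenate each image embedding with its class embedding and build the
# sequence, placing the query at a known position (index 0 in this sketch).
query_tok = torch.cat([query_img, unknown_token], dim=-1)
support_tok = torch.cat([support_img, support_cls], dim=-1)
sequence = torch.cat([query_tok, support_tok], dim=0).unsqueeze(0)   # (1, 1+k, d_model)

# Non-causal: no attention mask, so every element attends to every other element.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
classifier = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                           nn.Linear(d_model, num_classes))

out = encoder(sequence)          # (1, 1+k, d_model)
query_out = out[:, 0]            # the element at the query's position
logits = classifier(query_out)   # class prediction for the query image
```

Because there is no causal mask, the query element can directly compare its visual features with every support example before the MLP head makes the prediction.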