add more stuff to caml

@@ -203,10 +203,10 @@ Todo
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
It consists of three components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder, and a non-causal sequence model.

*Architecture:* CAML first encodes the query and support set images using the frozen pre-trained feature extractor, as shown in @camlarchitecture.
This step projects the images into a low-dimensional space in which similar images are mapped to similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
This embedding is learned during pre-training.
Afterwards, each image embedding is concatenated with the corresponding class embedding.
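To make this sequence construction concrete, the following sketch assembles such an input sequence in PyTorch. It is not the authors' implementation: a small torchvision ResNet stands in for the frozen pre-trained feature extractor, a fixed random matrix stands in for the ELMES class encoder, and all names and dimensions (`n_way`, `d_class`, 224x224 inputs) are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

n_way, k_shot, d_class = 5, 1, 64

# Frozen feature extractor; in practice a pre-trained backbone would be loaded
# (e.g. weights="IMAGENET1K_V1"), weights=None just keeps the sketch offline.
encoder = resnet18(weights=None)
encoder.fc = nn.Identity()          # keep the 512-d features, drop the classifier
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

elmes = torch.randn(n_way, d_class)                  # stand-in for the fixed ELMES encoder
unknown_token = nn.Parameter(torch.randn(d_class))   # learnable "unknown" class embedding

support = torch.randn(n_way * k_shot, 3, 224, 224)   # support images (dummy data)
support_labels = torch.arange(n_way).repeat_interleave(k_shot)
query = torch.randn(1, 3, 224, 224)                  # query image (dummy data)

with torch.no_grad():
    support_feat = encoder(support)                  # (n_way * k_shot, 512)
    query_feat = encoder(query)                      # (1, 512)

# Concatenate each image embedding with its class embedding; the query gets the
# unknown token because its class is not known at this stage.
support_tokens = torch.cat([support_feat, elmes[support_labels]], dim=-1)
query_token = torch.cat([query_feat, unknown_token.unsqueeze(0)], dim=-1)
sequence = torch.cat([query_token, support_tokens], dim=0)   # (1 + n_way * k_shot, 576)
print(sequence.shape)                                        # torch.Size([6, 576])
```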
@@ -216,9 +216,16 @@ Afterwards each image embedding is concatenated with the corresponding class emb
The encoder is a bijective mapping between the labels and a set of vectors that have equal length and are maximally equiangular.
Bijective means that each label corresponds to exactly one vector of this set and vice versa; equiangular means that the angle between any two vectors of the set is the same.
It is similar to a one-hot encoding but has some advantages.
In particular, this encoder maximizes the algorithm's ability to distinguish between the different classes.
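To illustrate what "equal length and maximally equiangular" means geometrically, the snippet below uses the regular-simplex construction, one standard way to obtain such a set, and checks both properties numerically. It is only meant to illustrate the geometry; the exact ELMES construction used by CAML may differ in details such as scaling or ambient dimension.

```python
import numpy as np

def simplex_equiangular_set(n_classes: int) -> np.ndarray:
    """n_classes unit vectors whose pairwise angles are all equal (regular simplex)."""
    eye = np.eye(n_classes)
    centered = eye - eye.mean(axis=0, keepdims=True)   # subtract the centroid
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

E = simplex_equiangular_set(5)          # label i is mapped to row i (a bijection)
gram = E @ E.T
print(np.allclose(np.diag(gram), 1.0))                 # equal length: all unit norm
off_diag = gram[~np.eye(5, dtype=bool)]
print(np.allclose(off_diag, -1.0 / (5 - 1)))           # one common pairwise angle
```

A common inner product of -1/(N-1) corresponds to the largest angle that N equal-length vectors can pairwise share, which is what "maximally equiangular" refers to here.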
*Non-causal sequence model:*
The sequence of concatenated image and class embeddings is then fed into the non-causal sequence model, for instance a transformer encoder.
This step conditions the elements of the input sequence, consisting of the query and support set embeddings, on each other.
Visual features from the query and the support set can thereby be compared to each other to determine properties such as content or texture.
This can then be used to predict the class of the query image.
From the output of the sequence model, the element at the same position as the query is selected.
Afterwards it is passed through a simple MLP network to predict the class of the query image.
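A minimal sketch of this stage is given below, under the assumption that a standard transformer encoder serves as the non-causal sequence model; the layer sizes and the MLP head are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

d_model, n_way = 576, 5                        # e.g. 512-d image + 64-d class embedding
sequence = torch.randn(1, 1 + n_way, d_model)  # (batch, query + support, d_model)

# Non-causal: no attention mask is passed, so every element attends to every other.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
seq_model = nn.TransformerEncoder(layer, num_layers=2)

# Simple MLP head that maps the conditioned query element to class logits.
mlp = nn.Sequential(nn.Linear(d_model, 256), nn.ReLU(), nn.Linear(256, n_way))

out = seq_model(sequence)      # query and support embeddings are conditioned on each other
query_out = out[:, 0]          # select the element at the query's position (index 0 here)
logits = mlp(query_out)        # (1, n_way) class prediction for the query image
print(logits.shape)
```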

*Large-Scale Pre-Training:*
#todo[Desc. what this is]
@@ -229,7 +236,7 @@ Similar to one-hot encoding but with some advantages.
*Results:*

#figure(
  image("rsc/caml_architecture.png", width: 100%),
  caption: [Architecture of CAML. #cite(<caml_paper>)],
) <camlarchitecture>