add more stuff to caml
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
It consists of three components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder, and a non-causal sequence model.
*Architecture:* CAML first encodes the query and support set images using the frozen pre-trained feature extractor, as shown in @camlarchitecture.
This step brings the images into a low-dimensional space where similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
This embedding is learned during pre-training.
Afterwards, each image embedding is concatenated with the corresponding class embedding.
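The following sketch illustrates this sequence construction in PyTorch; the dimensions, the stand-in encoder and all names (`build_sequence`, `unknown_token`, ...) are illustrative assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn

# Illustrative dimensions -- not the configuration used in the paper.
feat_dim, label_dim, n_classes = 512, 256, 5
img_dim = 3 * 224 * 224

# Stand-in for the frozen pre-trained image encoder (in practice e.g. a ViT backbone).
frozen_encoder = nn.Sequential(nn.Flatten(), nn.Linear(img_dim, feat_dim)).eval()
for p in frozen_encoder.parameters():
    p.requires_grad_(False)

elmes = torch.randn(n_classes, label_dim)             # placeholder for the fixed ELMES vectors
unknown_token = nn.Parameter(torch.zeros(label_dim))  # learnable "unknown" class embedding

def build_sequence(query_img, support_imgs, support_labels):
    """Concatenate image embeddings with class embeddings for the query and support set."""
    with torch.no_grad():                                # the image encoder stays frozen
        q_feat = frozen_encoder(query_img.unsqueeze(0))  # (1, feat_dim)
        s_feat = frozen_encoder(support_imgs)            # (k, feat_dim)
    q_tok = torch.cat([q_feat, unknown_token.unsqueeze(0)], dim=-1)  # query: class unknown
    s_tok = torch.cat([s_feat, elmes[support_labels]], dim=-1)       # support: ELMES labels
    return torch.cat([q_tok, s_tok], dim=0)              # (1 + k, feat_dim + label_dim)

# Example: a 5-way 1-shot episode yields a sequence of 6 tokens of size 768.
seq = build_sequence(torch.randn(3, 224, 224),
                     torch.randn(5, 3, 224, 224),
                     torch.tensor([0, 1, 2, 3, 4]))
```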
The encoder is a bijective mapping between the labels and a set of vectors that are of equal length and maximally equiangular.
Here, bijective means that every label corresponds to exactly one vector and vice versa, and equiangular means that the angle between any two of these vectors is the same (and, for a maximally equiangular set, as large as possible).
This is similar to a one-hot encoding, but with the advantage that the class embeddings are maximally separated from each other.
This encoder maximizes the algorithm's ability to distinguish between the different classes.
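As a hedged illustration (not necessarily the construction used in the paper), a set of equal-length, maximally equiangular vectors can be obtained from a regular simplex by centering and normalizing the one-hot vectors:

```python
import numpy as np

def simplex_elmes(n_classes: int) -> np.ndarray:
    """n unit vectors with identical pairwise angles (cosine -1/(n-1)), i.e. a regular
    simplex; the mapping class i -> row i is bijective."""
    one_hot = np.eye(n_classes)
    centered = one_hot - one_hot.mean(axis=0)   # subtract the common mean vector
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

E = simplex_elmes(5)
print(np.round(E @ E.T, 2))   # 1.0 on the diagonal, -0.25 everywhere else
```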
*Non-causal sequence model:*
In contrast to a causal (autoregressive) model, a non-causal sequence model lets every element of the sequence attend to every other element, since there is no temporal ordering to respect.
The sequence of concatenated image and class embeddings is then fed into such a non-causal sequence model.
This might be, for instance, a transformer encoder.
This step conditions the elements of the input sequence, consisting of the query and support set embeddings, on each other.
Visual features of the query and support set images can be compared to each other to extract information such as content or texture.
This can then be used to predict the class of the query image.
From the output of the sequence model, the element at the same position as the query is selected.
Afterwards, it is passed through a simple MLP network to predict the class of the query image.
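A minimal sketch of this part, assuming a transformer encoder as the non-causal sequence model (layer count, head count and the MLP head are illustrative choices, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

d_model, n_classes = 768, 5   # token size matches the concatenated embeddings from above

# Non-causal sequence model: a transformer encoder used without any attention mask,
# so the query token and all support tokens can attend to each other.
seq_model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=4,
)
# Simple MLP head that maps the query representation to class logits.
classifier = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_classes))

def predict(sequence):                       # sequence: (1 + k, d_model), query at position 0
    out = seq_model(sequence.unsqueeze(0))   # add a batch dimension -> (1, 1 + k, d_model)
    query_out = out[:, 0]                    # select the element at the query position
    return classifier(query_out)             # (1, n_classes) logits for the query image
```

Applied to the sequence built in the sketch above, `predict(seq)` yields the class logits for the query image.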
*Large-Scale Pre-Training:*
During large-scale pre-training, CAML is trained on few-shot classification episodes sampled from several large image datasets; only the non-causal sequence model and the classification head are updated, while the image encoder and the ELMES class encoder remain frozen.
This enables the model to generalize to new few-shot tasks at inference time without any further fine-tuning.
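A rough sketch of such an episodic pre-training loop, reusing the hypothetical `build_sequence`, `seq_model`, `classifier`, `predict` and `unknown_token` from the sketches above; `sample_episode` is an assumed helper that returns one few-shot task:

```python
import torch
import torch.nn.functional as F

num_steps = 100_000   # illustrative
# Only the sequence model, the MLP head and the unknown token are optimized;
# the frozen image encoder and the fixed ELMES vectors are left untouched.
optimizer = torch.optim.AdamW(
    list(seq_model.parameters()) + list(classifier.parameters()) + [unknown_token], lr=1e-4)

for step in range(num_steps):
    query_img, query_label, support_imgs, support_labels = sample_episode()
    logits = predict(build_sequence(query_img, support_imgs, support_labels))
    loss = F.cross_entropy(logits, query_label.unsqueeze(0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```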
*Results:*
#figure(
  image("rsc/caml_architecture.png", width: 100%),
  caption: [Architecture of CAML. #cite(<caml_paper>)],
) <camlarchitecture>