kind of finish caml general infos
			All checks were successful
				Build Typst document / build_typst_documents (push) Successful in 8s
// https://arxiv.org/pdf/2310.10971v2
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
It consists of three components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder, and a non-causal sequence model.
It is a universal meta-learning approach, meaning no fine-tuning or meta-training is applied for specific domains.~#cite(<caml_paper>)

*Architecture:*
CAML first encodes the query and support set images using the frozen pre-trained feature extractor, as shown in @camlarchitecture.
This step brings the images into a low-dimensional space where similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
This embedding is learned during pre-training.
Afterwards, each image embedding is concatenated with the corresponding class embedding.
~#cite(<caml_paper>)
#todo[Add more references to the architecture image below]

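This assembly step can be sketched roughly as follows; all dimensions, names, and the random stand-in embeddings are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hypothetical dimensions for illustration only.
d_img, d_cls, n_way, n_shot = 8, 4, 3, 2
rng = np.random.default_rng(0)

# Stand-ins for the frozen image encoder's outputs and the ELMES label codes.
support_feats = rng.normal(size=(n_way * n_shot, d_img))  # encoded support images
query_feat = rng.normal(size=d_img)                       # encoded query image
label_codes = rng.normal(size=(n_way, d_cls))             # class embeddings
unknown_token = rng.normal(size=d_cls)                    # learnable "unknown" embedding

labels = np.repeat(np.arange(n_way), n_shot)

# Each image embedding is concatenated with its class embedding;
# the query gets the "unknown" token since its class is not known yet.
support_tokens = np.concatenate([support_feats, label_codes[labels]], axis=1)
query_token = np.concatenate([query_feat, unknown_token])
sequence = np.vstack([support_tokens, query_token])  # (n_way*n_shot + 1, d_img + d_cls)
```

The resulting sequence of concatenated (image, label) tokens is what the sequence model consumes.
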
*ELMES Encoder:*
The ELMES (Equal Length and Maximally Equiangular Set) encoder encodes the class labels into vectors of equal length.
The encoder is a bijective mapping between the labels and a set of vectors that are of equal length and maximally equiangular.
#todo[Describe what equiangular and bijective means]
It is similar to a one-hot encoding but comes with some advantages.
This encoder maximizes the algorithm's ability to distinguish between different classes.
~#cite(<caml_paper>)

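One standard way to build such a set is the regular simplex: equal-norm vectors whose pairwise angles are all identical and as large as possible. The sketch below uses this common construction for illustration (the paper derives its own ELMES, which this only approximates in spirit):

```python
import numpy as np

def equiangular_set(num_classes: int) -> np.ndarray:
    """Return `num_classes` equal-length vectors with identical,
    maximal pairwise angles (a regular-simplex construction)."""
    d = num_classes
    # Center the one-hot vectors, then rescale them to unit length.
    return (np.eye(d) - np.ones((d, d)) / d) * np.sqrt(d / (d - 1))

E = equiangular_set(5)
norms = np.linalg.norm(E, axis=1)  # all 1.0: equal length
gram = E @ E.T                     # every off-diagonal entry equals -1/(5-1)
```

Each label maps to exactly one such vector (the mapping is bijective), and any two distinct vectors meet at the same angle, arccos(-1/(d-1)).
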
*Non-causal sequence model:*
The sequence created by the ELMES encoder is then fed into a non-causal sequence model.
Visual features from the query and the support set can be compared to each other to determine similarities.
This can then be used to predict the class of the query image.
From the output of the sequence model, the element at the same position as the query is selected.
Afterwards, it is passed through a simple MLP network to predict the class of the query image.
~#cite(<caml_paper>)

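"Non-causal" means no causal mask is applied: every sequence element, including the query, can attend to every other element. A minimal single-head self-attention sketch (without learned projections, purely to show the absence of masking):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def non_causal_self_attention(seq: np.ndarray) -> np.ndarray:
    """Every position attends to ALL positions -- no causal mask,
    so the query (last element) sees the whole support set."""
    scores = seq @ seq.T / np.sqrt(seq.shape[1])  # (n, n) similarity scores
    return softmax(scores) @ seq                  # weighted mixture of all elements
```

A causal variant would mask out scores above the diagonal; here the attention matrix stays dense in both directions.
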
*Large-Scale Pre-Training:*
CAML is pre-trained on a large collection of images from the ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
These datasets span different domains and help the model recognize new visual concepts during inference.
Only the non-causal sequence model is trained; the image encoder and the ELMES encoder are frozen.
~#cite(<caml_paper>)

*Theoretical Analysis:*
#todo[Maybe not that important?]

*Inference:*
During inference, CAML processes the following:
- Encodes the support set images and labels with the pre-trained feature and class encoders.
- Concatenates these encodings into a sequence alongside the query image embedding.
- Passes the sequence through the non-causal sequence model, enabling dynamic interaction between query and support set representations.
- Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite(<caml_paper>)

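The steps above can be sketched end to end as follows, with stub encoders, a simplified one-hot stand-in for the ELMES codes and the "unknown" token, and a nearest-class-mean readout instead of the trained MLP head; all names and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, n_way, n_shot = 8, 3, 2

# 1) Frozen encoders (stubs): image features and label codes.
support_feats = rng.normal(size=(n_way * n_shot, d_img))
labels = np.repeat(np.arange(n_way), n_shot)
codes = np.eye(n_way + 1)  # last row stands in for the "unknown" token
query_feat = support_feats[0] + 0.1 * rng.normal(size=d_img)  # resembles class 0

# 2) Concatenate image and label embeddings; append the query last.
seq = np.vstack([
    np.concatenate([support_feats, codes[labels]], axis=1),
    np.concatenate([query_feat, codes[n_way]]),
])

# 3) Non-causal self-attention: the query mixes with the whole support set.
scores = seq @ seq.T / np.sqrt(seq.shape[1])
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
out = w @ seq

# 4) Read out the transformed query element and classify it.
q, support_out = out[-1, :d_img], out[:-1, :d_img]
class_means = np.stack([support_out[labels == c].mean(axis=0) for c in range(n_way)])
pred = int(np.argmin(np.linalg.norm(class_means - q, axis=1)))
```

In the real model, step 4 is the trained MLP head acting on the transformed query embedding rather than a distance-based readout.
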
*Results:*
CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks,
including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft),
and cross-domain tasks (e.g., Pascal+Paintings).
It outperformed or matched existing models in 14 of 22 evaluation settings.
It performs competitively against P>M>F on 8 benchmarks even though P>M>F was meta-trained on the same domain.
~#cite(<caml_paper>)

CAML generalizes well and is efficient at inference, but it faces limitations in specialized domains (e.g., ChestX)
and low-resolution tasks (e.g., CIFAR-fs).
Its use of frozen pre-trained feature extractors is key to avoiding overfitting and enabling robust performance.
~#cite(<caml_paper>)
#todo[We should add stuff here why we have a max amount of shots bc. of pretrained model]

#figure(
  image("rsc/caml_architecture.png", width: 100%),