kind of finish caml general infos

@@ -202,21 +202,26 @@ Todo
// https://arxiv.org/pdf/2310.10971v2
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
It consists of three components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder, and a non-causal sequence model.
CAML is a universal meta-learning approach, meaning that no fine-tuning or domain-specific meta-training is applied.~#cite(<caml_paper>)

*Architecture:*
CAML first encodes the query and support set images using the frozen pre-trained feature extractor, as shown in @camlarchitecture.
This step maps the images into a low-dimensional space in which similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
This embedding is learned during pre-training.
Afterwards, each image embedding is concatenated with its corresponding class embedding.
~#cite(<caml_paper>)
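
The following sketch illustrates this sequence construction; the names (`frozen_encoder`, `class_encoder`, `unknown_token`) are hypothetical and not taken from the CAML implementation.

```python
# Hypothetical sketch of the sequence construction, not the authors' code.
import torch

def build_sequence(frozen_encoder, class_encoder, support_imgs,
                   support_labels, query_img, unknown_token):
    with torch.no_grad():                        # the image encoder is frozen
        s_feat = frozen_encoder(support_imgs)             # (n_support, d_img)
        q_feat = frozen_encoder(query_img.unsqueeze(0))   # (1, d_img)
    s_cls = class_encoder(support_labels)        # ELMES vectors, (n_support, d_cls)
    support = torch.cat([s_feat, s_cls], dim=-1)
    # The query's class is unknown, so its image embedding is paired with
    # the learnable "unknown token" instead of a class embedding.
    query = torch.cat([q_feat, unknown_token.unsqueeze(0)], dim=-1)
    return torch.cat([support, query], dim=0)    # (n_support + 1, d_img + d_cls)
```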

#todo[Add more references to the architecture image below]

#todo[We should add stuff here on why there is a maximum number of shots because of the pre-trained model]

*ELMES Encoder:*
The ELMES (Equal Length and Maximally Equiangular Set) encoder encodes the class labels as vectors of equal length.
The encoder is a bijective mapping between the labels and a set of vectors that are of equal length and maximally equiangular.
Bijective means that each label maps to exactly one vector and vice versa; equiangular means that the angle between any two vectors in the set is the same, and maximally equiangular means that this common angle is as large as possible.
It is similar to a one-hot encoding but with favorable geometric properties.
This encoder maximizes the algorithm's ability to distinguish between the different classes.
~#cite(<caml_paper>)
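
As a minimal illustration of "equal length and maximally equiangular", consider the simplex construction below; it has exactly these properties, though it is not necessarily the exact encoder used in the paper.

```python
# Simplex construction of k equal-length, maximally equiangular vectors;
# an illustration of the property, not necessarily the paper's encoder.
import numpy as np

def equiangular_class_vectors(k: int) -> np.ndarray:
    """Rows are k unit vectors with pairwise cosine -1/(k-1),
    the smallest value achievable for k vectors of equal length."""
    centered = np.eye(k) - 1.0 / k      # center the one-hot basis vectors
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)

vecs = equiangular_class_vectors(5)
gram = vecs @ vecs.T
# The diagonal is 1 (equal length); every off-diagonal entry is -0.25,
# so every pair of class vectors encloses the same, maximal angle.
print(np.round(gram, 3))
```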

*Non-causal sequence model:*
The sequence of concatenated image and class embeddings is then fed into a non-causal sequence model.
@@ -226,14 +231,37 @@ Visual features from query and support set can be compared to each other to dete
This can then be used to predict the class of the query image.
From the output of the sequence model, the element at the same position as the query is selected.
Afterwards, it is passed through a simple MLP to predict the class of the query image.
~#cite(<caml_paper>)
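
A sketch of this step with standard PyTorch modules; the layer sizes are hypothetical, and this is not the authors' implementation.

```python
# Non-causal sequence model and MLP head; all sizes are hypothetical.
import torch
import torch.nn as nn

d_model, n_way = 768, 5
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
seq_model = nn.TransformerEncoder(layer, num_layers=4)
mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                    nn.Linear(d_model, n_way))

def classify_query(sequence):        # (batch, n_support + 1, d_model)
    # No attention mask is passed, so every position attends to every
    # other position; this is what makes the model non-causal.
    out = seq_model(sequence)
    query_out = out[:, -1]           # the element at the query's position
    return mlp(query_out)            # class logits for the query image
```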

*Large-Scale Pre-Training:*
#todo[Desc. what this is]
CAML is pre-trained on a large number of images from the ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
These datasets span several domains and help the model recognize new visual concepts during inference.
Only the non-causal sequence model is trained; the image encoder and the ELMES encoder remain frozen.
~#cite(<caml_paper>)
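
A sketch of that training setup, reusing the hypothetical modules from the snippets above:

```python
# Only the sequence model, the MLP head, and the unknown token are
# optimized; the image encoder stays frozen and the ELMES encoder is a
# fixed mapping without trainable parameters.
for p in frozen_encoder.parameters():
    p.requires_grad = False                    # freeze the image encoder
optimizer = torch.optim.Adam(
    list(seq_model.parameters()) + list(mlp.parameters())
    + [unknown_token],                         # a tensor with requires_grad=True
    lr=1e-4,                                   # hypothetical hyperparameter
)
```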

*Theoretical Analysis:*
#todo[Maybe not that important?]

*Inference:*
During inference, CAML performs the following steps (see the sketch after this list):
- Encodes the support set images and labels with the pre-trained feature and class encoders.
- Concatenates these encodings into a sequence alongside the query image embedding.
- Passes the sequence through the non-causal sequence model, enabling dynamic interaction between the query and support set representations.
- Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite(<caml_paper>)
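
Put together, inference is a single forward pass; a short sketch reusing the hypothetical helpers from the previous snippets:

```python
# End-to-end inference sketch, reusing the hypothetical pieces from above.
seq = build_sequence(frozen_encoder, class_encoder, support_imgs,
                     support_labels, query_img, unknown_token)
logits = classify_query(seq.unsqueeze(0))   # add a batch dimension
predicted_class = logits.argmax(dim=-1)     # predicted class of the query
```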

*Results:*
CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks,
including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft),
and cross-domain tasks (e.g., Pascal+Paintings).
It outperforms or matches existing models in 14 of 22 evaluation settings.
It performs competitively against P>M>F on 8 benchmarks, even though P>M>F was meta-trained on the same domain.
~#cite(<caml_paper>)

CAML generalizes well and is efficient at inference, but it faces limitations in specialized domains (e.g., ChestX)
and low-resolution tasks (e.g., CIFAR-fs).
Its use of frozen pre-trained feature extractors is key to avoiding overfitting and enabling robust performance.
~#cite(<caml_paper>)

#figure(
  image("rsc/caml_architecture.png", width: 100%),
  caption: [Architecture of CAML.],
) <camlarchitecture>