add sgva clip to not used materials
@@ -374,21 +374,35 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting
 == Alternative Methods
 
 There are several alternative few-shot learning methods which are not used in this bachelor thesis.
-Either they performed worse on benchmarks compared to the used methods or they were released after my literature research.
-#todo[Do it!]
+Either they performed worse on benchmarks than the methods used, or they were released after my initial literature research.
 
-=== SgVA-CLIP
+=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)
 // https://arxiv.org/pdf/2211.16191v2
 // https://arxiv.org/abs/2211.16191v2
 
+SgVA-CLIP (Semantic-guided Visual Adapting CLIP) is a framework that improves few-shot learning by adapting pre-trained vision-language models such as CLIP.
+It focuses on generating more discriminative, task-specific visual features while still using the general knowledge of the pre-trained model.
+Instead of only aligning images and text, SgVA-CLIP adds a dedicated visual adapter layer that makes the visual features more discriminative for the given task.
+This process is supported by knowledge distillation, where fine-grained information from the pre-trained model guides the learning of the new visual features.
+Additionally, the model uses contrastive losses to further refine both the visual and the textual representations.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
+
+One advantage of SgVA-CLIP is that it works well with very few labeled samples, which makes it suitable for applications such as anomaly detection.
+The use of pre-trained knowledge reduces the need for large datasets.
+However, a disadvantage is that it depends heavily on the quality and capabilities of the pre-trained model.
+If the pre-trained model lacks information relevant to the task, SgVA-CLIP might struggle to adapt.
+This can be a serious limitation for anomaly detection, because the images in such tasks are often highly domain-specific and not well covered by general pre-trained models.
+In addition, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
+
 === TRIDENT
 // https://arxiv.org/pdf/2208.10559v1
 // https://arxiv.org/abs/2208.10559v1
 
+=== SOT
 // https://arxiv.org/pdf/2204.03065v1
 // https://arxiv.org/abs/2204.03065v1
 
 // anomaly detect
+=== GLASS
 // https://arxiv.org/pdf/2407.09359v1
 // https://arxiv.org/abs/2407.09359v1
 
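To make the mechanism described in the added SgVA-CLIP paragraph concrete, here is a minimal PyTorch sketch of the general idea: a small visual adapter on top of frozen CLIP image embeddings, trained with a class-prototype classification loss plus a knowledge-distillation term. The class names, dimensions, loss weights, and the random tensors standing in for CLIP features are illustrative assumptions, not the authors' implementation or hyperparameters.

# Minimal sketch (not the authors' code) of the SgVA-CLIP idea described above:
# a small visual adapter on top of frozen CLIP image embeddings, trained with
# (1) a classification/contrastive loss against class text prototypes and
# (2) a distillation term in which the frozen model's soft predictions guide the
# adapted features. Dimensions, loss weights, and random tensors are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualAdapter(nn.Module):
    """Small residual MLP that refines frozen CLIP image embeddings."""

    def __init__(self, dim: int = 512, hidden: int = 256, residual: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.residual = residual

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Blend adapted and original frozen features (residual adapter).
        return self.residual * self.net(x) + (1.0 - self.residual) * x


def sgva_style_loss(img_feat, text_proto, labels, adapter, tau=0.07, distill_weight=1.0):
    """Cross-entropy over image-text similarities plus a knowledge-distillation term."""
    adapted = F.normalize(adapter(img_feat), dim=-1)
    frozen = F.normalize(img_feat, dim=-1)
    text_proto = F.normalize(text_proto, dim=-1)

    # Classification loss: adapted image features vs. class text prototypes.
    logits = adapted @ text_proto.t() / tau
    cls_loss = F.cross_entropy(logits, labels)

    # Distillation: soft predictions of the frozen features act as the teacher.
    with torch.no_grad():
        teacher = F.softmax(frozen @ text_proto.t() / tau, dim=-1)
    distill_loss = F.kl_div(F.log_softmax(logits, dim=-1), teacher, reduction="batchmean")

    return cls_loss + distill_weight * distill_loss


if __name__ == "__main__":
    # Toy 5-way 2-shot episode with random tensors standing in for CLIP embeddings.
    torch.manual_seed(0)
    adapter = VisualAdapter()
    img_feat = torch.randn(10, 512)                # 10 support images
    text_proto = torch.randn(5, 512)               # 5 encoded class prompts
    labels = torch.arange(5).repeat_interleave(2)  # [0, 0, 1, 1, ..., 4, 4]
    optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-3)

    loss = sgva_style_loss(img_feat, text_proto, labels, adapter)
    loss.backward()
    optimizer.step()
    print(f"combined loss: {loss.item():.4f}")

In the actual method the prototypes come from CLIP's text encoder and the image features from its frozen image encoder; only the adapter parameters are updated, which is what keeps the approach feasible in the few-shot regime.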
							
								
								
									
sources.bib
@@ -137,3 +137,13 @@
       primaryClass={cs.CV},
       url={https://arxiv.org/abs/2204.07305},
 }
+
+@misc{peng2023sgvaclipsemanticguidedvisualadapting,
+      title={SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification},
+      author={Fang Peng and Xiaoshan Yang and Linhui Xiao and Yaowei Wang and Changsheng Xu},
+      year={2023},
+      eprint={2211.16191},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2211.16191},
+}