From 24118dce93fbe53b61126b57ab774c9edec7aab9 Mon Sep 17 00:00:00 2001
From: lukas-heiligenbrunner
Date: Tue, 31 Dec 2024 12:23:53 +0100
Subject: [PATCH] add stuff for CAML

---
 implementation.typ     | 36 ++++++++++++++++++++++--------------
 introduction.typ       |  4 +++-
 materialandmethods.typ | 13 ++++++-------
 sources.bib            | 10 ++++++++++
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/implementation.typ b/implementation.typ
index c3ad1d8..2e0742d 100644
--- a/implementation.typ
+++ b/implementation.typ
@@ -15,6 +15,10 @@ For all of the three methods we test the following use-cases:#todo[maybe write m
 
 Those experiments were conducted on the MVTEC AD dataset on the bottle and cable classes.
 
+
+== Experiment Setup
+#todo[Setup of experiments, which classes used, nr of samples]
+
 == ResNet50
 === Approach
 The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
@@ -79,23 +83,27 @@ After creating the embeddings for the support and query set the euclidean distan
 The class with the smallest distance is chosen as the predicted class.
 
 === Results
+This method performed better than expected for such a simple approach.
-
-== CAML
+#todo[Add images of graphs with ResNet50 stuff only]
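+
+The following listing sketches how such an embedding-based nearest-neighbour classification could look in PyTorch.
+It is only an illustrative sketch and not the exact code used for the experiments; the preprocessing, the function names and the use of `torchvision` weights are assumptions.
+```python
+# Sketch: few-shot classification by comparing ResNet-50 embeddings of the
+# query images against the support-set embeddings via euclidean distance.
+import torch
+from torchvision.models import resnet50, ResNet50_Weights
+
+weights = ResNet50_Weights.IMAGENET1K_V2
+preprocess = weights.transforms()      # assumed preprocessing pipeline
+
+backbone = resnet50(weights=weights)
+backbone.fc = torch.nn.Identity()      # drop the classification head -> 2048-d embeddings
+backbone.eval()
+
+@torch.no_grad()
+def embed(images: torch.Tensor) -> torch.Tensor:
+    """images: (N, 3, H, W), already preprocessed -> (N, 2048) embeddings."""
+    return backbone(images)
+
+@torch.no_grad()
+def predict(support_imgs, support_labels, query_imgs):
+    """Assign each query image to the class of its nearest support embedding."""
+    support_emb = embed(support_imgs)            # (S, 2048)
+    query_emb = embed(query_imgs)                # (Q, 2048)
+    dists = torch.cdist(query_emb, support_emb)  # (Q, S) euclidean distances
+    return support_labels[dists.argmin(dim=1)]   # smallest distance wins
+```
+A common variant averages the support embeddings per class into prototypes before computing the distances.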
 
 == P>M>F
+=== Approach
 
+=== Results
 
-== Experiment Setup
-% todo
-todo setup of experiments, which classes used, nr of samples
-kinds of experiments which lead to graphs
+== CAML
+=== Approach
+For the CAML implementation, the pretrained model weights from the original paper were used.
+As a feature extractor, a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
+This feature extractor was already pretrained when the authors of the original paper used it.
+For the non-causal sequence model a transformer model was used.
+It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
+This transformer was trained on a large number of images as described in @CAML.
 
-== Jupyter
+=== Results
+The results were not as good as expected.
+This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
+The model was trained on a large number of general-purpose images and not fine-tuned at all.
+It might therefore not handle very similar-looking images well.
 
-To get accurate performance measures the active-learning process was implemented in a Jupyter notebook first.
-This helps to choose which of the methods performs the best and which one to use in the final Dagster pipeline.
-A straight forward machine-learning pipeline was implemented with the help of Pytorch and RESNet-18.
-
-Moreover, the Dataset was manually imported with the help of a custom torch dataloader and preprocessed with random augmentations.
-After each loop iteration the Area Under the Curve (AUC) was calculated over the validation set to get a performance measure.
-All those AUC were visualized in a line plot, see section~\ref{sec:experimental-results} for the results.
+#todo[Add images of graphs with CAML stuff only]
diff --git a/introduction.typ b/introduction.typ
index 64fdaee..43fe95c 100644
--- a/introduction.typ
+++ b/introduction.typ
@@ -1,3 +1,5 @@
+#import "utils.typ": todo
+
 = Introduction
 == Motivation
 Anomaly detection has especially in the industrial and automotive field essential importance.
@@ -31,4 +33,4 @@ How does it compare to PatchCore and EfficientAD?
 // I've tried different distance measures $->$ but results are pretty much the same.
 
 == Outline
-todo
+#todo[Todo]
diff --git a/materialandmethods.typ b/materialandmethods.typ
index 130b513..99e0f7d 100644
--- a/materialandmethods.typ
+++ b/materialandmethods.typ
@@ -197,9 +197,11 @@ There are several different ResNet architectures, the most common are ResNet-18,
 For this bachelor theis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.
 
 === P$>$M$>$F
-Todo
+// https://arxiv.org/pdf/2204.07305
 
-=== CAML
+#todo[Todo]#cite(<pmfpaper>)
+
+=== CAML // https://arxiv.org/pdf/2310.10971v2
 CAML (Context aware meta learning) is one of the state-of-the-art methods for few-shot learning.
 It consists of three different components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder and a non-causal sequence model.
@@ -237,12 +239,9 @@ Afterwards it is passed through a simple MLP network to predict the class of the
 *Large-Scale Pre-Training:*
 CAML is pre-trained on a huge number of images from ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
 Those datasets span over different domains and help to detect any new visual concept during inference.
-Only the non-causal sequence model is trained and the image encoder and ELMES encoder are frozen.
+Only the non-causal sequence model is trained and the weights of the image encoder and ELMES encoder are kept frozen.
 ~#cite()
 
-*Theoretical Analysis:*
-#todo[Mybe not that important?]
-
 *Inference:*
 During inference, CAML processes the following:
 - Encodes the support set images and labels with the pre-trained feature and class encoders.
@@ -250,7 +249,7 @@ During inference, CAML processes the following:
 - Passes the sequence through the non-causal sequence model, enabling dynamic interaction between query and support set representations.
 - Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite()
 
-*Results:*
+*Performance:*
 CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks, including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft), and cross-domain tasks (e.g., Pascal+Paintings).
diff --git a/sources.bib b/sources.bib
index a14db4d..8dc1f69 100644
--- a/sources.bib
+++ b/sources.bib
@@ -127,3 +127,13 @@
 year = {2021},
 publisher={Johannes Kepler Universität Linz}
 }
+
+@misc{pmfpaper,
+  title={Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference},
+  author={Shell Xu Hu and Da Li and Jan Stühmer and Minyoung Kim and Timothy M. Hospedales},
+  year={2022},
+  eprint={2204.07305},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2204.07305},
+}
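
To make the CAML inference flow listed in materialandmethods.typ more concrete, the following sketch mirrors those steps with plain PyTorch modules. It is illustrative only and not the reference implementation: the label encoder stands in for the fixed ELMES class encoder, the 768/256 split of the 1024-dimensional tokens and the MLP head are assumptions, and the real model uses the pretrained weights from the original paper rather than randomly initialised layers.

```python
# Sketch of the described inference flow: frozen image features + label codes
# form a sequence, a non-causal transformer lets the query attend to the
# support set, and an MLP classifies the transformed query token.
import torch
import torch.nn as nn

img_dim, lbl_dim, n_way = 768, 256, 2   # assumed split; 768 + 256 = 1024-d tokens
d_model = img_dim + lbl_dim

label_encoder = nn.Embedding(n_way + 1, lbl_dim)   # stand-in for the ELMES class encoder;
                                                    # index n_way marks the unknown query label
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=16, dim_feedforward=4096, batch_first=True)
sequence_model = nn.TransformerEncoder(encoder_layer, num_layers=24)  # non-causal: no attention mask
classifier = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                           nn.Linear(d_model, n_way))                 # simple MLP head

@torch.no_grad()
def predict(support_feats, support_labels, query_feat):
    """support_feats: (S, 768) frozen ViT-B/16 embeddings, support_labels: (S,),
    query_feat: (768,). Returns the predicted class index for the query."""
    support_tokens = torch.cat([support_feats, label_encoder(support_labels)], dim=-1)  # (S, 1024)
    unknown = label_encoder(torch.tensor([n_way]))                                      # (1, 256)
    query_token = torch.cat([query_feat.unsqueeze(0), unknown], dim=-1)                 # (1, 1024)
    seq = torch.cat([query_token, support_tokens], dim=0).unsqueeze(0)                  # (1, S+1, 1024)
    out = sequence_model(seq)                    # every token attends to every other token
    return classifier(out[:, 0]).argmax(dim=-1)  # classify the transformed query token
```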