add stuff for CAML
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 15s

This commit is contained in:
lukas-heiligenbrunner 2024-12-31 12:23:53 +01:00
parent 155faa6e80
commit 24118dce93
4 changed files with 41 additions and 22 deletions

View File

@ -15,6 +15,10 @@ For all of the three methods we test the following use-cases:#todo[maybe write m
Those experiments were conducted on the MVTEC AD dataset on the bottle and cable classes.
== Experiment Setup
#todo[Setup of experiments, which classes used, nr of samples]
== ResNet50
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
@ -79,23 +83,27 @@ After creating the embeddings for the support and query set the euclidean distan
The class with the smallest distance is chosen as the predicted class.
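The nearest-prototype step described above can be sketched as follows. `prototype_classify` is an illustrative helper, not the thesis code; in the actual setup the embeddings come from the frozen pre-trained ResNet50 feature extractor:

```python
import torch

def prototype_classify(support_emb: torch.Tensor,
                       support_labels: torch.Tensor,
                       query_emb: torch.Tensor) -> torch.Tensor:
    """Nearest-prototype classification over class-mean embeddings."""
    classes = support_labels.unique()
    # one prototype per class: mean embedding of its support samples
    protos = torch.stack([support_emb[support_labels == c].mean(dim=0)
                          for c in classes])
    dists = torch.cdist(query_emb, protos)  # pairwise euclidean distances
    return classes[dists.argmin(dim=1)]     # smallest distance wins
```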
=== Results
This method performed better than expected for such a simple approach.
#todo[Add images of graphs with ResNet50 stuff only]
== P>M>F
=== Approach
=== Results
== CAML
=== Approach
For the CAML implementation the pretrained model weights from the original paper were used.
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
This feature extractor was already pretrained when used by the authors of the original paper.
For the non-causal sequence model a transformer model was used.
It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
This transformer was trained on a huge number of images as described in @CAML.
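A transformer encoder with these dimensions can be sketched with standard PyTorch modules; this is only a sketch of the stated sizes, and the actual CAML implementation may wire the model differently:

```python
import torch
from torch import nn

# 24 layers, 16 attention heads, hidden dimension 1024, MLP size 4096,
# matching the dimensions quoted above; non-causal means no attention mask.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=1024, nhead=16, dim_feedforward=4096, batch_first=True)
sequence_model = nn.TransformerEncoder(encoder_layer, num_layers=24)

tokens = torch.randn(1, 10, 1024)   # (batch, sequence length, hidden dim)
out = sequence_model(tokens)        # full bidirectional self-attention
```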
=== Results
The results were not as good as expected.
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
The model was trained on a large number of general-purpose images and is not fine-tuned at all.
It might not handle very similar images well.
#todo[Add images of graphs with CAML stuff only]

View File

@ -1,3 +1,5 @@
#import "utils.typ": todo
= Introduction
== Motivation
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
@ -31,4 +33,4 @@ How does it compare to PatchCore and EfficientAD?
// I've tried different distance measures $->$ but results are pretty much the same.
== Outline
#todo[Todo]

View File

@ -197,9 +197,11 @@ There are several different ResNet architectures, the most common are ResNet-18,
For this bachelor thesis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.
=== P$>$M$>$F
// https://arxiv.org/pdf/2204.07305
#todo[Todo]#cite(<pmfpaper>)
=== CAML <CAML>
// https://arxiv.org/pdf/2310.10971v2
CAML (Context aware meta learning) is one of the state-of-the-art methods for few-shot learning.
It consists of three different components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder and a non-causal sequence model.
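The ELMES class encoder maps each label to a fixed vector from an Equal Length and Maximally Equiangular Set. A small illustrative construction (a simplex-style set built here for illustration, not taken from the CAML code) for $k$ classes:

```python
import numpy as np

def elmes(k: int) -> np.ndarray:
    """k unit-length vectors in R^k with pairwise cosine -1/(k-1):
    equal length and maximally equiangular."""
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

E = elmes(5)   # row i is the fixed label embedding of class i
```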
@ -237,12 +239,9 @@ Afterwards it is passed through a simple MLP network to predict the class of the
*Large-Scale Pre-Training:*
CAML is pre-trained on a huge number of images from ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
Those datasets span over different domains and help to detect any new visual concept during inference.
Only the non-causal sequence model is trained and the weights of the image encoder and ELMES encoder are kept frozen.
~#cite(<caml_paper>)
*Inference:*
During inference, CAML processes the following:
- Encodes the support set images and labels with the pre-trained feature and class encoders.
@ -250,7 +249,7 @@ During inference, CAML processes the following:
- Passes the sequence through the non-causal sequence model, enabling dynamic interaction between query and support set representations.
- Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite(<caml_paper>)
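The steps above can be sketched end to end with toy dimensions; every module here is a stand-in (a random linear encoder instead of the frozen ViT-B/16, one-hot codes instead of the ELMES, a single transformer layer), so this only illustrates the data flow:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy dimensions; all components are stand-ins, not the CAML weights.
d_img, d_cls, n_way, n_shot = 8, 4, 2, 3
d_model = d_img + d_cls

image_encoder = nn.Linear(32, d_img)      # stand-in for the frozen image encoder
class_codes = torch.eye(n_way, d_cls)     # stand-in for the fixed ELMES codes
seq_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=2,
                                       dim_feedforward=16, batch_first=True)
seq_model = nn.TransformerEncoder(seq_layer, num_layers=1)
head = nn.Linear(d_model, n_way)          # the final MLP classifier

support = torch.randn(n_way * n_shot, 32)            # 6 support "images"
labels = torch.arange(n_way).repeat_interleave(n_shot)
query = torch.randn(1, 32)                           # 1 query "image"

# Support tokens carry their class code; the query token gets a zero
# "unknown" code, then all tokens attend to each other (non-causal).
sup_tok = torch.cat([image_encoder(support), class_codes[labels]], dim=1)
qry_tok = torch.cat([image_encoder(query), torch.zeros(1, d_cls)], dim=1)
seq = torch.cat([qry_tok, sup_tok], dim=0).unsqueeze(0)  # (1, 7, d_model)
logits = head(seq_model(seq)[:, 0])       # classify the transformed query token
```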
*Performance:*
CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks,
including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft),
and cross-domain tasks (e.g., Pascal+Paintings).

View File

@ -127,3 +127,13 @@
year = {2021},
publisher={Johannes Kepler Universität Linz}
}
@misc{pmfpaper,
title={Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference},
author={Shell Xu Hu and Da Li and Jan Stühmer and Minyoung Kim and Timothy M. Hospedales},
year={2022},
eprint={2204.07305},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2204.07305},
}