From 24118dce93fbe53b61126b57ab774c9edec7aab9 Mon Sep 17 00:00:00 2001
From: lukas-heiligenbrunner
Date: Tue, 31 Dec 2024 12:23:53 +0100
Subject: [PATCH] add stuff for CAML

---
 implementation.typ     | 36 ++++++++++++++++++++++--------------
 introduction.typ       |  4 +++-
 materialandmethods.typ | 13 ++++++-------
 sources.bib            | 10 ++++++++++
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/implementation.typ b/implementation.typ
index c3ad1d8..2e0742d 100644
--- a/implementation.typ
+++ b/implementation.typ
@@ -15,6 +15,10 @@ For all of the three methods we test the following use-cases:#todo[maybe write m
 
 Those experiments were conducted on the MVTEC AD dataset on the bottle and cable classes.
 
+
+== Experiment Setup
+#todo[Setup of experiments, which classes used, nr of samples]
+
 == ResNet50
 === Approach
 The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
@@ -79,23 +83,27 @@ After creating the embeddings for the support and query set the euclidean distan
 The class with the smallest distance is chosen as the predicted class.
 
 === Results
+This method performed better than expected for such a simple approach.
-
-== CAML
+#todo[Add images of graphs with ResNet50 stuff only]
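+
+The following listing sketches how such an embedding-based nearest-neighbour classification could look in PyTorch.
+It is only an illustrative sketch and not the exact code used for the experiments; the preprocessing, the function names and the use of `torchvision` weights are assumptions.
+```python
+# Sketch: few-shot classification by comparing ResNet-50 embeddings of the
+# query images against the support-set embeddings via euclidean distance.
+import torch
+from torchvision.models import resnet50, ResNet50_Weights
+
+weights = ResNet50_Weights.IMAGENET1K_V2
+preprocess = weights.transforms()      # assumed preprocessing pipeline
+
+backbone = resnet50(weights=weights)
+backbone.fc = torch.nn.Identity()      # drop the classification head -> 2048-d embeddings
+backbone.eval()
+
+@torch.no_grad()
+def embed(images: torch.Tensor) -> torch.Tensor:
+    """images: (N, 3, H, W), already preprocessed -> (N, 2048) embeddings."""
+    return backbone(images)
+
+@torch.no_grad()
+def predict(support_imgs, support_labels, query_imgs):
+    """Assign each query image to the class of its nearest support embedding."""
+    support_emb = embed(support_imgs)            # (S, 2048)
+    query_emb = embed(query_imgs)                # (Q, 2048)
+    dists = torch.cdist(query_emb, support_emb)  # (Q, S) euclidean distances
+    return support_labels[dists.argmin(dim=1)]   # smallest distance wins
+```
+A common variant averages the support embeddings per class into prototypes before computing the distances.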
 
 == P>M>F
+=== Approach
 
+=== Results
 
-== Experiment Setup
-% todo
-todo setup of experiments, which classes used, nr of samples
-kinds of experiments which lead to graphs
+== CAML
+=== Approach
+For the CAML implementation, the pretrained model weights from the original paper were used.
+As a feature extractor, a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
+This feature extractor was already pretrained when the authors of the original paper used it.
+For the non-causal sequence model a transformer model was used.
+It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
+This transformer was trained on a large number of images as described in @CAML.
 
-== Jupyter
+=== Results
+The results were not as good as expected.
+This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
+The model was trained on a large number of general-purpose images and not fine-tuned at all.
+It might therefore not handle very similar-looking images well.
 
-To get accurate performance measures the active-learning process was implemented in a Jupyter notebook first.
-This helps to choose which of the methods performs the best and which one to use in the final Dagster pipeline.
-A straight forward machine-learning pipeline was implemented with the help of Pytorch and RESNet-18.
-
-Moreover, the Dataset was manually imported with the help of a custom torch dataloader and preprocessed with random augmentations.
-After each loop iteration the Area Under the Curve (AUC) was calculated over the validation set to get a performance measure.
-All those AUC were visualized in a line plot, see section~\ref{sec:experimental-results} for the results.
+#todo[Add images of graphs with CAML stuff only]
diff --git a/introduction.typ b/introduction.typ
index 64fdaee..43fe95c 100644
--- a/introduction.typ
+++ b/introduction.typ
@@ -1,3 +1,5 @@
+#import "utils.typ": todo
+
 = Introduction
 == Motivation
 Anomaly detection has especially in the industrial and automotive field essential importance.
@@ -31,4 +33,4 @@ How does it compare to PatchCore and EfficientAD?
 // I've tried different distance measures $->$ but results are pretty much the same.
 
 == Outline
-todo
+#todo[Todo]
diff --git a/materialandmethods.typ b/materialandmethods.typ
index 130b513..99e0f7d 100644
--- a/materialandmethods.typ
+++ b/materialandmethods.typ
@@ -197,9 +197,11 @@ There are several different ResNet architectures, the most common are ResNet-18,
 For this bachelor theis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.
 
 === P$>$M$>$F
-Todo
+// https://arxiv.org/pdf/2204.07305
 
-=== CAML
+#todo[Todo]#cite(<pmfpaper>)
+
+=== CAML // https://arxiv.org/pdf/2310.10971v2
 CAML (Context aware meta learning) is one of the state-of-the-art methods for few-shot learning.
 It consists of three different components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder and a non-causal sequence model.
@@ -237,12 +239,9 @@ Afterwards it is passed through a simple MLP network to predict the class of the
 *Large-Scale Pre-Training:*
 CAML is pre-trained on a huge number of images from ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
 Those datasets span over different domains and help to detect any new visual concept during inference.
-Only the non-causal sequence model is trained and the image encoder and ELMES encoder are frozen.
+Only the non-causal sequence model is trained and the weights of the image encoder and ELMES encoder are kept frozen.
 ~#cite()
 
-*Theoretical Analysis:*
-#todo[Mybe not that important?]
-
 *Inference:*
 During inference, CAML processes the following:
 - Encodes the support set images and labels with the pre-trained feature and class encoders.
@@ -250,7 +249,7 @@ During inference, CAML processes the following:
 - Passes the sequence through the non-causal sequence model, enabling dynamic interaction between query and support set representations.
 - Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite()
 
-*Results:*
+*Performance:*
 CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks, including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft), and cross-domain tasks (e.g., Pascal+Paintings).
diff --git a/sources.bib b/sources.bib
index a14db4d..8dc1f69 100644
--- a/sources.bib
+++ b/sources.bib
@@ -127,3 +127,13 @@
 year = {2021},
 publisher={Johannes Kepler Universität Linz}
 }
+
+@misc{pmfpaper,
+  title={Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference},
+  author={Shell Xu Hu and Da Li and Jan Stühmer and Minyoung Kim and Timothy M. Hospedales},
+  year={2022},
+  eprint={2204.07305},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2204.07305},
+}
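
To make the CAML inference flow listed in materialandmethods.typ more concrete, the following sketch mirrors those steps with plain PyTorch modules. It is illustrative only and not the reference implementation: the label encoder stands in for the fixed ELMES class encoder, the 768/256 split of the 1024-dimensional tokens and the MLP head are assumptions, and the real model uses the pretrained weights from the original paper rather than randomly initialised layers.

```python
# Sketch of the described inference flow: frozen image features + label codes
# form a sequence, a non-causal transformer lets the query attend to the
# support set, and an MLP classifies the transformed query token.
import torch
import torch.nn as nn

img_dim, lbl_dim, n_way = 768, 256, 2   # assumed split; 768 + 256 = 1024-d tokens
d_model = img_dim + lbl_dim

label_encoder = nn.Embedding(n_way + 1, lbl_dim)   # stand-in for the ELMES class encoder;
                                                    # index n_way marks the unknown query label
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=16, dim_feedforward=4096, batch_first=True)
sequence_model = nn.TransformerEncoder(encoder_layer, num_layers=24)  # non-causal: no attention mask
classifier = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                           nn.Linear(d_model, n_way))                 # simple MLP head

@torch.no_grad()
def predict(support_feats, support_labels, query_feat):
    """support_feats: (S, 768) frozen ViT-B/16 embeddings, support_labels: (S,),
    query_feat: (768,). Returns the predicted class index for the query."""
    support_tokens = torch.cat([support_feats, label_encoder(support_labels)], dim=-1)  # (S, 1024)
    unknown = label_encoder(torch.tensor([n_way]))                                      # (1, 256)
    query_token = torch.cat([query_feat.unsqueeze(0), unknown], dim=-1)                 # (1, 1024)
    seq = torch.cat([query_token, support_tokens], dim=0).unsqueeze(0)                  # (1, S+1, 1024)
    out = sequence_model(seq)                    # every token attends to every other token
    return classifier(out[:, 0]).argmax(dim=-1)  # classify the transformed query token
```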