#import "@preview/fletcher:0.5.3" as fletcher: diagram, node, edge
#import fletcher.shapes: rect, diamond
#import "utils.typ": todo
= Implementation
The three methods described (ResNet50, CAML, P>M>F) were implemented in a Jupyter notebook and compared to each other.
== Experiments
For each of the three methods we test the following use-cases:#todo[maybe write more to each test]
- Detection of the anomaly class (1, 3, 5 shots)
- Imbalanced target class prediction (5, 10, 15, 30 good shots, 5 bad shots)
- 2-Way classification (1, 3, 5 shots)
- Imbalanced 2-Way classification (5, 10, 15, 30 good shots, 5 bad shots)
- Detection of only the anomaly classes (1, 3, 5 shots)
These experiments were conducted on the MVTec AD dataset, using the bottle and cable classes.
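To make the imbalanced settings concrete, the support/query episodes could be sampled as in the following sketch (a hypothetical helper, not the thesis code; `good_paths` and `bad_paths` stand for lists of image paths of one MVTec AD class):

```python
import random

def sample_episode(good_paths, bad_paths, n_good, n_bad, n_query):
    # Support set: n_good good shots plus n_bad bad shots
    # (e.g. n_good in {5, 10, 15, 30} and n_bad = 5 for the imbalanced runs).
    support = random.sample(good_paths, n_good) + random.sample(bad_paths, n_bad)
    # Query set: drawn from the remaining images of the same class.
    remaining = [p for p in good_paths + bad_paths if p not in support]
    query = random.sample(remaining, n_query)
    return support, query
```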
== Experiment Setup
#todo[Setup of experiments, which classes used, nr of samples]
== ResNet50
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
From both the support and the query set the features are extracted to obtain a down-projected representation of the images.
The support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class whose support embedding has the smallest distance to the query embedding is chosen.
If there is more than one support embedding for a class, the mean of those embeddings (the class center) is used.
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning.
In this bachelor thesis a pre-trained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and is 50 layers deep.
To get the embeddings, the last fully connected layer of the model was removed and the output of the second-to-last layer was used as the embedding.
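A minimal PyTorch sketch of this cut (assuming torchvision ≥ 0.13; the `embed` helper is illustrative, not the exact thesis code):

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights)
model.fc = torch.nn.Identity()  # drop the 1000-class head; the 2048-dim pooled output remains
model.eval()

preprocess = weights.transforms()  # the resizing/normalization these weights expect

@torch.no_grad()
def embed(images):
    # images: list of PIL images -> (N, 2048) embedding tensor
    batch = torch.stack([preprocess(img) for img in images])
    return model(batch)
```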
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.
#diagram(
  spacing: (5mm, 5mm),
  node-stroke: 1pt,
  node-fill: eastern,
  edge-stroke: 1pt,
  // Input
  node((1, 1), "Input", shape: rect, width: 30mm, height: 10mm, name: <input>),
  // Conv1
  node((1, 0), "Conv1\n7x7, 64", shape: rect, width: 30mm, height: 15mm, name: <conv1>),
  edge(<input>, <conv1>, "->"),
  // MaxPool
  node((1, -1), "MaxPool\n3x3", shape: rect, width: 30mm, height: 15mm, name: <maxpool>),
  edge(<conv1>, <maxpool>, "->"),
  // Residual blocks
  node((3, -1), "Residual Block 1\n3x [64, 64, 256]", shape: rect, width: 40mm, height: 15mm, name: <res1>),
  edge(<maxpool>, <res1>, "->"),
  node((3, 0), "Residual Block 2\n4x [128, 128, 512]", shape: rect, width: 40mm, height: 15mm, name: <res2>),
  edge(<res1>, <res2>, "->"),
  node((3, 1), "Residual Block 3\n6x [256, 256, 1024]", shape: rect, width: 40mm, height: 15mm, name: <res3>),
  edge(<res2>, <res3>, "->"),
  node((3, 2), "Residual Block 4\n3x [512, 512, 2048]", shape: rect, width: 40mm, height: 15mm, name: <res4>),
  edge(<res3>, <res4>, "->"),
  // AvgPool + FC (the part removed at the cut)
  node((7, 2), "AvgPool\n1x1", shape: rect, width: 30mm, height: 10mm, name: <avgpool>),
  // Cutting line marking where the network is truncated
  edge(<res4>, <avgpool>, marks: "..|..>", stroke: 1pt, label: "Cut here", label-pos: 0.5, label-side: left),
  node((7, 1), "Fully Connected\n1000 classes", shape: rect, width: 40mm, height: 10mm, name: <fc>),
  edge(<avgpool>, <fc>, "->"),
  // Output
  node((7, 0), "Output", shape: rect, width: 30mm, height: 10mm, name: <output>),
  edge(<fc>, <output>, "->"),
)
After creating the embeddings for the support and query set, the Euclidean distance between each query embedding and the class centers is calculated.
The class with the smallest distance is chosen as the predicted class.
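Under these definitions the prediction step can be sketched as follows (names are illustrative; `support_emb` and `query_emb` are embedding tensors as produced above):

```python
import torch

@torch.no_grad()
def predict(support_emb, support_labels, query_emb):
    # support_emb: (S, 2048), support_labels: (S,), query_emb: (Q, 2048)
    classes = support_labels.unique()
    # Class center: mean of all support embeddings belonging to one class.
    centers = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in classes]
    )
    # Pairwise Euclidean distances between queries and class centers: (Q, C).
    dists = torch.cdist(query_emb, centers)
    # Each query is assigned the class of its nearest center.
    return classes[dists.argmin(dim=1)]
```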
=== Results
This method performed better than expected for such a simple approach.
#todo[Add images of graphs with ResNet50 stuff only]
== P>M>F
=== Approach
=== Results
== CAML
=== Approach
For the CAML implementation the pretrained model weights from the original paper were used.
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
This feature extractor had already been pretrained when it was used by the authors of the original paper.
For the non-causal sequence model a transformer model was used.
It consists of 24 layers with 16 attention heads, a hidden dimension of 1024, and an MLP output size of 4096.
This transformer was trained on a large number of images, as described in @CAML.
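These dimensions correspond to a standard transformer encoder. A minimal sketch with `torch.nn.TransformerEncoder` that only mirrors the stated hyperparameters (the actual experiments use the original authors' weights and code):

```python
import torch

# Non-causal sequence model with the dimensions stated above:
# 24 layers, 16 attention heads, hidden size 1024, MLP size 4096.
encoder_layer = torch.nn.TransformerEncoderLayer(
    d_model=1024, nhead=16, dim_feedforward=4096, batch_first=True,
)
sequence_model = torch.nn.TransformerEncoder(encoder_layer, num_layers=24)

# Non-causal means no attention mask is applied, so every (image, label)
# token in the episode sequence can attend to every other token.
tokens = torch.randn(1, 10, 1024)   # (batch, support + query tokens, hidden dim)
out = sequence_model(tokens)        # (1, 10, 1024)
```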
=== Results
The results were not as good as expected.
This might be because the model was not fine-tuned for the industrial dataset domain.
It was trained on a large number of general-purpose images without any domain-specific fine-tuning, so it might not handle very similar-looking images well.
#todo[Add images of graphs with CAML stuff only]