
#import "@preview/subpar:0.1.1"

= Material and Methods

== Material

=== MVTec AD
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains 5354 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.

#figure(
  image("rsc/mvtec/dataset_overview_large.png", width: 80%),
  caption: [Overview of the MVTec AD dataset. #cite(<datasetsampleimg>)],
) <datasetoverview>

This bachelor thesis uses only two of these categories: "Bottle" and "Cable".

The bottle category contains three defect classes: 'broken_large', 'broken_small' and 'contamination'.
#subpar.grid(
  figure(image("rsc/mvtec/bottle/broken_large_example.png"), caption: [
    Broken large defect
  ]), <a>,
  figure(image("rsc/mvtec/bottle/broken_small_example.png"), caption: [
    Broken small defect
  ]), <b>,
  figure(image("rsc/mvtec/bottle/contamination_example.png"), caption: [
    Contamination defect
  ]), <c>,
  columns: (1fr, 1fr, 1fr),
  caption: [Defect classes of the bottle category],
  label: <full>,
)

The cable category, in contrast, contains eight defect classes: 'bent_wire', 'cable_swap', 'combined', 'cut_inner_insulation',
'cut_outer_insulation', 'missing_cable', 'missing_wire' and 'poke_insulation'.
This larger number of defect classes already suggests that classification may be more difficult for the cable category.

#subpar.grid(
  figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
    Bent wire defect
  ]), <a>,
  figure(image("rsc/mvtec/cable/cable_swap_example.png"), caption: [
    Cable swap defect
  ]), <b>,
  figure(image("rsc/mvtec/cable/combined_example.png"), caption: [
    Combined defect
  ]), <c>,
  figure(image("rsc/mvtec/cable/cut_inner_insulation_example.png"), caption: [
    Cut inner insulation defect
  ]), <d>,
  figure(image("rsc/mvtec/cable/cut_outer_insulation_example.png"), caption: [
    Cut outer insulation defect
  ]), <e>,
  figure(image("rsc/mvtec/cable/missing_cable_example.png"), caption: [
    Missing cable defect
  ]), <f>,
  figure(image("rsc/mvtec/cable/poke_insulation_example.png"), caption: [
    Poke insulation defect
  ]), <g>,
  figure(image("rsc/mvtec/cable/missing_wire_example.png"), caption: [
    Missing wire defect
  ]), <h>,
  columns: (1fr, 1fr, 1fr, 1fr),
  caption: [Defect classes of the cable category],
  label: <full>,
)

== Methods

=== Few-Shot Learning
Few-shot learning is a subfield of machine learning that aims to train a classification model with only a few samples per class, or even none at all.
This contrasts with traditional supervised learning, where a large amount of labeled data is required to generalize well to unseen data.
With so few training samples, the model is prone to overfitting.

Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data, and the query set contains the data used to evaluate the model.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, a problem with 3 target classes and 5 training samples per class is a 3-way 5-shot classification problem.
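
The split into support and query set for an n-way k-shot task can be sketched as follows.
This is only a minimal illustration: the function `sample_episode`, the dictionary layout of the dataset and the toy file names are assumptions made for this example and are not taken from the thesis code.

```python
import random


def sample_episode(dataset, n_way, k_shot, n_query, seed=0):
    """Draw one n-way k-shot episode.

    dataset: dict mapping a class name to a list of samples (hypothetical layout).
    Returns (support, query) as lists of (sample, label) pairs.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)  # choose the n target classes
    support, query = [], []
    for label in classes:
        # Draw k_shot + n_query distinct samples so both sets never overlap.
        picked = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


# Toy data: file names stand in for images.
class_names = ["good", "broken_large", "broken_small", "contamination"]
toy = {c: [f"{c}_{i}.png" for i in range(10)] for c in class_names}

support, query = sample_episode(toy, n_way=3, k_shot=5, n_query=2)
print(len(support), len(query))  # 15 6
```

With 3 classes and 5 shots the support set holds 15 samples, and the query samples are kept disjoint from them.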

A classical example of how such a model might work is a prototypical network.
These models compute a representation (prototype) of each class from the support samples and classify new examples by their proximity to these prototypes in an embedding space.

#figure(
  image("rsc/prototype_fewshot_v3.png", width: 60%),
  caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>
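
The classification step of a prototypical network can be sketched in a few lines.
The sketch assumes that the embeddings are already given as plain lists of numbers and uses the squared Euclidean distance. The helper names and the two-dimensional toy embeddings are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_prototypes(support):
    """support: list of (embedding, label) pairs -> dict label -> prototype."""
    by_class = {}
    for embedding, label in support:
        by_class.setdefault(label, []).append(embedding)
    return {label: mean_vector(embs) for label, embs in by_class.items()}


def classify(query_embedding, prototypes):
    """Return the label of the closest prototype."""
    return min(
        prototypes,
        key=lambda label: squared_distance(query_embedding, prototypes[label]),
    )


# Toy 2-D embeddings; a real model would produce them with a feature extractor.
support = [
    ([0.0, 0.0], "good"),
    ([0.2, 0.0], "good"),
    ([1.0, 1.0], "broken"),
    ([1.0, 0.8], "broken"),
]
prototypes = build_prototypes(support)
print(classify([0.9, 0.7], prototypes))  # broken
```

Each prototype is the mean of the support embeddings of its class, so no gradient step is needed to add a new class at test time.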

The first and simplest method in this bachelor thesis uses a ResNet to calculate these embeddings and is essentially a simple prototypical network.
See //%todo link to this section
// todo proper source

=== Generalisation from few samples

Generalizing from so few samples is especially hard.
In typical supervised learning, the model sees thousands or millions of samples from the target domain during training.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning, the model has to achieve this with only a handful of samples.

=== PatchCore
PatchCore is an advanced method designed for cold-start anomaly detection and localization, primarily focused on industrial image data.
It operates on the principle that an image is anomalous if any of its patches is anomalous.
The method achieves state-of-the-art performance on benchmarks like MVTec AD with high accuracy, low computational cost, and competitive inference times. #cite(<patchcorepaper>)

// todo: maybe rephrase and simplify the first paragraph
The PatchCore framework leverages a pre-trained convolutional neural network (e.g., WideResNet50) to extract mid-level features from image patches.
By focusing on intermediate layers, PatchCore balances the retention of localized information with a reduction in bias associated with high-level features pre-trained on ImageNet.
To enhance robustness to spatial variations, the method aggregates features from local neighborhoods using adaptive pooling, which increases the receptive field without sacrificing spatial resolution. #cite(<patchcorepaper>)

A crucial component of PatchCore is its memory bank, which stores patch-level features derived from the training dataset.
This memory bank represents the nominal distribution of features against which test patches are compared.
To ensure computational efficiency and scalability, PatchCore employs a coreset reduction technique to condense the memory bank by selecting the most representative patch features.
This optimization reduces both storage requirements and inference times while maintaining the integrity of the feature space. #cite(<patchcorepaper>)
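
The idea of coreset reduction can be illustrated with a greedy farthest-point selection.
This is a simplified sketch and not the reference implementation: it works on plain Python lists and leaves out any speed-ups of the selection. The function name and the toy features are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_coreset(features, target_size, seed=0):
    """Pick indices of `target_size` features that cover the feature space.

    Each step adds the feature that is farthest from everything selected
    so far (k-center greedy selection).
    """
    target_size = min(target_size, len(features))
    rng = random.Random(seed)
    first = rng.randrange(len(features))
    selected = [first]
    # Distance of every feature to its closest already selected feature.
    min_dist = [squared_distance(f, features[first]) for f in features]
    while len(selected) < target_size:
        nxt = max(range(len(features)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], squared_distance(f, features[nxt]))
    return selected


# Two clusters of toy patch features: indices 0-2 and indices 3-5.
features = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]
coreset = greedy_coreset(features, target_size=2)
print(sorted(i // 3 for i in coreset))  # [0, 1]
```

Regardless of the random starting point, a coreset of size two contains one feature from each cluster, which is why the reduced memory bank still covers the nominal feature space.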

During inference, PatchCore computes anomaly scores by measuring the distance between patch features from test images and their nearest neighbors in the memory bank.
If any patch exhibits a significant deviation, the corresponding image is flagged as anomalous.
For localization, the anomaly scores of individual patches are spatially aligned and upsampled to generate segmentation maps, providing pixel-level insights into the anomalous regions. #cite(<patchcorepaper>)
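
The scoring step described above can be sketched as a nearest-neighbour search.
The sketch assumes Euclidean distance and takes the largest patch score as the image score. The original method additionally re-weights this score, which is omitted here. The memory bank, the patch features and the threshold are toy values chosen for illustration.

```python
import math


def patch_score(patch, memory_bank):
    """Distance of one patch feature to its nearest neighbour in the memory bank."""
    return min(math.dist(patch, m) for m in memory_bank)


def image_score(patches, memory_bank):
    """Image-level score: the most anomalous patch decides."""
    scores = [patch_score(p, memory_bank) for p in patches]
    return max(scores), scores


memory_bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # nominal patch features
normal_image = [[0.1, 0.0], [0.9, 0.1]]
defect_image = [[0.1, 0.0], [4.0, 4.0]]  # second patch is far from the bank

threshold = 1.0  # hypothetical decision threshold
normal_score, _ = image_score(normal_image, memory_bank)
defect_score, per_patch = image_score(defect_image, memory_bank)
print(normal_score > threshold, defect_score > threshold)  # False True
```

The per-patch scores in `per_patch` are the values that would be arranged spatially and upsampled to obtain a segmentation map.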


PatchCore reaches an image-level AUROC of up to 99.6% on the MVTec AD dataset for anomaly detection.
A major advantage of the method is coreset subsampling, which significantly reduces the size of the memory bank.
This lowers computational cost while maintaining detection accuracy. #cite(<patchcorepaper>)

#figure(
  image("rsc/patchcore_overview.png", width: 80%),
  caption: [Architecture of PatchCore. #cite(<patchcorepaper>)],
) <patchcoreoverview>

// https://arxiv.org/pdf/2106.08265
=== EfficientAD
todo stuff #cite(<efficientADpaper>)
// https://arxiv.org/pdf/2303.14535

=== Jupyter Notebook

A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook, along with its editor, provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning communities.

In the context of this bachelor thesis it was used to test and evaluate the three few-shot learning methods and to compare them. #cite(<jupyter>)

=== CNN
Convolutional neural networks (CNNs) are model architectures that are particularly well suited to processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer consists of a set of learnable kernels (filters).
Each filter performs a convolution by sliding a small window across the image.
At each position, the dot product between the filter and the underlying image region yields one value of the resulting feature map.
Convolutional layers capture features such as edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps to reduce overfitting.
Common pooling operations include average pooling and max pooling.
Finally, after several convolutional layers, the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
@cnnarchitecture shows a typical binary classification task.
#cite(<cnnintro>)

#figure(
  image("rsc/cnn_architecture.png", width: 80%),
  caption: [Architecture convolutional neural network. #cite(<cnnarchitectureimg>)],
) <cnnarchitecture>
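
The two core operations of a CNN, convolution and pooling, can be written out by hand for a tiny example.
The sketch uses plain nested lists, no padding and a fixed kernel. In a real network the kernel weights are learned and the operations run on tensors. The image and the kernel are toy values.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Downsample by keeping the maximum of each size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# 5x5 toy image with a vertical edge between column 1 and column 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

fmap = conv2d(image, edge_kernel)
pooled = max_pool2d(fmap)
print(fmap[0])  # [0, 2, 0, 0]
print(pooled)   # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the edge lies, and pooling halves its resolution while keeping that response.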

=== ResNet

Residual neural networks (ResNets) are a neural network architecture designed for training very deep networks.
The main idea behind ResNet is the skip connection.
A skip connection links the output of one layer directly to a later layer, bypassing the layers in between.
This mitigates the vanishing gradient problem and makes very deep networks easier to train.
ResNet has proven to be very successful in many computer vision tasks and is used in this work for the classification task.
There are several ResNet architectures; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
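
The effect of a skip connection can be shown with a strongly simplified residual block.
The sketch assumes two small linear layers instead of the convolutions and normalization layers of an actual ResNet. All names and weights are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, w1)), w2)
    # Skip connection: the block input is added to the output of F.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]
```

With all weights set to zero the block simply passes its input through, so an additional block can always fall back to the identity instead of degrading the signal.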

For this bachelor thesis, the ResNet-50 architecture was used to compute the embeddings for the few-shot learning methods.


=== CAML
Todo
=== P$>$M$>$F
Todo

=== Softmax

The softmax function @softmax #cite(<liang2017soft>) converts a vector of $K$ real numbers into a probability distribution.
It is a generalization of the sigmoid function and is often used as an activation function in neural networks.

$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^K e^(z_k)) quad "for" j in {1,...,K}
$ <softmax>

The softmax function is closely related to the Boltzmann distribution, which was introduced in the 19th century #cite(<Boltzmann>).
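
A direct implementation of the softmax function might look as follows.
Subtracting the maximum before exponentiating is a common stabilization trick that is not part of the formula itself. It leaves the result unchanged but avoids overflow for large inputs.

```python
import math


def softmax(z):
    """Map a list of real numbers to a probability distribution."""
    m = max(z)  # shift for numerical stability; cancels out in the ratio
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

For two inputs of the form `[x, 0.0]`, the first output equals the sigmoid of `x`, which is the sense in which softmax generalizes the sigmoid function.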


=== Cross Entropy Loss
Cross-entropy loss is a well-established loss function in machine learning.
@crelformal #cite(<crossentropy>) shows the general formal definition of the cross-entropy loss.
@crelbinary is the special case of the general cross-entropy loss for binary classification tasks.

$
H(p,q) = -sum_(x in cal(X)) p(x) log q(x)
$ <crelformal>

$
H(p,q) = -(p log(q) + (1-p) log(1-q))
$ <crelbinary>

$
cal(L)(p,q) = -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i))
$ <crelbinarybatch>

@crelbinarybatch #cite(<handsonaiI>) is the binary cross-entropy loss for a batch of size $cal(B)$ and is used for model training in this work.
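
The batch version of the binary cross-entropy loss can be checked numerically with a short sketch.
It averages over the batch and clamps the predictions to avoid taking the logarithm of zero. The clamping constant `eps` is an assumption of this example and not part of the definition.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross-entropy over a batch.

    targets: true labels p_i in {0, 1}
    predictions: predicted probabilities q_i in [0, 1]
    """
    total = 0.0
    for p, q in zip(targets, predictions):
        q = min(max(q, eps), 1.0 - eps)  # clamp so that log() stays finite
        total += p * math.log(q) + (1 - p) * math.log(1 - q)
    return -total / len(targets)


targets = [1, 0, 1, 0]
good = binary_cross_entropy(targets, [0.9, 0.1, 0.8, 0.2])
bad = binary_cross_entropy(targets, [0.1, 0.9, 0.2, 0.8])
print(good < bad)  # True
```

Confident correct predictions give a loss close to zero, while confident wrong predictions are penalized heavily. A constant prediction of 0.5 yields a loss of $log 2$.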

=== Mathematical modeling of problem

== Alternative Methods

There are several alternative methods to few-shot learning which are not used in this bachelor thesis.