lukas-heilgenbrunner
a358401ffb
#import "@preview/subpar:0.1.1"
#import "utils.typ": todo

= Material and Methods

== Material

=== MVTec AD
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains 5354 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.

#figure(
  image("rsc/mvtec/dataset_overview_large.png", width: 80%),
  caption: [Example images from the MVTec AD dataset. #cite(<datasetsampleimg>)],
) <datasetoverview>

Only two of these categories are used in this bachelor thesis: "Bottle" and "Cable".

The bottle category contains three defect classes: 'broken_large', 'broken_small' and 'contamination'.
#subpar.grid(
  figure(image("rsc/mvtec/bottle/broken_large_example.png"), caption: [
    Broken large defect
  ]), <a>,
  figure(image("rsc/mvtec/bottle/broken_small_example.png"), caption: [
    Broken small defect
  ]), <b>,
  figure(image("rsc/mvtec/bottle/contamination_example.png"), caption: [
    Contamination defect
  ]), <c>,
  columns: (1fr, 1fr, 1fr),
  caption: [Defect classes of the bottle category],
  label: <full>,
)

The cable category has considerably more defect classes: 'bent_wire', 'cable_swap', 'combined', 'cut_inner_insulation',
'cut_outer_insulation', 'missing_cable', 'missing_wire' and 'poke_insulation'.
The larger number of defect classes already indicates that classification might be more difficult for the cable category.

#subpar.grid(
  figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
    Bent wire defect
  ]), <a>,
  figure(image("rsc/mvtec/cable/cable_swap_example.png"), caption: [
    Cable swap defect
  ]), <b>,
  figure(image("rsc/mvtec/cable/combined_example.png"), caption: [
    Combined defect
  ]), <c>,
  figure(image("rsc/mvtec/cable/cut_inner_insulation_example.png"), caption: [
    Cut inner insulation defect
  ]), <d>,
  figure(image("rsc/mvtec/cable/cut_outer_insulation_example.png"), caption: [
    Cut outer insulation defect
  ]), <e>,
  figure(image("rsc/mvtec/cable/missing_cable_example.png"), caption: [
    Missing cable defect
  ]), <f>,
  figure(image("rsc/mvtec/cable/poke_insulation_example.png"), caption: [
    Poke insulation defect
  ]), <g>,
  figure(image("rsc/mvtec/cable/missing_wire_example.png"), caption: [
    Missing wire defect
  ]), <h>,
  columns: (1fr, 1fr, 1fr, 1fr),
  caption: [Defect classes of the cable category],
  label: <full>,
)

== Methods

=== Few-Shot Learning
Few-shot learning is a subfield of machine learning which aims to train a classification model with just a few samples per class, or with no samples at all.
This is in contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
With so few training samples, the model is prone to overfitting.

Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data and the query set contains the data on which the model is evaluated.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, a task with 3 target classes and 5 training samples per class is a 3-way 5-shot classification problem.
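
The n-way k-shot setup can be illustrated with a minimal Python sketch.
It is not the sampling code used in this thesis: the helper `sample_episode`, the parameter `n_query` and the placeholder file names are assumptions made purely for illustration.

```python
import random


def sample_episode(samples_by_class, n_way, k_shot, n_query, seed=0):
    """Draw one n-way k-shot episode as (support, query) lists of (sample, label)."""
    rng = random.Random(seed)
    # Pick n_way classes, then k_shot support and n_query query samples per class.
    classes = rng.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(samples_by_class[label], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query


# Toy data: four classes with ten placeholder file names each (hypothetical).
class_names = ["good", "broken_large", "broken_small", "contamination"]
data = {c: [f"{c}_{i}.png" for i in range(10)] for c in class_names}

support, query = sample_episode(data, n_way=3, k_shot=5, n_query=2)
print(len(support), len(query))  # 15 6
```

With `n_way=3` and `k_shot=5` the support set holds 15 labeled samples, which corresponds to the 3-way 5-shot problem described above.
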

A classical example of such a model is the prototypical network.
These models learn a representation (prototype) of each class and classify new examples based on their proximity to these prototypes in an embedding space.
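
As a minimal sketch of this idea, assume the embeddings have already been computed by some feature extractor.
Each class prototype is then taken as the mean of its support embeddings, and a query is assigned to the nearest prototype.
The two-dimensional toy embeddings and the function names below are illustrative assumptions, not the implementation used in this thesis.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support):
    """support: dict mapping a class label to a list of embedding vectors."""
    return {label: mean_vector(vectors) for label, vectors in support.items()}


def classify(embedding, prototypes):
    """Return the label of the prototype closest to the query embedding."""
    return min(prototypes, key=lambda label: euclidean(embedding, prototypes[label]))


# Hypothetical 2-D embeddings of a 2-way 3-shot support set.
support = {
    "good": [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]],
    "defect": [[1.0, 0.9], [0.8, 1.1], [0.9, 1.0]],
}
prototypes = build_prototypes(support)
print(classify([0.15, 0.05], prototypes))  # good
print(classify([0.95, 0.90], prototypes))  # defect
```

In practice the embeddings would come from a trained network; only the prototype and nearest-neighbor logic is shown here.
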

#figure(
  image("rsc/prototype_fewshot_v3.png", width: 60%),
  caption: [Prototypical network for few-shot learning. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>

The first and simplest method of this bachelor thesis uses a ResNet to calculate these embeddings and is essentially a simple prototypical network.
See #todo[link to this section]
#todo[proper source]

=== Generalisation from few samples

Generalizing from so few samples is an especially hard task.
In typical supervised learning, the model sees thousands or millions of samples from the corresponding domain during training.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning, the model has to generalize from just a few samples.

=== PatchCore
// https://arxiv.org/pdf/2106.08265
PatchCore is a method for cold-start anomaly detection and localization, primarily focused on industrial image data.
It operates on the principle that an image is anomalous if any of its patches is anomalous.
The method achieves state-of-the-art performance on benchmarks like MVTec AD with high accuracy, low computational cost, and competitive inference times. #cite(<patchcorepaper>)
#todo[Rephrase and simplify this paragraph]

The PatchCore framework uses a pre-trained convolutional neural network (e.g., WideResNet50) to extract mid-level features from image patches.
By focusing on intermediate layers, PatchCore retains localized information while reducing the bias of high-level features towards ImageNet, on which the network was pre-trained.
To enhance robustness to spatial variations, the method aggregates features from local neighborhoods using adaptive pooling, which increases the receptive field without sacrificing spatial resolution. #cite(<patchcorepaper>)

A crucial component of PatchCore is its memory bank, which stores patch-level features derived from the training dataset.
This memory bank represents the nominal distribution of features against which test patches are compared.
To keep inference efficient and scalable, PatchCore applies coreset subsampling, which condenses the memory bank to a small subset of patch features that still covers the original feature space well.
This reduces both storage requirements and inference times while largely preserving the coverage of the feature space. #cite(<patchcorepaper>)
#todo[reference to image below]
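
One way to picture the coreset reduction is a greedy farthest-point selection, sketched below in plain Python.
This is a simplified illustration under the assumption that Euclidean distance is used; the function name `greedy_coreset` and the toy feature vectors are hypothetical, and the speed-ups of the original method are omitted.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_coreset(features, target_size):
    """Repeatedly add the feature that is farthest from all selected features."""
    selected = [0]  # start from the first feature (arbitrary choice)
    # Distance of every feature to its closest selected feature so far.
    min_dist = [dist(f, features[0]) for f in features]
    while len(selected) < min(target_size, len(features)):
        idx = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(idx)
        min_dist = [min(d, dist(f, features[idx])) for d, f in zip(min_dist, features)]
    return [features[i] for i in selected]


# Toy memory bank with three clusters of patch features (hypothetical values).
bank = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
coreset = greedy_coreset(bank, target_size=3)
print(coreset)  # [[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
```

The selected subset contains one feature per cluster, so near-duplicate patch features are dropped while the overall spread of the memory bank is kept.
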

During inference, PatchCore computes anomaly scores by measuring the distance between patch features from test images and their nearest neighbors in the memory bank.
If any patch exhibits a significant deviation, the corresponding image is flagged as anomalous.
For localization, the anomaly scores of individual patches are spatially aligned and upsampled to generate segmentation maps, providing pixel-level insights into the anomalous regions.~#cite(<patchcorepaper>)
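
The scoring step can be sketched as a nearest-neighbor search over the memory bank.
The following Python snippet is a simplified illustration: the image score is taken as the largest patch distance, any further re-weighting of the score is left out, and the threshold as well as all feature values are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def patch_scores(test_patches, memory_bank):
    """Distance of each test patch feature to its nearest neighbor in the bank."""
    return [min(dist(p, m) for m in memory_bank) for p in test_patches]


def image_score(test_patches, memory_bank):
    """An image is as anomalous as its most anomalous patch."""
    return max(patch_scores(test_patches, memory_bank))


memory_bank = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # nominal patch features
normal_image = [[0.1, 0.0], [0.9, 0.1]]
defect_image = [[0.1, 0.0], [4.0, 3.0]]  # second patch is far from the bank
threshold = 1.0  # hypothetical decision threshold

for name, patches in [("normal", normal_image), ("defect", defect_image)]:
    score = image_score(patches, memory_bank)
    print(name, round(score, 3), "anomalous" if score > threshold else "ok")
```

The per-patch scores returned by `patch_scores` are the values that would be arranged spatially and upsampled to obtain a segmentation map.
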

PatchCore reaches an image-level AUROC of up to 99.6% on the MVTec AD dataset when detecting anomalies.
A great advantage of this method is the coreset subsampling, which reduces the memory bank size significantly.
This lowers computational costs while maintaining detection accuracy.~#cite(<patchcorepaper>)

#figure(
  image("rsc/patchcore_overview.png", width: 80%),
  caption: [Architecture of PatchCore. #cite(<patchcorepaper>)],
) <patchcoreoverview>

=== EfficientAD
// https://arxiv.org/pdf/2303.14535
EfficientAD is another state-of-the-art method for anomaly detection.
It focuses on maintaining high detection performance while being computationally efficient.
At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
In comparison to PatchCore, which relies on a deeper, more computationally heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
#todo[reference to image below]

Anomalies are detected with a student-teacher framework.
The teacher network is a pre-trained PDN, and the student network is trained on normal (good) images to predict the teacher's output.
An anomaly is identified when the student fails to replicate the teacher's output.
This works because the training data contains no anomalies, so the student network has never seen an anomaly during training.
A special loss function prevents the student from generalizing too broadly and thereby learning to predict the teacher's features on anomalous regions as well.~#cite(<efficientADpaper>)
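
The comparison between student and teacher can be sketched as a per-position feature difference.
The snippet below is a minimal illustration, assuming both feature maps are given as lists of per-position channel vectors; the values are hypothetical, and the special loss function and score normalization of EfficientAD are not shown.

```python
def anomaly_map(teacher_features, student_features):
    """Mean squared difference across channels for every spatial position."""
    return [
        sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) / len(t_vec)
        for t_vec, s_vec in zip(teacher_features, student_features)
    ]


# Three spatial positions with two channels each (hypothetical values).
teacher = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
student = [[1.0, 2.1], [0.5, 0.4], [0.0, 1.0]]  # fails at the last position

scores = anomaly_map(teacher, student)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous)  # 2
```

Positions where the student reproduces the teacher well receive a score close to zero, while the position it cannot predict stands out.
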

In addition to this structural anomaly detection, EfficientAD can also address logical anomalies, such as violations of spatial or contextual constraints (e.g., wrongly arranged objects).
This is done by integrating an autoencoder trained to replicate the teacher's features.~#cite(<efficientADpaper>)

By comparing the outputs of the autoencoder and the student, logical anomalies can be detected.
This is a challenge that PatchCore does not directly address.~#cite(<efficientADpaper>)
#todo[maybe add key advantages such as low computational cost and high performance]

#figure(
  image("rsc/efficientad_overview.png", width: 80%),
  caption: [Architecture of EfficientAD. #cite(<efficientADpaper>)],
) <efficientadoverview>

=== Jupyter Notebook

A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook, together with its editor, provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning communities.

In the context of this bachelor thesis, it was used to test, evaluate and compare the three few-shot learning methods. #cite(<jupyter>)

=== CNN
Convolutional neural networks (CNNs) are model architectures that are especially well suited for processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer consists of a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over the image.
At each position, the dot product between the filter and the image region under the window yields one value of the resulting feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps against overfitting.
Common pooling layers include average and max pooling.
Finally, after several convolutional layers, the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
@cnnarchitecture shows a typical binary classification task.~#cite(<cnnintro>)
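
The two core operations can be illustrated with a small, dependency-free Python sketch.
It assumes a single-channel image, a stride of one and no padding; the kernel and the image values are hypothetical and chosen so that the filter responds to a vertical edge.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2x2(fmap):
    """Downsample a feature map by keeping the maximum of each 2x2 block."""
    return [
        [
            max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
            for j in range(0, len(fmap[0]) - 1, 2)
        ]
        for i in range(0, len(fmap) - 1, 2)
    ]


# 5x5 image with a dark left part and a bright right part.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1], [-1, 1]]

fmap = conv2d(image, vertical_edge)
print(fmap[0])            # [0, 2, 0, 0]
print(max_pool2x2(fmap))  # [[2, 0], [2, 0]]
```

The feature map is non-zero only where the window covers the edge, and pooling halves its resolution while keeping that response.
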

#figure(
  image("rsc/cnn_architecture.png", width: 80%),
  caption: [Architecture of a convolutional neural network. #cite(<cnnarchitectureimg>)],
) <cnnarchitecture>

=== ResNet

Residual neural networks (ResNets) are a special type of neural network architecture.
They are especially well suited for very deep models and have been used in many state-of-the-art computer vision tasks.
The main idea behind ResNet is the skip connection.
A skip connection is a direct connection from one layer to a later layer that bypasses the layers in between.
This mitigates the vanishing gradient problem and makes it possible to train very deep networks.
ResNet has proven to be very successful in many computer vision tasks and is used in this bachelor thesis for the classification task.
There are several different ResNet architectures; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
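
A skip connection can be sketched as adding the block input to the block output.
The following Python snippet uses two tiny fully connected layers instead of convolutions to keep the example short; this simplification and all names and weights are assumptions for illustration only.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """y = relu(F(x) + x) with F(x) = W2 * relu(W1 * x + b1) + b2."""
    fx = linear(relu(linear(x, w1, b1)), w2, b2)
    return relu([f + xi for f, xi in zip(fx, x)])  # the skip connection adds x


x = [1.0, 2.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]

# With all-zero weights the block still passes its input through unchanged.
print(residual_block(x, zero_weights, zero_bias, zero_weights, zero_bias))  # [1.0, 2.0]
```

Because the input is added back, a block only has to learn a correction to the identity, and the gradient can flow directly through the addition.
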

For this bachelor thesis, the ResNet-50 architecture was used to compute the embeddings for the few-shot learning methods.

=== CAML
// https://arxiv.org/pdf/2310.10971v2
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
#todo[Here we should describe in detail how caml works]

#figure(
  image("rsc/caml_architecture.png", width: 80%),
  caption: [Architecture of CAML. #cite(<caml_paper>)],
) <camlarchitecture>

=== P$>$M$>$F
Todo

=== Softmax
#todo[Maybe remove this section]
The Softmax function @softmax #cite(<liang2017soft>) converts a vector of $K$ real numbers into a probability distribution.
It is a generalization of the sigmoid function and is often used as an activation layer in neural networks.

$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^K e^(z_k)) quad "for" j in {1,...,K}
$ <softmax>
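
The formula can be checked numerically with a few lines of Python.
The sketch below subtracts the maximum before exponentiating, a common trick to avoid overflow that does not change the result; the input vector is an arbitrary example.

```python
import math


def softmax(z):
    """Softmax of a list of real numbers, shifted by max(z) for stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])  # [0.09, 0.245, 0.665]
```

The outputs are positive and sum to one, and larger inputs receive larger probabilities.
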

The softmax function closely resembles the Boltzmann distribution, which was first introduced in the 19th century #cite(<Boltzmann>).

=== Cross Entropy Loss
#todo[Maybe remove this section]
Cross Entropy Loss is a well-established loss function in machine learning.
@crelformal #cite(<crossentropy>) shows the general formal definition of the Cross Entropy Loss.
@crelbinary is the special case of the general Cross Entropy Loss for binary classification tasks.

$
H(p,q) = -sum_(x in cal(X)) p(x) log q(x)
$ <crelformal>

$
H(p,q) = -(p log(q) + (1-p) log(1-q))
$ <crelbinary>

$
cal(L)(p,q) = -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i))
$ <crelbinarybatch>

@crelbinarybatch #cite(<handsonaiI>) is the Binary Cross Entropy Loss $cal(L)(p,q)$ averaged over a batch of size $cal(B)$; it is used for model training in this bachelor thesis.
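
The batch loss can be evaluated with a short Python function.
This is a minimal sketch: the clipping constant `eps`, which guards against taking the logarithm of zero, and the example values are assumptions and not part of the definition above.

```python
import math


def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross entropy over a batch of targets p_i and predictions q_i."""
    total = 0.0
    for p, q in zip(targets, predictions):
        q = min(max(q, eps), 1.0 - eps)  # keep q strictly inside (0, 1)
        total += p * math.log(q) + (1 - p) * math.log(1 - q)
    return -total / len(targets)


# Confident and correct predictions give a small loss.
print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
# Confident but wrong predictions give a large loss.
print(round(binary_cross_entropy([1, 0], [0.1, 0.9]), 4))  # 2.3026
```
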

=== Cosine Similarity

=== Euclidean Distance

== Alternative Methods

There are several alternative methods to few-shot learning which are not used in this bachelor thesis.