fix stefan suggestion

This commit is contained in:
lukas-heiligenbrunner 2025-01-24 20:10:34 +01:00
parent 93289a17f7
commit 0da616107f


@@ -100,11 +100,11 @@ In typical supervised learning the model sees thousands or millions of samples o
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
@Goodfellow-et-al-2016
/*
=== Softmax
#todo[Maybe remove this section]
The Softmax function @softmax #cite(<liang2017soft>) converts $n$ numbers of a vector into a probability distribution.
Its a generalization of the Sigmoid function and often used as an Activation Layer in neural networks.
It is a generalization of the Sigmoid function and often used as an Activation Layer in neural networks.
$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^n e^(z_k)) "for" j in {1,...,n}
@@ -126,7 +126,7 @@ cal(L)(p,q) &= -1/N sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) #<cre
$ <crel>
Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and used for model training in this Practical Work.
*/
=== Cosine Similarity
Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
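As a minimal illustration, the metric can be sketched in NumPy (the function name and example vectors are illustrative, not taken from this work):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||): 1 for parallel vectors,
    # 0 for orthogonal ones, -1 for opposite directions.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```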
@@ -184,10 +184,10 @@ This lowers computational costs while maintaining detection accuracy.~#cite(<pat
=== EfficientAD
// https://arxiv.org/pdf/2303.14535
EfficientAD is another state-of-the-art method for anomaly detection.
It focuses on maintining performance as well as high computational efficiency.
It focuses on maintaining performance as well as high computational efficiency.
At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
In comparison to Patchcore which relies on a deeper, more computationaly heavy WideResNet-101 network, the PDN uses only four convulutional layers and two pooling layers.
This results in reduced latency while retains the ability to generate patch-level features.~#cite(<efficientADpaper>)
In comparison to Patchcore, which relies on a deeper, more computationally heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
#todo[reference to image below]
The detection of anomalies is achieved through a student-teacher framework.
@@ -196,10 +196,10 @@ An anomaly is identified when the student fails to replicate the teacher's outp
This works because anomalies are absent from the training data, so the student network has never seen an anomaly during training.
A special loss function keeps the student network from generalizing too broadly and inadequately learning to predict anomalous features.~#cite(<efficientADpaper>)
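The teacher-student comparison can be sketched as follows; treating the anomaly score as the channel-averaged squared difference of the two feature maps is a simplification for illustration, not the exact formulation of EfficientAD:

```python
import numpy as np

def anomaly_map(teacher_feats: np.ndarray, student_feats: np.ndarray) -> np.ndarray:
    # Feature maps of shape (C, H, W); the per-location score is the
    # squared difference averaged over channels. On normal regions the
    # student imitates the teacher well, so the score stays near zero.
    return ((teacher_feats - student_feats) ** 2).mean(axis=0)
```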
Additionally to this structural anomaly detection EfficientAD can also address logical anomalies, such as violations in spartial or contextual constraints (eg. object wrong arrangments).
In addition to this structural anomaly detection, EfficientAD can also address logical anomalies, such as violations of spatial or contextual constraints (e.g. wrong object arrangements).
This is done by the integration of an autoencoder trained to replicate the teacher's features.~#cite(<efficientADpaper>)
By comparing the outputs of the autoencdoer and the student logical anomalies are effectively detected.
By comparing the outputs of the autoencoder and the student, logical anomalies are effectively detected.
This is a challenge that Patchcore does not directly address.~#cite(<efficientADpaper>)
#todo[maybe add key advantages such as low computational cost and high performance]
@@ -212,7 +212,7 @@ This is a challenge that Patchcore does not directly address.~#cite(<efficientAD
=== Jupyter Notebook
A Jupyter notebook is a shareable document that combines code, its output, text and visualizations.
The notebook along with the editor provides a environment for fast prototyping and data analysis.
The notebook along with the editor provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning community.~#cite(<jupyter>)
In the context of this bachelor thesis it was used to test and evaluate the three few-shot learning methods and to compare them.
@@ -245,21 +245,21 @@ This helps to avoid the vanishing gradient problem and helps with the training o
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures, the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
For this bachelor theis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.
For this bachelor thesis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.
=== P$>$M$>$F
// https://arxiv.org/pdf/2204.07305
P>P>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipelined designed for few-shot learning.
P>M>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipeline designed for few-shot learning.
It focuses on simplicity but still achieves competitive performance.
The three stages convert a general feature extractor into a task-specific model through successive optimization.
#cite(<pmfpaper>)
*Pre-training:*
The first stage in @pmfarchitecture initializes the backbone feature extractor.
This can be for instance as ResNet or ViT and is learned by self-supervised techniques.
This backbone is traned on large scale datasets on a general domain such as ImageNet or similar.
This can be, for instance, a ResNet or ViT and is learned with self-supervised techniques.
This backbone is trained on large-scale, general-domain datasets such as ImageNet.
This step optimizes for robust feature extraction and builds a foundation model.
There are well established bethods for pretraining which can be used such as DINO (self-supervised consistency), CLIP (Image-text alignment) or BERT (for text data).
There are well-established pretraining methods that can be used, such as DINO (self-supervised consistency), CLIP (image-text alignment) or BERT (for text data).
#cite(<pmfpaper>)
*Meta-training:*
@@ -275,15 +275,15 @@ $
$
As the distance metric $d$, cosine similarity is used. See @cosinesimilarity for the formula.
$c_k$, the prototy of a class is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$ and $N_k$ is just the number of samples of class $k$.
$c_k$, the prototype of a class is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$ and $N_k$ is just the number of samples of class $k$.
The meta-training process is dataset-agnostic, allowing for flexible adaptation to various few-shot classification scenarios.#cite(<pmfpaper>)
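The prototype computation and nearest-prototype classification described above can be sketched as follows (a simplified NumPy version; function names are illustrative):

```python
import numpy as np

def prototypes(support_embs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # c_k = 1/N_k * sum over support embeddings of class k
    return np.stack([support_embs[labels == k].mean(axis=0)
                     for k in np.unique(labels)])

def classify(query_emb: np.ndarray, protos: np.ndarray) -> int:
    # Cosine similarity between the query embedding and each prototype;
    # the most similar prototype gives the predicted class.
    sims = protos @ query_emb / (
        np.linalg.norm(protos, axis=1) * np.linalg.norm(query_emb))
    return int(np.argmax(sims))
```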
*Fine-tuning:*
If an novel task is drawn from an unseen domain the model may fail to generalize because of a significant fail in the distribution.
If a novel task is drawn from an unseen domain, the model may fail to generalize because of a significant shift in the distribution.
To overcome this, the model is optionally fine-tuned on the support set for a few gradient steps.
Data augmentation is used to generate a pseudo query set.
With the support set the class prototypes are calculated and compared against the model's predictions for the pseudo query set.
With the loss of this steps the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
#figure(
image("rsc/pmfarchitecture.png", width: 100%),
@@ -302,8 +302,8 @@ The inclusion of fine-tuning enhances adaptability to unseen domains, ensuring r
*Limitations and Scalability:*
This method has some limitations.
It relies on domains with large external datasets, which require substantial computational computation resources to create pre-trained models.
Fine-tuning is effective but might be slow and not work well on devices with limited ocmputational resources.
It relies on domains with large external datasets and it requires substantial computational resources to create pre-trained models.
Fine-tuning is effective but might be slow and not work well on devices with limited computational resources.
Future research could focus on exploring faster and more efficient methods for fine-tuning models.
#cite(<pmfpaper>)
@@ -315,7 +315,7 @@ This is a universal meta-learning approach.
That means no fine-tuning or meta-training is applied for specific domains.~#cite(<caml_paper>)
*Architecture:*
CAML first encodes the query and support set images using the fozen pre-trained feature extractor as shown in @camlarchitecture.
CAML first encodes the query and support set images using the frozen pre-trained feature extractor as shown in @camlarchitecture.
This step brings the images into a low-dimensional space where similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
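One plausible way to assemble this input sequence is sketched below; the concatenation of image embeddings with label encodings and the placement of the unknown token are simplifications for illustration, not CAML's exact implementation:

```python
import numpy as np

def build_sequence(query_emb, support_embs, support_label_encs, unknown_token):
    # Each support element joins its image embedding with its ELMES label
    # encoding; the query joins its embedding with the "unknown token",
    # since its class label is not available at this stage.
    query_vec = np.concatenate([query_emb, unknown_token])
    support_vecs = [np.concatenate([e, l])
                    for e, l in zip(support_embs, support_label_encs)]
    return np.stack([query_vec] + support_vecs)
```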
@@ -334,7 +334,7 @@ This encoder maximizes the algorithm's ability to distinguish between different c
*Non-causal sequence model:*
The sequence created by the ELMES encoder is then fed into a non-causal sequence model.
This might be for instance a transormer encoder.
This might be for instance a transformer encoder.
This step conditions the input sequence consisting of the query and support set embeddings.
Visual features from the query and support set can be compared to each other to determine specific information such as content or texture.
This can then be used to predict the class of the query image.