fix stefan suggestion
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 10s
This commit is contained in:
parent
93289a17f7
commit
0da616107f
@@ -100,11 +100,11 @@ In typical supervised learning the model sees thousands or millions of samples o
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
@Goodfellow-et-al-2016

/*
=== Softmax
#todo[Maybe remove this section]
The Softmax function @softmax #cite(<liang2017soft>) converts a vector of $n$ numbers into a probability distribution.
It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.

$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^n e^(z_k)) "for" j in {1,...,n}
@@ -126,7 +126,7 @@ cal(L)(p,q) &= -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) #<cre
$ <crel>

Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.

*/
=== Cosine Similarity
Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
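To make the metric concrete, here is a minimal NumPy sketch of this computation (illustrative only, not part of the thesis code):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# identical direction -> 1.0; orthogonal -> 0.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```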
@@ -184,10 +184,10 @@ This lowers computational costs while maintaining detection accuracy.~#cite(<pat
=== EfficientAD
// https://arxiv.org/pdf/2303.14535
EfficientAD is another state-of-the-art method for anomaly detection.
It focuses on maintaining performance as well as high computational efficiency.
At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
In comparison to Patchcore, which relies on a deeper, more computationally heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
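As a rough illustration, a PDN-like extractor can be sketched in PyTorch as follows; the channel counts and kernel sizes here are placeholders, not the exact configuration from the paper:

```python
import torch.nn as nn

# A PDN-like patch descriptor: four convolutional layers, two pooling layers.
pdn = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(128, 256, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(256, 256, kernel_size=4), nn.ReLU(),
    nn.Conv2d(256, 384, kernel_size=4),  # 384-dimensional patch features
)
```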
#todo[reference to image below]

The detection of anomalies is achieved through a student-teacher framework.
@@ -196,10 +196,10 @@ An anomaly is identified when the student fails to replicate the teacher's outp
This works because of the absence of anomalies in the training data: the student network has never seen an anomaly during training.
A special loss function prevents the student network from generalizing too broadly and thus from learning to predict anomalous features.~#cite(<efficientADpaper>)
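A minimal sketch of this comparison, assuming `teacher` and `student` are networks with matching output shapes (the paper's actual scoring is more involved):

```python
import torch

def structural_anomaly_map(teacher, student, image: torch.Tensor) -> torch.Tensor:
    # Large per-pixel distances mean the student failed to replicate
    # the teacher's output, which indicates an anomaly.
    with torch.no_grad():
        t_feat = teacher(image)
        s_feat = student(image)
    return ((t_feat - s_feat) ** 2).mean(dim=1)  # (batch, height, width)
```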

In addition to this structural anomaly detection, EfficientAD can also address logical anomalies, such as violations of spatial or contextual constraints (e.g. wrongly arranged objects).
This is done by the integration of an autoencoder trained to replicate the teacher's features.~#cite(<efficientADpaper>)

By comparing the outputs of the autoencoder and the student, logical anomalies are effectively detected.
This is a challenge that Patchcore does not directly address.~#cite(<efficientADpaper>)
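Continuing the sketch from above, both branches could be fused roughly like this; the equal weighting is a simplifying assumption:

```python
def combined_anomaly_map(teacher, student, autoencoder, image):
    map_st = structural_anomaly_map(teacher, student, image)      # structural branch
    map_ae = structural_anomaly_map(autoencoder, student, image)  # logical branch
    return 0.5 * map_st + 0.5 * map_ae
```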
#todo[maybe add key advantages such as low computational cost and high performance]

@@ -212,7 +212,7 @@ This is a challenge that Patchcore does not directly address.~#cite(<efficientAD
=== Jupyter Notebook

A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook along with the editor provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning community.~#cite(<jupyter>)

In the context of this bachelor thesis it was used to test and evaluate the three few-shot learning methods and to compare them.
@@ -245,21 +245,21 @@ This helps to avoid the vanishing gradient problem and helps with the training o
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
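As a reminder of the core idea, a minimal residual block in PyTorch might look like this (real ResNet blocks additionally use batch normalization and downsampling shortcuts):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # y = F(x) + x: the identity shortcut lets gradients bypass the
    # convolutions, mitigating the vanishing gradient problem.
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.conv2(self.relu(self.conv1(x))) + x)
```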

For this bachelor thesis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.

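One plausible way to obtain such embeddings with torchvision is to drop the classification head and keep the pooled features (the exact thesis setup may differ):

```python
import torch
from torchvision import models

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
embedder = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()  # drop fc layer

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)      # dummy batch
    embeddings = embedder(images).flatten(1)  # shape: (4, 2048)
```
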
=== P$>$M$>$F
// https://arxiv.org/pdf/2204.07305
P>M>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipeline designed for few-shot learning.
It focuses on simplicity but still achieves competitive performance.
The three stages convert a general feature extractor into a task-specific model through fine-tuned optimization.
#cite(<pmfpaper>)

*Pre-training:*
The first stage in @pmfarchitecture initializes the backbone feature extractor.
This can be for instance a ResNet or ViT and is learned by self-supervised techniques.
This backbone is trained on large-scale datasets of a general domain such as ImageNet or similar.
This step optimizes for robust feature extraction and builds a foundation model.
There are well-established methods for pre-training which can be used, such as DINO (self-supervised consistency), CLIP (image-text alignment) or BERT (for text data).
#cite(<pmfpaper>)

*Meta-training:*
@@ -275,15 +275,15 @@ $
$

As a distance metric $d$, cosine similarity is used. See @cosinesimilarity for the formula.
$c_k$, the prototype of class $k$, is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$, where $N_k$ is the number of samples of class $k$.
The meta-training process is dataset-agnostic, allowing for flexible adaptation to various few-shot classification scenarios.#cite(<pmfpaper>)
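To make the prototype computation concrete, a small PyTorch sketch (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def prototypes(features: torch.Tensor, labels: torch.Tensor, n_classes: int) -> torch.Tensor:
    # c_k = mean of the embeddings f(x_i) of all support samples with y_i = k
    return torch.stack([features[labels == k].mean(dim=0) for k in range(n_classes)])

def classify(query_feats: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    # cosine similarity (@cosinesimilarity) between queries and prototypes
    sims = F.cosine_similarity(query_feats.unsqueeze(1), protos.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)  # predicted class per query
```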
*Fine-tuning:*
If a novel task is drawn from an unseen domain the model may fail to generalize because of a significant shift in the distribution.
To overcome this the model is optionally fine-tuned with the support set for a few gradient steps.
Data augmentation is used to generate a pseudo query set.
With the support set the class prototypes are calculated and compared against the model's predictions for the pseudo query set.
With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
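A rough sketch of this optional fine-tuning loop, reusing `prototypes` from the sketch above; the augmentation, learning rate and step count are assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def fine_tune(model, support_x, support_y, augment, n_classes, steps=10, lr=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        # prototypes from the support set embeddings
        protos = prototypes(model(support_x), support_y, n_classes)
        # pseudo query set generated by data augmentation
        pseudo_feats = model(augment(support_x))
        logits = F.cosine_similarity(pseudo_feats.unsqueeze(1), protos.unsqueeze(0), dim=-1)
        loss = F.cross_entropy(logits, support_y)  # pseudo queries keep their labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```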
#figure(
  image("rsc/pmfarchitecture.png", width: 100%),
@@ -302,8 +302,8 @@ The inclusion of fine-tuning enhances adaptability to unseen domains, ensuring r

*Limitations and Scalability:*
This method has some limitations.
It relies on domains with large external datasets and it requires substantial computational resources to create pre-trained models.
Fine-tuning is effective but might be slow and not work well on devices with limited computational resources.
Future research could focus on exploring faster and more efficient methods for fine-tuning models.
#cite(<pmfpaper>)

@@ -315,7 +315,7 @@ This is a universal meta-learning approach.
That means no fine-tuning or meta-training is applied for specific domains.~#cite(<caml_paper>)

*Architecture:*
CAML first encodes the query and support set images using the frozen pre-trained feature extractor as shown in @camlarchitecture.
This step brings the images into a low-dimensional space where similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
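Schematically, the input sequence could be assembled as below; `elmes` and `unknown_token` are stand-ins for the paper's components, not a real API:

```python
import torch

def build_sequence(query_emb, support_embs, support_labels, elmes, unknown_token):
    # support elements: [image embedding | ELMES encoding of the label]
    support = torch.cat([support_embs, elmes(support_labels)], dim=-1)
    # the query's class is unknown, so a learnable token (shape (1, d_label))
    # replaces the label encoding
    query = torch.cat([query_emb, unknown_token.expand(query_emb.size(0), -1)], dim=-1)
    return torch.cat([query, support], dim=0)  # input to the sequence model
```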
@@ -334,7 +334,7 @@ This encoder maximizes the algorithm's ability to distinguish between different c

*Non-causal sequence model:*
The sequence created by the ELMES encoder is then fed into a non-causal sequence model.
This might be for instance a transformer encoder.
This step conditions the input sequence consisting of the query and support set embeddings.
Visual features from the query and support set can be compared to each other to determine specific information such as content or texture.
This can then be used to predict the class of the query image.
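For illustration, a non-causal (bidirectional) sequence model over this input could be a standard transformer encoder; all hyperparameters below are placeholders:

```python
import torch.nn as nn

# Non-causal: every position (query and support) attends to every other one.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=4,
)
head = nn.Linear(512, 5)  # e.g. logits for a 5-way few-shot task

def predict(sequence):  # sequence: (batch, seq_len, 512), query at index 0
    conditioned = encoder(sequence)
    return head(conditioned[:, 0])  # class prediction for the query image
```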