In typical supervised learning the model sees thousands or millions of samples.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
@Goodfellow-et-al-2016
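To make the episodic setting concrete, the following sketch shows how an N-way k-shot episode could be sampled from a labeled dataset. It is a minimal illustration; the function and parameter names are chosen for this example and do not come from a specific library.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=5, n_query=15):
    """Sample one N-way k-shot episode from (image, label) pairs."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = random.sample(list(by_class), n_way)  # pick N classes
    support, query = [], []
    for cls in classes:
        picks = random.sample(by_class[cls], k_shot + n_query)
        support += [(img, cls) for img in picks[:k_shot]]  # k samples per class
        query += [(img, cls) for img in picks[k_shot:]]    # held-out queries
    return support, query
```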

/*
=== Softmax

#todo[Maybe remove this section]

The Softmax function @softmax #cite(<liang2017soft>) converts the $k$ entries of a vector $bold(z)$ into a probability distribution.
It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.

$
sigma(bold(z))_j = e^(z_j) / (sum_(i=1)^k e^(z_i)) "for" j in {1,...,k}
$ <softmax>

$
cal(L)(p,q) &= -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) #<crelbatched>
$ <crel>

Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.
*/
=== Cosine Similarity

Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
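For two vectors $a$ and $b$ the metric can be computed in a few lines of NumPy; this is a direct transcription of the formula referenced above, with names chosen for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between a and b: a.b / (|a| |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Parallel vectors score 1.0, orthogonal vectors 0.0.
print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0])))  # 0.0
```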
=== EfficientAD
// https://arxiv.org/pdf/2303.14535

EfficientAD is another state-of-the-art method for anomaly detection.
It focuses on maintaining detection performance while achieving high computational efficiency.
At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
In comparison to Patchcore, which relies on a deeper, more computationally heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
#todo[reference to image below]

The detection of anomalies is achieved through a student-teacher framework.
An anomaly is identified when the student fails to replicate the teacher's output.
This works because of the absence of anomalies in the training data: the student network has never seen an anomaly during training.
A special loss function keeps the student network from generalizing too broadly and inadvertently learning to predict anomalous features.~#cite(<efficientADpaper>)
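The core of the student-teacher comparison can be sketched as follows. This is a simplified illustration of the principle, not the exact EfficientAD objective; `teacher` and `student` are assumed to be networks returning feature maps of the same shape.

```python
import torch

def anomaly_map(teacher, student, image: torch.Tensor) -> torch.Tensor:
    """Per-pixel anomaly score as the squared teacher-student feature distance.

    teacher, student: networks mapping an image to a (C, H, W) feature map.
    Regions the student cannot replicate (never seen during training on
    anomaly-free data) produce large distances.
    """
    with torch.no_grad():
        t_feat = teacher(image)   # frozen, pretrained teacher
    s_feat = student(image)       # trained only on anomaly-free images
    return ((t_feat - s_feat) ** 2).mean(dim=0)  # (H, W) anomaly map
```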

In addition to structural anomaly detection, EfficientAD can also address logical anomalies, such as violations of spatial or contextual constraints (e.g. wrongly arranged objects).
This is done by integrating an autoencoder trained to replicate the teacher's features.~#cite(<efficientADpaper>)

By comparing the outputs of the autoencoder and the student, logical anomalies are effectively detected.
This is a challenge that Patchcore does not directly address.~#cite(<efficientADpaper>)
#todo[maybe add key advantages such as low computational cost and high performance]

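A corresponding sketch for the logical branch is shown below. Note that this is a strong simplification: in EfficientAD the student has dedicated output channels trained to match the autoencoder, whereas here the autoencoder and student outputs are compared directly.

```python
import torch

def logical_anomaly_map(autoencoder, student, image: torch.Tensor) -> torch.Tensor:
    """Logical anomaly score from the autoencoder-student discrepancy (sketch).

    The autoencoder reproduces the teacher's features only for globally
    consistent inputs, so large discrepancies hint at logical anomalies.
    """
    ae_feat = autoencoder(image)  # (C, H, W) global reconstruction
    st_feat = student(image)      # (C, H, W) local patch features
    return ((ae_feat - st_feat) ** 2).mean(dim=0)  # (H, W) map
```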
=== Jupyter Notebook

A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook along with the editor provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning community.~#cite(<jupyter>)

In the context of this bachelor thesis it was used to test, evaluate and compare the three few-shot learning methods.

=== ResNet

ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)

For this bachelor thesis the ResNet-50 architecture was used to predict the corresponding embeddings for the few-shot learning methods.

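A minimal sketch of how such embeddings can be obtained with torchvision's pretrained ResNet-50 is shown below; the actual preprocessing and data loading used in this work are omitted.

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# ImageNet-pretrained ResNet-50 with the classification head removed,
# so the forward pass returns the 2048-dimensional pooled features.
model = resnet50(weights=ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()
model.eval()

@torch.no_grad()
def embed(batch: torch.Tensor) -> torch.Tensor:
    """Map a batch of images (B, 3, 224, 224) to embeddings (B, 2048)."""
    return model(batch)
```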
=== P$>$M$>$F
// https://arxiv.org/pdf/2204.07305

P>M>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipeline designed for few-shot learning.
It focuses on simplicity but still achieves competitive performance.
The three stages turn a general feature extractor into a task-specific model through successive stages of optimization.
#cite(<pmfpaper>)

*Pre-training:*
The first stage in @pmfarchitecture initializes the backbone feature extractor.
This can be, for instance, a ResNet or ViT, and is learned with self-supervised techniques.
The backbone is trained on large-scale datasets from a general domain, such as ImageNet.
This step optimizes for robust feature extraction and builds a foundation model.
There are well-established methods for pre-training which can be used, such as DINO (self-supervised consistency), CLIP (image-text alignment) or BERT (for text data).
#cite(<pmfpaper>)

*Meta-training:*
The backbone is then meta-trained episodically with a prototypical-network classifier, which assigns a query sample $x$ to class $k$ with probability
$
p(y=k | x) = exp(-d(f(x), c_k)) / (sum_(k') exp(-d(f(x), c_(k'))))
$

As a distance metric $d$ the cosine similarity is used; see @cosinesimilarity for the formula.
$c_k$, the prototype of class $k$, is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$, where $N_k$ is the number of samples of class $k$.
The meta-training process is dataset-agnostic, allowing for flexible adaptation to various few-shot classification scenarios.#cite(<pmfpaper>)

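The prototype computation and the cosine-based classification translate almost directly into code; the following is a simplified sketch under the notation above, not code from the paper.

```python
import torch
import torch.nn.functional as F

def prototypes(support_emb: torch.Tensor, support_y: torch.Tensor, n_way: int) -> torch.Tensor:
    """c_k = mean embedding of the support samples of class k; (n_way, D)."""
    return torch.stack([support_emb[support_y == k].mean(dim=0) for k in range(n_way)])

def classify(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Class probabilities from cosine similarity to each prototype."""
    sims = F.cosine_similarity(query_emb.unsqueeze(1), protos.unsqueeze(0), dim=-1)
    return sims.softmax(dim=-1)  # (n_query, n_way)
```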
*Fine-tuning:*
If a novel task is drawn from an unseen domain the model may fail to generalize because of a significant shift in the data distribution.
To overcome this the model is optionally fine-tuned with the support set for a few gradient steps.
Data augmentation is used to generate a pseudo query set.
With the support set the class prototypes are calculated and compared against the model's predictions for the pseudo query set.
With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)

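A schematic of this optional fine-tuning loop, reusing `prototypes()` and `classify()` from the sketch above, might look as follows; the additive-noise augmentation is a stand-in for the image augmentations used in practice.

```python
import torch
import torch.nn.functional as F

def fine_tune(model, optimizer, support_x, support_y, n_way, steps=50):
    """Adapt the whole model on one episode via a pseudo query set (sketch)."""
    for _ in range(steps):
        # Augmented copies of the support set act as pseudo queries
        # and keep the support labels.
        pseudo_query = support_x + 0.05 * torch.randn_like(support_x)
        protos = prototypes(model(support_x), support_y, n_way)
        probs = classify(model(pseudo_query), protos)
        loss = F.nll_loss(probs.log(), support_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```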
#figure(
image("rsc/pmfarchitecture.png", width: 100%),

*Limitations and Scalability:*
This method has some limitations.
It relies on domains with large external datasets and requires substantial computational resources to create pre-trained models.
Fine-tuning is effective but might be slow and may not work well on devices with limited computational resources.
Future research could focus on exploring faster and more efficient methods for fine-tuning models.
#cite(<pmfpaper>)

=== CAML

CAML is a universal meta-learning approach.
That means no fine-tuning or meta-training is applied for specific domains.~#cite(<caml_paper>)

*Architecture:*
CAML first encodes the query and support set images using the frozen pre-trained feature extractor, as shown in @camlarchitecture.
This step brings the images into a low-dimensional space where similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
This encoder maximizes the algorithm's ability to distinguish between different classes.

*Non-causal sequence model:*
The sequence created by the ELMES encoder is then fed into a non-causal sequence model.
This might be, for instance, a transformer encoder.
This step conditions the input sequence consisting of the query and support set embeddings.
Visual features from the query and support set can be compared to each other to determine specific information such as content or textures.
This can then be used to predict the class of the query image.
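Putting the pieces together, the described flow can be summarized in the following schematic; the ELMES encoding is simplified here to a learnable embedding table, and all module names and sizes are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class CAMLSketch(nn.Module):
    """Frozen image encoder + label encoder -> non-causal transformer -> query logits."""

    def __init__(self, image_encoder: nn.Module, dim: int, n_way: int):
        super().__init__()
        self.image_encoder = image_encoder              # frozen, pre-trained
        self.label_emb = nn.Embedding(n_way + 1, dim)   # last index: "unknown token"
        layer = nn.TransformerEncoderLayer(d_model=2 * dim, nhead=2, batch_first=True)
        self.seq_model = nn.TransformerEncoder(layer, num_layers=4)  # no causal mask
        self.head = nn.Linear(2 * dim, n_way)

    def forward(self, support_x, support_y, query_x):
        s = self.image_encoder(support_x)               # (S, dim) support embeddings
        q = self.image_encoder(query_x)                 # (1, dim) query embedding
        unknown = self.label_emb.weight[-1:].expand(q.size(0), -1)
        tokens = torch.cat([
            torch.cat([s, self.label_emb(support_y)], dim=-1),
            torch.cat([q, unknown], dim=-1),
        ]).unsqueeze(0)                                 # (1, S+1, 2*dim)
        out = self.seq_model(tokens)                    # attends in both directions
        return self.head(out[:, -1])                    # class logits for the query
```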