add pmf material section
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 18s
This commit is contained in:
parent
882c6f54bb
commit
2690a3d0f2
@@ -99,6 +99,54 @@ In typical supervised learning the model sees thousands or millions of samples o
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.
=== Softmax

#todo[Maybe remove this section]

The Softmax function @softmax #cite(<liang2017soft>) converts a vector of $n$ real numbers into a probability distribution.
It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.

$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^n e^(z_k)) "for" j in {1,...,n}
$ <softmax>

The softmax function is closely related to the Boltzmann distribution, which was first introduced in the 19th century #cite(<Boltzmann>).
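
A minimal numerical sketch of @softmax could look like this (assuming NumPy; the max-subtraction is only for numerical stability and does not change the result):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Convert a vector of n real numbers into a probability distribution."""
    shifted = z - np.max(z)  # subtract the maximum for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

# Example: the largest logit receives the largest probability mass.
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659, 0.242, 0.099]
```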
=== Cross Entropy Loss

#todo[Maybe remove this section]

Cross Entropy Loss is a well-established loss function in machine learning.
@crelformal #cite(<crossentropy>) shows the formal general definition of the Cross Entropy Loss.
@crelbinary is the special case of the general Cross Entropy Loss for binary classification tasks.

$
H(p,q) &= -sum_(x in cal(X)) p(x) log q(x) #<crelformal>\
H(p,q) &= -(p log(q) + (1-p) log(1-q)) #<crelbinary>\
cal(L)(p,q) &= -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) #<crelbatched>
$ <crel>

Equation @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.
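
A minimal sketch of the batched Binary Cross Entropy Loss @crelbatched could look like this (assuming NumPy; $p$ are the target labels, $q$ the predicted probabilities, and the small `eps` is only there to avoid $log(0)$):

```python
import numpy as np

def binary_cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Binary Cross Entropy Loss averaged over a batch.

    p: target labels in {0, 1}, shape (B,)
    q: predicted probabilities in (0, 1), shape (B,)
    """
    q = np.clip(q, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(p * np.log(q) + (1 - p) * np.log(1 - q)))

# Example: confident, mostly correct predictions give a small loss.
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))  # ~0.164
```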
=== Cosine Similarity

To measure the distance between two vectors, several common distance measures can be used.
One popular choice is the Cosine Similarity (@cosinesimilarity).
It measures the cosine of the angle between two vectors.
The Cosine Similarity is especially useful when the magnitude of the vectors is not important.

$
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
$ <cosinesimilarity>

#todo[Source?]
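
A minimal sketch of @cosinesimilarity could look like this (assuming NumPy and two non-zero vectors of equal length):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Parallel vectors have similarity 1, orthogonal vectors 0.
print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))  # 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
```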
=== Euclidean Distance

The Euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space.
It calculates the square root of the sum of the squared differences of the coordinates.
The Euclidean distance can also be expressed as the L2 norm (Euclidean norm) of the difference of the two vectors.

$
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
$ <euclideannorm>

#todo[Source?]
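
A corresponding sketch of @euclideannorm (again assuming NumPy):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """L2 norm of the difference vector."""
    return float(np.linalg.norm(a - b))

# Example: the distance between (0, 0) and (3, 4) is 5.
print(euclidean_distance(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # 5.0
```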
=== Patchcore
// https://arxiv.org/pdf/2106.08265
PatchCore is an advanced method designed for cold-start anomaly detection and localization, primarily focused on industrial image data.
@@ -198,8 +246,63 @@ For this bachelor thesis the ResNet-50 architecture was used to predict the corre
=== P$>$M$>$F
// https://arxiv.org/pdf/2204.07305
P>M>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipeline designed for few-shot learning.
It focuses on simplicity but still achieves competitive performance.
The three stages gradually convert a general feature extractor into a task-specific model.
#cite(<pmfpaper>)

#todo[Todo]#cite(<pmfpaper>)
*Pre-training:*
The first stage in @pmfarchitecture initializes the backbone feature extractor.
This can be, for instance, a ResNet or a ViT, and it is trained with self-supervised techniques.
The backbone is trained on large-scale datasets from a general domain, such as ImageNet.
This step optimizes for robust feature extraction and builds a foundation model.
There are well-established pre-training methods that can be used, such as DINO (self-supervised consistency), CLIP (image-text alignment) or BERT (for text data).
#cite(<pmfpaper>)
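
As an illustration, such a backbone can simply be loaded from published pre-trained weights; a minimal sketch (assuming PyTorch and torchvision, with ImageNet-supervised ResNet-50 weights used here only as a stand-in for a self-supervised backbone such as DINO):

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Backbone pre-trained on a large general-domain dataset (ImageNet).
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
# Drop the classification head so the model acts as a feature extractor f: X -> R^m.
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    features = backbone(torch.randn(1, 3, 224, 224))  # one dummy RGB image
print(features.shape)  # torch.Size([1, 2048])
```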
*Meta-training:*
The second stage of the pipeline, shown in @pmfarchitecture, is meta-training.
Here a prototypical network (ProtoNet) is used to refine the pre-trained backbone.
ProtoNet constructs class centroids for each episode and then performs nearest class centroid classification.
See @prototypefewshot for a visualisation of its architecture.
The ProtoNet only requires a backbone $f$ that maps images to an $m$-dimensional vector space: $f: cal(X) -> RR^m$.
The probability of a query image $x$ belonging to class $k$ is given by the softmax of the negative distances between the sample and the class centroids:

$
p(y=k|x) = exp(-d(f(x), c_k)) / (sum_(k') exp(-d(f(x), c_k'))) #cite(<pmfpaper>)
$

As the distance metric $d$ the cosine similarity is used; see @cosinesimilarity for the formula.
The prototype $c_k$ of a class is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$, where $N_k$ is the number of support samples of class $k$.
The meta-training process is dataset-agnostic, allowing for flexible adaptation to various few-shot classification scenarios.#cite(<pmfpaper>)
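
A minimal sketch of this nearest-centroid classification could look like this (assuming NumPy, that the backbone has already mapped support and query images to embeddings, and using the cosine distance $d = 1 - cos(theta)$; the helper names are illustrative):

```python
import numpy as np

def prototypes(support_emb: np.ndarray, support_y: np.ndarray) -> np.ndarray:
    """Class prototypes c_k: mean embedding of the support samples of each class."""
    return np.stack([support_emb[support_y == k].mean(axis=0)
                     for k in np.unique(support_y)])

def class_probabilities(query_emb: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """p(y = k | x): softmax over the negative distances to the prototypes."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    d = np.array([1.0 - cos(query_emb, c_k) for c_k in protos])  # cosine distance
    exp_neg_d = np.exp(-d)
    return exp_neg_d / exp_neg_d.sum()

# Toy 2-way example with 2-dimensional embeddings.
sup = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
lab = np.array([0, 0, 1, 1])
print(class_probabilities(np.array([0.8, 0.2]), prototypes(sup, lab)))  # class 0 is more likely
```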
*Fine-tuning:*
If a novel task is drawn from an unseen domain, the model may fail to generalize because of a significant shift in the data distribution.
To overcome this, the model is optionally fine-tuned on the support set for a few gradient steps.
Data augmentation is used to generate a pseudo query set.
With the support set the class prototypes are calculated and compared against the model's predictions for the pseudo query set.
With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
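
A rough sketch of one possible fine-tuning loop (assuming PyTorch; `backbone`, the support tensors and the `augment` transform are placeholders, and labels are assumed to be indices $0,...,K-1$):

```python
import torch
import torch.nn.functional as F

def fine_tune(backbone: torch.nn.Module,
              support_x: torch.Tensor,  # (N, C, H, W) support images
              support_y: torch.Tensor,  # (N,) class indices 0..K-1
              augment,                  # callable: images -> augmented images
              steps: int = 50,
              lr: float = 1e-4) -> None:
    optimizer = torch.optim.Adam(backbone.parameters(), lr=lr)
    num_classes = int(support_y.max()) + 1
    for _ in range(steps):
        optimizer.zero_grad()
        # Prototypes from the support set.
        support_emb = F.normalize(backbone(support_x), dim=-1)
        protos = torch.stack([support_emb[support_y == k].mean(0)
                              for k in range(num_classes)])
        # Pseudo query set generated by data augmentation of the support images.
        query_emb = F.normalize(backbone(augment(support_x)), dim=-1)
        logits = query_emb @ protos.t()  # cosine-similarity logits
        loss = F.cross_entropy(logits, support_y)
        loss.backward()
        optimizer.step()
```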
#figure(
image("rsc/pmfarchitecture.png", width: 100%),
caption: [Architecture of P>M>F. #cite(<pmfpaper>)],
) <pmfarchitecture>
*Inference:*
During inference the support set is used to calculate the class prototypes.
For a query image the feature extractor computes its embedding in the lower-dimensional space, which is then compared to the pre-computed prototypes.
The query image is then assigned to the class with the closest prototype.#cite(<pmfpaper>)
*Performance:*
P>M>F performs well across several few-shot learning benchmarks.
The combination of pre-training on large datasets and meta-training with episodic tasks helps the model to generalize well.
The inclusion of fine-tuning enhances adaptability to unseen domains, ensuring robust and efficient learning.#cite(<pmfpaper>)
*Limitations and Scalability:*
This method has some limitations.
It relies on large external datasets, which require substantial computational resources to create pre-trained models.
Fine-tuning is effective but might be slow and may not work well on devices with limited computational resources.
Future research could focus on exploring faster and more efficient methods for fine-tuning models.
#cite(<pmfpaper>)
=== CAML <CAML>
// https://arxiv.org/pdf/2310.10971v2
@@ -268,53 +371,6 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting
caption: [Architecture of CAML. #cite(<caml_paper>)],
) <camlarchitecture>
== Alternative Methods

There are several alternative methods to few-shot learning which are not used in this bachelor thesis.
BIN rsc/pmfarchitecture.png (new file, 117 KiB): binary file not shown.