diff --git a/materialandmethods.typ b/materialandmethods.typ
index 99e0f7d..dec6f42 100644
--- a/materialandmethods.typ
+++ b/materialandmethods.typ
@@ -99,6 +99,54 @@ In typical supervised learning the model sees thousands or millions of samples o
 This helps the model to learn the underlying patterns and to generalize well to unseen data.
 In few-shot learning the model has to generalize from just a few samples.
+=== Softmax
+#todo[Maybe remove this section]
+The softmax function (@softmax) #cite() converts a vector of $n$ real numbers into a probability distribution.
+It is a generalization of the sigmoid function and is often used as an activation layer in neural networks.
+
+$
+sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^n e^(z_k)) "for" j in {1,...,n}
+$
+
+The softmax function is closely related to the Boltzmann distribution and was first introduced in the 19th century #cite().
+
+
+=== Cross Entropy Loss
+#todo[Maybe remove this section]
+Cross Entropy Loss is a well-established loss function in machine learning.
+@crelformal #cite() shows the general definition of the Cross Entropy Loss.
+@crelbinary is its special case for binary classification tasks.
+
+$
+H(p,q) &= -sum_(x in cal(X)) p(x) log q(x) #\
+H(p,q) &= -(p log(q) + (1-p) log(1-q)) #\
+cal(L)(p,q) &= -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) #
+$
+
+@crelbatched #cite() is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is the loss used for model training in this work.
+
+=== Cosine Similarity
+To measure the distance between two vectors, several common distance measures are used.
+One popular choice is the cosine similarity (@cosinesimilarity).
+It measures the cosine of the angle between the two vectors.
+The cosine similarity is especially useful when the magnitude of the vectors is not important.
+
+$
+  cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
+  &= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
+$
+
+#todo[Source?]
+=== Euclidean Distance
+The Euclidean distance (@euclideannorm) is a simpler way to measure the distance between two points in a vector space.
+It is the square root of the sum of the squared differences of the coordinates.
+The Euclidean distance can also be expressed as the L2 norm (Euclidean norm) of the difference of the two vectors.
+
+$
+  cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
+$
+#todo[Source?]
+
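+To make these definitions concrete, the following is a minimal, illustrative NumPy sketch of the softmax, the Binary Cross Entropy Loss, the cosine similarity and the Euclidean distance; the function names are chosen here for readability and are not part of the implementation used in this work.
+
+```python
+import numpy as np
+
+def softmax(z: np.ndarray) -> np.ndarray:
+    # sigma(z)_j = exp(z_j) / sum_k exp(z_k); shifting by max(z) improves numerical stability
+    e = np.exp(z - z.max())
+    return e / e.sum()
+
+def binary_cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
+    # mean of -(p log q + (1 - p) log(1 - q)) over a batch of targets p and predictions q
+    return float(-np.mean(p * np.log(q) + (1 - p) * np.log(1 - q)))
+
+def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
+    # cos(theta) = (a . b) / (||a|| ||b||)
+    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
+
+def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
+    # d(a, b) = ||a - b||, the L2 norm of the difference vector
+    return float(np.linalg.norm(a - b))
+
+a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
+print(cosine_similarity(a, b))   # 1.0: the vectors are parallel, their magnitude is ignored
+print(euclidean_distance(a, b))  # ~3.74: the magnitude difference matters
+print(softmax(np.array([1.0, 2.0, 3.0])))  # three probabilities that sum to 1
+print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.9, 0.2])))  # ~0.16
+```
+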
 === Patchcore
 // https://arxiv.org/pdf/2106.08265
 PatchCore is an advanced method designed for cold-start anomaly detection and localization, primarily focused on industrial image data.
@@ -198,8 +246,63 @@ For this bachelor theis the ResNet-50 architecture was used to predict the corre
 === P$>$M$>$F
 // https://arxiv.org/pdf/2204.07305
+P>M>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipeline designed for few-shot learning.
+It focuses on simplicity but still achieves competitive performance.
+The three stages convert a general-purpose feature extractor into a task-specific few-shot classifier.
+#cite()

-#todo[Todo]#cite()

+*Pre-training:*
+The first stage in @pmfarchitecture initializes the backbone feature extractor.
+This can be, for instance, a ResNet or a ViT, and it is learned with self-supervised techniques.
+The backbone is trained on large-scale datasets from a general domain, such as ImageNet.
+This step optimizes for robust feature extraction and builds a foundation model.
+There are well-established methods for pre-training that can be used, such as DINO (self-supervised consistency), CLIP (image-text alignment) or BERT (for text data).
+#cite()
+
+*Meta-training:*
+The second stage of the pipeline, as shown in @pmfarchitecture, is meta-training.
+Here a prototypical network (ProtoNet) is used to refine the pre-trained backbone.
+ProtoNet constructs class centroids for each episode and then performs nearest-centroid classification.
+See @prototypefewshot for a visualization of its architecture.
+ProtoNet only requires a backbone $f$ that maps images to an $m$-dimensional vector space: $f: cal(X) -> RR^m$.
+The probability of a query image $x$ belonging to class $k$ is the exponential of the negative distance between the sample and the class centroid, normalized over all classes:
+
+$
+  p(y=k|x) = exp(-d(f(x), c_k)) / (sum_(k') exp(-d(f(x), c_k')))#cite()
+$
+
+As the distance metric $d$, the cosine similarity is used; see @cosinesimilarity for the formula.
+The prototype of class $k$ is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$, where $N_k$ is the number of support samples of class $k$.
+The meta-training process is dataset-agnostic, allowing for flexible adaptation to various few-shot classification scenarios.#cite()
+
+*Fine-tuning:*
+If a novel task is drawn from an unseen domain, the model may fail to generalize because of a significant shift in the data distribution.
+To overcome this, the model is optionally fine-tuned on the support set for a few gradient steps.
+Data augmentation is used to generate a pseudo query set.
+With the support set, the class prototypes are calculated and compared against the model's predictions for the pseudo query set.
+With the loss of this step, the whole model is fine-tuned to the new domain.~#cite()
+
+#figure(
+  image("rsc/pmfarchitecture.png", width: 100%),
+  caption: [Architecture of P>M>F. #cite()],
+)
+
+*Inference:*
+During inference, the support set is used to calculate the class prototypes.
+For a query image, the feature extractor computes its embedding in the lower-dimensional space, which is then compared to the pre-computed prototypes.
+The query image is then assigned to the class with the closest prototype.#cite()
+
+*Performance:*
+P>M>F performs well across several few-shot learning benchmarks.
+The combination of pre-training on a large dataset and meta-training with episodic tasks helps the model generalize well.
+The inclusion of fine-tuning enhances adaptability to unseen domains, ensuring robust and efficient learning.#cite()
+
+*Limitations and Scalability:*
+This method has some limitations.
+It relies on large external datasets, which require substantial computational resources to create the pre-trained models.
+Fine-tuning is effective but can be slow and may not work well on devices with limited computational resources.
+Future research could explore faster and more efficient fine-tuning methods.
+#cite()
 
 === CAML
 // https://arxiv.org/pdf/2310.10971v2
@@ -268,53 +371,6 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting
   caption: [Architecture of CAML. #cite()],
 )
 
-=== Softmax
-#todo[Maybe remove this section]
-The Softmax function @softmax #cite() converts $n$ numbers of a vector into a probability distribution.
-Its a generalization of the Sigmoid function and often used as an Activation Layer in neural networks.
- -$ -sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^k e^(z_k)) "for" j:={1,...,k} -$ - -The softmax function has high similarities with the Boltzmann distribution and was first introduced in the 19th century #cite(). - - -=== Cross Entropy Loss -#todo[Maybe remove this section] -Cross Entropy Loss is a well established loss function in machine learning. -@crelformal #cite() shows the formal general definition of the Cross Entropy Loss. -And @crelbinary is the special case of the general Cross Entropy Loss for binary classification tasks. - -$ -H(p,q) &= -sum_(x in cal(X)) p(x) log q(x) #\ -H(p,q) &= -(p log(q) + (1-p) log(1-q)) #\ -cal(L)(p,q) &= -1/N sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) # -$ - -Equation~$cal(L)(p,q)$ @crelbatched #cite() is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and used for model training in this Practical Work. - -=== Cosine Similarity -To measure the distance between two vectors some common distance measures are used. -One popular of them is the Cosine Similarity (@cosinesimilarity). -It measures the cosine of the angle between two vectors. -The Cosine Similarity is especially useful when the magnitude of the vectors is not important. - -$ - cos(theta) &:= (A dot B) / (||A|| dot ||B||)\ - &= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2)) -$ - -#todo[Source?] -=== Euclidean Distance -The euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space. -It just calculates the square root of the sum of the squared differences of the coordinates. -the euclidean distance can also be represented as the L2 norm (euclidean norm) of the difference of the two vectors. - -$ - cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2) -$ -#todo[Source?] == Alternative Methods There are several alternative methods to few-shot learning which are not used in this bachelor thesis. diff --git a/rsc/pmfarchitecture.png b/rsc/pmfarchitecture.png new file mode 100644 index 0000000..59c6880 Binary files /dev/null and b/rsc/pmfarchitecture.png differ
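The nearest class-centroid rule described in the meta-training and inference paragraphs above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: prototypes are the class means of the support embeddings, the cosine distance (1 minus the cosine similarity) serves as the metric d, and a softmax is taken over the negated distances. The function names, shapes and toy values are illustrative and not taken from the P>M>F implementation.

```python
import numpy as np

def cosine_distance(query: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    # 1 - cosine similarity between one query embedding and each prototype (row-wise)
    q = query / np.linalg.norm(query)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return 1.0 - p @ q

def nearest_prototype_probs(query_emb, support_embs, support_labels):
    # class prototypes c_k: mean of the support embeddings belonging to class k
    classes = np.unique(support_labels)
    prototypes = np.stack([support_embs[support_labels == k].mean(axis=0) for k in classes])
    # p(y = k | x): softmax over the negated distances d(f(x), c_k)
    logits = -cosine_distance(query_emb, prototypes)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# toy 2-way, 2-shot episode with 3-dimensional embeddings f(x)
support = np.array([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0],   # class 0
                    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]])  # class 1
labels = np.array([0, 0, 1, 1])
query = np.array([0.95, 0.05, 0.05])
print(nearest_prototype_probs(query, support, labels))  # higher probability for class 0
```

In the optional fine-tuning stage, the same rule is applied to an augmented pseudo query set built from the support set, and the resulting loss is back-propagated through the backbone for a few gradient steps.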