= Material and Methods

== Material

=== MVTec AD
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains over 5000 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set containing images with various kinds of defects as well as defect-free images.

#figure(
  image("rsc/dataset_overview_large.png", width: 80%),
  caption: [Overview of the MVTec AD dataset. #cite(<datasetsampleimg>)],
) <datasetoverview>

// todo
Todo: describe which categories are used in this bachelor thesis and how many samples there are.

== Methods

=== Few-Shot Learning
Few-shot learning is a subfield of machine learning that aims to train a classification model with only a few samples per class, or with none at all.
In contrast, traditional supervised learning requires a large amount of labeled data to generalize well to unseen data.
With so few training samples, a few-shot model is prone to overfitting.

Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data and the query set contains the data on which the model is evaluated.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, a task with 3 target classes and 5 training samples per class is a 3-way 5-shot classification problem.
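
The n-way k-shot setup can be illustrated with a minimal Python sketch. The function `sample_episode`, its parameters and the toy data are hypothetical names chosen for illustration; they are not part of the methods evaluated in this thesis.

```python
import random

def sample_episode(samples_by_class, n_way, k_shot, n_query, seed=None):
    """Draw one n-way k-shot episode from a dict {class_label: [samples]}."""
    rng = random.Random(seed)
    # Pick the n target classes of this episode.
    classes = rng.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw disjoint support and query samples for each class.
        picked = rng.sample(samples_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query

# Toy data: 4 placeholder classes with 10 placeholder samples each.
data = {c: [f"{c}_{i}" for i in range(10)] for c in ["bottle", "cable", "screw", "tile"]}
support, query = sample_episode(data, n_way=3, k_shot=5, n_query=2, seed=0)
print(len(support), len(query))  # 15 6
```

A 3-way 5-shot episode therefore has 15 support samples; the number of query samples per class is a free choice of the evaluation protocol.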

A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on their proximity to these representations in an embedding space.
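
The classification step can be sketched in a few lines of Python, assuming the embeddings have already been computed by some encoder. Each prototype is taken as the mean of the support embeddings of a class, and a query is assigned to the nearest prototype by Euclidean distance. The helper names and the two-dimensional toy embeddings are illustrative assumptions.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def class_prototypes(support):
    """support: list of (embedding, label). Returns {label: prototype}."""
    by_label = {}
    for emb, label in support:
        by_label.setdefault(label, []).append(emb)
    return {label: mean_vector(embs) for label, embs in by_label.items()}

def classify(query_emb, prototypes):
    """Label of the prototype closest to the query embedding (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query_emb, prototypes[label]))

# Toy 2-d embeddings standing in for the output of an encoder.
support = [([0.0, 0.1], "good"), ([0.2, 0.0], "good"),
           ([1.0, 0.9], "defect"), ([0.8, 1.1], "defect")]
protos = class_prototypes(support)
print(classify([0.9, 1.0], protos))  # defect
```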

#figure(
  image("rsc/prototype_fewshot_v3.png", width: 60%),
  caption: [Prototypical network for few-shot learning. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>

The first and simplest method of this bachelor thesis uses a ResNet to calculate those embeddings and is essentially a simple prototypical network.
See // todo link to this section
// todo proper source

=== Generalization from few samples

Generalizing from so few samples is especially hard.
In typical supervised learning, the model sees thousands or millions of samples of the corresponding domain during training.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning, the model has to generalize from just a few samples.

=== PatchCore
todo stuff #cite(<patchcorepaper>)
// https://arxiv.org/pdf/2106.08265

// todo also show values how they perform on MVTec AD

=== EfficientAD
todo stuff #cite(<efficientADpaper>)
// https://arxiv.org/pdf/2303.14535

=== Jupyter Notebook

A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
Together with the editor, the notebook provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning communities.

In the context of this bachelor thesis, it was used to test, evaluate and compare the three few-shot learning methods. #cite(<jupyter>)

=== CNN
Convolutional neural networks (CNNs) are model architectures that are especially well suited to processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer is a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over every pixel of the image.
At each position, a dot product between the filter and the image patch under the window yields one value of the resulting feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps to counter overfitting.
Common pooling layers include average pooling and max pooling.
Finally, after some convolutional layers, the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
@cnnarchitecture shows a typical binary classification task.
#cite(<cnnintro>)
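
The sliding-window dot product and the subsequent pooling can be sketched in plain Python. This is only an illustration under simplifying assumptions (single channel, no padding, stride 1, a hand-picked 1x2 filter instead of a learned one); the function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid cross-correlation, the operation CNN 'convolution' layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    # Each output value is the dot product of the kernel with one image patch.
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# 5x5 toy image with a vertical edge between the second and third column.
image = [[0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
kernel = [[-1, 1]]  # hand-picked 1x2 filter that responds to vertical edges
feature_map = conv2d(image, kernel)
print(feature_map[0])           # [0, 1, 0, 0]
print(max_pool2d(feature_map))  # [[1, 0], [1, 0]]
```

The feature map is non-zero only where the edge lies, and pooling keeps that response while halving the resolution.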

#figure(
  image("rsc/cnn_architecture.png", width: 80%),
  caption: [Architecture of a convolutional neural network. #cite(<cnnarchitectureimg>)],
) <cnnarchitecture>

=== ResNet

Residual neural networks (ResNets) are a special type of neural network architecture.
They are especially well suited to very deep models and have been used in many state-of-the-art computer vision tasks.
The main idea behind ResNet is the skip connection.
A skip connection is a direct connection from one layer to a later layer that is not the next layer, so the layers in between are bypassed.
This helps to avoid the vanishing gradient problem and makes very deep networks easier to train.
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
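
The effect of a skip connection can be sketched with a simplified residual block. As an assumption made for brevity, the block below uses two small dense layers instead of the convolutional layers of an actual ResNet; all names and weights are illustrative.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights, bias):
    """Dense layer: weights is a list of rows, one per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """y = relu(F(x) + x), where F is two dense layers with a ReLU in between."""
    fx = linear(relu(linear(x, w1, b1)), w2, b2)
    # The skip connection adds the unchanged input back onto F(x).
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
# With all-zero weights F(x) = 0, so the block passes non-negative inputs through unchanged.
print(residual_block(x, zeros_w, zeros_b, zeros_w, zeros_b))  # [1.0, 2.0]
```

Because the input is added back, the layers inside the block only have to learn the difference to the identity mapping.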

For this bachelor thesis, the ResNet-50 architecture was used to compute the embeddings for the few-shot learning methods.

=== CAML
Todo

=== P$>$M$>$F
Todo

=== Softmax

The softmax function @softmax #cite(<liang2017soft>) converts a vector of $k$ numbers into a probability distribution.
It is a generalization of the sigmoid function and is often used as an activation function in neural networks.

$
sigma(bold(z))_j = (e^(z_j)) / (sum_(i=1)^k e^(z_i)) quad "for" j in {1,...,k}
$ <softmax>
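
A direct implementation of the softmax function might look like the following Python sketch. Subtracting the maximum before exponentiating is a common numerical safeguard that is added here as an assumption; it does not change the result.

```python
import math

def softmax(z):
    """Numerically stable softmax of a list of real numbers."""
    m = max(z)  # shifting all inputs by the same constant leaves the output unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])  # [0.09, 0.245, 0.665]
```

The outputs are positive and sum to one, and for the two-element input `[z, 0]` the first output equals the sigmoid of `z`.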

The softmax function is closely related to the Boltzmann distribution, which was first introduced in the 19th century #cite(<Boltzmann>).

=== Cross Entropy Loss
Cross entropy loss is a well-established loss function in machine learning.
@crelformal #cite(<crossentropy>) shows the formal general definition of the cross entropy loss.
@crelbinary is the special case of the general cross entropy loss for binary classification tasks.

$
H(p,q) = -sum_(x in cal(X)) p(x) log q(x)
$ <crelformal>

$
H(p,q) = -(p log(q) + (1-p) log(1-q))
$ <crelbinary>

$
cal(L)(p,q) = -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i))
$ <crelbinarybatch>

@crelbinarybatch #cite(<handsonaiI>) is the binary cross entropy loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.
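
The batch form of the binary cross entropy loss can be checked with a short Python sketch. The function name and the clipping constant `eps`, which guards against taking the logarithm of zero, are assumptions made for this illustration.

```python
import math

def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross entropy over a batch of (target p_i, prediction q_i) pairs."""
    total = 0.0
    for p, q in zip(targets, predictions):
        q = min(max(q, eps), 1.0 - eps)  # keep q strictly inside (0, 1)
        total += p * math.log(q) + (1 - p) * math.log(1 - q)
    return -total / len(targets)

loss = binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6])
print(round(loss, 4))  # 0.2798
```

Confident correct predictions contribute little to the loss, while the uncertain prediction of 0.6 for a positive sample contributes the most.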

=== Mathematical modeling of problem

== Alternative Methods

There are several alternative methods to few-shot learning which are not used in this bachelor thesis.