\section{Material and Methods}\label{sec:material-and-methods}
\subsection{Material}\label{subsec:material}
\subsubsection{MVTec AD}\label{subsubsec:mvtecad}
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains over 5000 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.
% todo source for https://www.mvtec.com/company/research/datasets/mvtec-ad
% todo example image
%\begin{figure}
%    \centering
%    \includegraphics[width=\linewidth/2]{../rsc/muffin_chiauaua_poster}
%    \caption{Sample images from dataset. \cite{muffinsvschiuahuakaggle_poster}}
%    \label{fig:roc-example}
%\end{figure}
\subsection{Methods}\label{subsec:methods}
\subsubsection{Few-Shot Learning}
Few-shot learning is a subfield of machine learning which aims to train a classification model with only a few labeled samples per class, or in the extreme case none at all.
This is in contrast to traditional supervised learning, where a large amount of labeled data is required for a model to generalize well to unseen data.
With so few training samples, the model is prone to overfitting.

Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the labeled training samples, while the query set contains the samples the model is evaluated on.
A common way to describe a few-shot learning problem is the $n$-way $k$-shot notation.
For example, a task with 3 target classes and 5 training samples per class is a 3-way 5-shot classification problem.

A classical example of such a model is the prototypical network.
These models learn a representation of each class and classify new examples based on their proximity to these representations in an embedding space.
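As an illustration, a common formulation of prototypical networks (the notation is not taken from this thesis: $f_\varphi$ denotes the embedding network, $S_k$ the support samples of class $k$ and $d$ a distance function such as the Euclidean distance) computes a prototype per class as the mean of the embedded support samples and classifies a query sample $\mathbf{x}$ with a softmax over the negative distances to these prototypes:
\begin{align*}
    \mathbf{c}_k &= \frac{1}{|S_k|} \sum_{(\mathbf{x}_i, y_i) \in S_k} f_\varphi(\mathbf{x}_i)\\
    p(y = k \mid \mathbf{x}) &= \frac{\exp(-d(f_\varphi(\mathbf{x}), \mathbf{c}_k))}{\sum_{k'} \exp(-d(f_\varphi(\mathbf{x}), \mathbf{c}_{k'}))}
\end{align*}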
The first and simplest method of this bachelor thesis uses a plain ResNet to calculate those embeddings and is essentially a simple prototypical network.
See %todo link to this section
% todo proper source
\subsubsection{Generalisation from few samples}
\subsubsection{PatchCore}

PatchCore builds a memory bank of locally aware patch features that a pretrained backbone extracts from the defect-free training images, reduces this bank with coreset subsampling and scores test patches by the distance to their nearest neighbour in the memory bank.~\cite{patchcorepaper}
% https://arxiv.org/pdf/2106.08265
%todo also show values how they perform on MVTec AD

\subsubsection{EfficientAD}

EfficientAD combines a lightweight student--teacher pair with an autoencoder branch, which allows it to detect both structural and logical anomalies at millisecond-level latencies.~\cite{efficientADpaper}
% https://arxiv.org/pdf/2303.14535
\subsubsection{Jupyter Notebook}\label{subsubsec:jupyternb}
A Jupyter notebook is a shareable document which combines code, its output, text and visualizations.
The notebook, along with its editor, provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning communities.

In the context of this practical work it can be used to test and evaluate the active learning loop before implementing it in a Dagster pipeline. \cite{jupyter}
\subsubsection{CNN}
Convolutional neural networks are model architectures that are especially well suited for processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer contains a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over the input image.
At each position, the dot product between the filter and the underlying image patch produces one entry of the resulting feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps to prevent overfitting.
Common pooling operations include average and max pooling.
Finally, after several convolutional layers, the feature maps are flattened and passed to fully connected layers that perform the classification or regression task.
Figure~\ref{fig:cnn-architecture} shows the architecture of a typical CNN for a binary classification task. \cite{cnnintro}
\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../rsc/cnn_architecture}
    \caption{Architecture of a convolutional neural network. \cite{cnnarchitectureimg}}
    \label{fig:cnn-architecture}
\end{figure}
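To make the structure described above concrete, the following minimal sketch builds such a network from convolutional, pooling and fully connected layers (assuming a PyTorch implementation; the layer sizes and the two-class head are illustrative and not taken from this work):
\begin{verbatim}
import torch
import torch.nn as nn

# Minimal CNN: two convolution/pooling stages followed by a
# fully connected classifier head for two classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learnable 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample feature maps
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                                # flatten the feature maps
    nn.Linear(32 * 56 * 56, 2),                  # classify into 2 classes
)

scores = model(torch.randn(1, 3, 224, 224))      # output shape: (1, 2)
\end{verbatim}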
\subsubsection{ResNet}
Residual neural networks (ResNets) are a special type of neural network architecture.
They are particularly well suited for training deep models and have been used in many state-of-the-art computer vision systems.
The main idea behind ResNet is the skip connection.
A skip connection is a direct connection from one layer to a later layer, bypassing the layers in between.
This helps to avoid the vanishing gradient problem and makes it possible to train very deep networks.
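Formally, following the original ResNet formulation~\cite{resnet}, a residual block with input $\mathbf{x}$ computes
\begin{equation*}
    \mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\end{equation*}
where $\mathcal{F}$ is the residual mapping learned by the stacked layers and the added $\mathbf{x}$ is the identity shortcut.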
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several ResNet variants; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. \cite{resnet}

Since the dataset is relatively small and the two-class classification task is relatively easy for such a large model family, the ResNet-18 architecture is used in this practical work.
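A minimal sketch of how such a model could be instantiated (assuming PyTorch and a recent torchvision version are used; the exact training setup of this work is not shown):
\begin{verbatim}
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 backbone pre-trained on ImageNet and replace its
# final fully connected layer with a two-class classifier head.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)
\end{verbatim}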
\subsubsection{CAML}

Todo

\subsubsection{P$>$M$>$F}

Todo

\subsubsection{Softmax}
The Softmax function~\eqref{eq:softmax}~\cite{liang2017soft} converts a vector of $K$ real numbers into a probability distribution.
It is a generalization of the Sigmoid function and is often used as the last activation layer of a neural network.
\begin{equation}\label{eq:softmax}
    \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \quad \text{for } j \in \{1,\dots,K\}
\end{equation}
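For example, applying the softmax to the vector $\mathbf{z} = (1, 2, 3)$ yields $\sigma(\mathbf{z}) \approx (0.09, 0.24, 0.67)$: the entries sum to one and the largest input receives the largest probability.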
The softmax function is closely related to the Boltzmann distribution and was first introduced in the 19$^{\textrm{th}}$ century~\cite{Boltzmann}.
\subsubsection{Cross Entropy Loss}
Cross Entropy Loss is a well-established loss function in machine learning.
Equation~\eqref{eq:crelformal}~\cite{crossentropy} shows the general definition of the Cross Entropy Loss,
and equation~\eqref{eq:crelbinary} is the special case for binary classification tasks.
\begin{align}
    H(p,q) &= -\sum_{x\in\mathcal{X}} p(x)\, \log q(x)\label{eq:crelformal}\\
    H(p,q) &= - (p \log q + (1-p) \log(1-q))\label{eq:crelbinary}\\
    \mathcal{L}(p,q) &= - \frac{1}{\mathcal{B}} \sum_{i=1}^{\mathcal{B}} (p_i \log q_i + (1-p_i) \log(1-q_i))\label{eq:crelbinarybatch}
\end{align}
Equation~\eqref{eq:crelbinarybatch}~\cite{handsonaiI} defines the Binary Cross Entropy Loss $\mathcal{L}(p,q)$ for a batch of size $\mathcal{B}$ and is the loss used for model training in this practical work.
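As a small worked example (using the natural logarithm), a confident correct prediction of $q = 0.9$ for a positive sample ($p = 1$) gives $H(1, 0.9) = -\log(0.9) \approx 0.105$, while a confident wrong prediction of $q = 0.1$ gives $H(1, 0.1) = -\log(0.1) \approx 2.303$, so confident mistakes are penalised heavily.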
\subsubsection{Mathematical modeling of the problem}\label{subsubsec:mathematicalmodeling}