\section{Material and Methods}\label{sec:material-and-methods}
\subsection{Material}\label{subsec:material}
\subsubsection{MVTec AD}\label{subsubsec:mvtecad}
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains over 5000 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.
% todo source for https://www.mvtec.com/company/research/datasets/mvtec-ad
% todo example image
%\begin{figure}
% \centering
% \includegraphics[width=\linewidth/2]{../rsc/muffin_chiauaua_poster}
% \caption{Sample images from dataset. \cite{muffinsvschiuahuakaggle_poster}}
% \label{fig:roc-example}
%\end{figure}
\subsection{Methods}\label{subsec:methods}
\subsubsection{Dagster}
\subsubsection{Label-Studio}
\subsubsection{Jupyter Notebook}\label{subsubsec:jupyternb}
A Jupyter notebook is a shareable document that combines code and its output with text and visualizations.
The notebook, together with its editor, provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning communities.

In the context of this practical work, it can be used to test and evaluate the active learning loop before implementing it in a Dagster pipeline~\cite{jupyter}.
\subsubsection{CNN}
Convolutional neural networks (CNNs) are model architectures that are particularly well suited to processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer consists of a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over the pixels of the image.
At each position, the dot product between the kernel and the underlying image patch yields one entry of the feature map.
Convolutional layers capture features such as edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps to mitigate overfitting.
Common pooling operations are average pooling and max pooling.
Finally, after several convolutional layers, the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
Figure~\ref{fig:cnn-architecture} shows a typical architecture for a binary classification task~\cite{cnnintro}.
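
The two operations described above can be made concrete with a small sketch. The notation is introduced here only for illustration: $I$ denotes a single-channel image, $K$ a $k\times k$ kernel and $F$ the resulting feature map. Under these assumptions, one entry of the feature map is the dot product between the kernel and the image patch below it:
\begin{equation*}
    F(i,j) = \sum_{m=0}^{k-1}\sum_{n=0}^{k-1} I(i+m,\, j+n)\, K(m,n)
\end{equation*}
For pooling, consider a hypothetical $4\times4$ feature map $A$ (values chosen arbitrarily) that is reduced with a $2\times2$ window and a stride of $2$:
\begin{equation*}
    A = \begin{pmatrix} 1 & 3 & 2 & 0\\ 4 & 2 & 1 & 1\\ 0 & 1 & 5 & 2\\ 2 & 0 & 3 & 4 \end{pmatrix}
    \quad\Rightarrow\quad
    \operatorname{maxpool}(A) = \begin{pmatrix} 4 & 2\\ 2 & 5 \end{pmatrix},
    \qquad
    \operatorname{avgpool}(A) = \begin{pmatrix} 2.5 & 1\\ 0.75 & 3.5 \end{pmatrix}
\end{equation*}
Each output entry summarizes one non-overlapping $2\times2$ block, so the feature map shrinks from $16$ to $4$ values.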
\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../rsc/cnn_architecture}
    \caption{Architecture of a convolutional neural network~\cite{cnnarchitectureimg}.}
    \label{fig:cnn-architecture}
\end{figure}
\subsubsection{ResNet}
Residual neural networks (ResNets) are a special type of neural network architecture.
They are designed for training very deep networks and have been used in many state-of-the-art computer vision tasks.
The main idea behind ResNet is the skip connection.
A skip connection passes the input of a layer directly to a later layer, bypassing the layers in between.
This mitigates the vanishing gradient problem and makes very deep networks easier to train.
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several ResNet architectures; the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152~\cite{resnet}.

Since the dataset is relatively small and the two-class classification task is relatively easy for such a large model, the ResNet-18 architecture is used in this practical work.
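
The skip connection described above can be written compactly. As a sketch, assume a residual block with input $\mathbf{x}$ and a learned mapping $\mathcal{F}$ whose output has the same dimensions as $\mathbf{x}$ (this notation is introduced here for illustration only). The block output is then
\begin{equation*}
    \mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x}
\end{equation*}
and its derivative with respect to the input is
\begin{equation*}
    \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \frac{\partial \mathcal{F}(\mathbf{x})}{\partial \mathbf{x}} + \mathbf{I}.
\end{equation*}
The identity term $\mathbf{I}$ gives the gradient a direct path through the block even when $\partial\mathcal{F}/\partial\mathbf{x}$ is small, which illustrates why skip connections mitigate the vanishing gradient problem.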
\subsubsection{Softmax}
The softmax function~\eqref{eq:softmax}~\cite{liang2017soft} converts a vector of $K$ real numbers into a probability distribution.
It is a generalization of the sigmoid function and is often used as an activation function in neural networks.
\begin{equation}\label{eq:softmax}
    \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \quad \text{for } j \in \{1,\dots,K\}
\end{equation}

The softmax function closely resembles the Boltzmann distribution, which was first introduced in the 19\textsuperscript{th} century~\cite{Boltzmann}.
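
As a worked example of Equation~\eqref{eq:softmax}, take the hypothetical input $\mathbf{z} = (1, 2, 3)$ with $K = 3$ (values chosen only for illustration):
\begin{equation*}
    \sigma(\mathbf{z}) = \frac{1}{e^{1}+e^{2}+e^{3}}\left(e^{1},\, e^{2},\, e^{3}\right) \approx (0.090,\, 0.245,\, 0.665)
\end{equation*}
The entries are positive and sum to one. For $K = 2$, the first component reduces to the sigmoid function applied to the difference of the two inputs:
\begin{equation*}
    \sigma(\mathbf{z})_1 = \frac{e^{z_1}}{e^{z_1}+e^{z_2}} = \frac{1}{1+e^{-(z_1-z_2)}}
\end{equation*}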
\subsubsection{Cross Entropy Loss}
Cross-entropy loss is a well-established loss function in machine learning.
Equation~\eqref{eq:crelformal}~\cite{crossentropy} gives its general definition for a target distribution $p$ and a predicted distribution $q$.
Equation~\eqref{eq:crelbinary} is the special case for binary classification tasks.
\begin{align}
    H(p,q) &= -\sum_{x\in\mathcal{X}} p(x)\, \log q(x)\label{eq:crelformal}\\
    H(p,q) &= - (p \log q + (1-p) \log(1-q))\label{eq:crelbinary}\\
    \mathcal{L}(p,q) &= - \frac{1}{\mathcal{B}} \sum_{i=1}^{\mathcal{B}} (p_i \log q_i + (1-p_i) \log(1-q_i))\label{eq:crelbinarybatch}
\end{align}

Equation~\eqref{eq:crelbinarybatch}~\cite{handsonaiI} is the binary cross-entropy loss $\mathcal{L}(p,q)$ averaged over a batch of size $\mathcal{B}$; it is used for model training in this practical work.
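
Equation~\eqref{eq:crelbinary} follows from Equation~\eqref{eq:crelformal} by taking $\mathcal{X} = \{0, 1\}$ with $p(1) = p$, $p(0) = 1-p$, $q(1) = q$ and $q(0) = 1-q$. As a numerical sketch, assume the natural logarithm and a hypothetical batch of two samples with $(p_1, q_1) = (1, 0.9)$ and $(p_2, q_2) = (0, 0.2)$:
\begin{equation*}
    \mathcal{L}(p,q) = -\frac{1}{2}\left(\log 0.9 + \log 0.8\right) \approx 0.164
\end{equation*}
A confident wrong prediction is penalized far more strongly: for $p = 1$ and $q = 0.1$, a single sample contributes $-\log 0.1 \approx 2.303$.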
\subsubsection{Mathematical Modeling of the Problem}\label{subsubsec:mathematicalmodeling}