\section { Material and Methods} \label { sec:material-and-methods}
\subsection { Material} \label { subsec:material}
\subsubsection { Dagster}
Dagster is an open-source data orchestrator for machine learning, analytics, and ETL workflows.
It lets you define pipelines in terms of the data flow between reusable, logical components.
With Dagster, scalable and reliable data workflows can be built.
The most important building blocks in Dagster are Assets, Jobs and Ops.
An Asset is an object in persistent storage together with a description, as code, of how to compute or update it.
Whenever persistent storage is required, e.g.\ for storing a model, metadata or configurations, an Asset should be used.
Assets can be combined into an asset graph that models the dependencies of the data flow.
Jobs are the main triggers of a pipeline and can be launched from the web UI, by fixed schedules or by sensors that react to changes.
To perform the actual work in code, an Asset consists of a graph of Ops.
An Op is a function that performs a single task and can be used to split the code into reusable components.
Dagster also provides a well-built web interface to monitor jobs and pipelines.
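The following minimal sketch shows how assets and a job could be defined with Dagster's Python API; the asset names and contents are purely illustrative and do not correspond to the pipeline built in this PW.
\begin{verbatim}
from dagster import Definitions, asset, define_asset_job

# Hypothetical asset graph: the trained model depends on the
# preprocessed images, which depend on the raw images.
@asset
def raw_images():
    # placeholder for loading or downloading the raw data
    return ["img_001.jpg", "img_002.jpg"]

@asset
def preprocessed_images(raw_images):
    # Dagster infers the dependency from the parameter name
    return [name.lower() for name in raw_images]

@asset
def trained_model(preprocessed_images):
    # placeholder for an actual training step
    return {"num_samples": len(preprocessed_images)}

# A job materializes the selected assets; it can be launched from the
# web UI, on a fixed schedule or by a sensor.
train_job = define_asset_job("train_job", selection="*")

defs = Definitions(
    assets=[raw_images, preprocessed_images, trained_model],
    jobs=[train_job],
)
\end{verbatim}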
\subsubsection { Label-Studio}
\subsubsection { Pytorch}
\subsubsection { NVTec}
\subsubsection { Imagenet}
\subsubsection { Muffin vs chihuahua}
Muffin vs chihuahua is a free dataset available on Kaggle.
It consists of $\sim$1500 images of muffins and chihuahuas.
\subsection { Methods} \label { subsec:methods}
\subsubsection { Active-Learning}
\subsubsection { Semi-Supervised learning}
In traditional supervised learning we have a labeled dataset: each datapoint is associated with a corresponding target label, and the goal is to fit a model that predicts the labels from the datapoints.
In traditional unsupervised learning there are also datapoints, but no labels are known; the goal is to find patterns or structures in the data, for example through clustering or dimensionality reduction.
Combining those two settings yields semi-supervised learning: some of the labels are known, but for most of the data we have only the raw datapoints.
The basic idea is that the unlabeled data can significantly improve the model performance when used in combination with the labeled data~\cite{Xu_2022_CVPR}.
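One common way to exploit the unlabeled data is pseudo-labeling, where confident model predictions on unlabeled samples are reused as training targets. The sketch below only illustrates this general idea; the model, the batch and the confidence threshold are placeholders and it does not describe the exact method used in this PW.
\begin{verbatim}
import torch

def pseudo_label_step(model, unlabeled_batch, threshold=0.95):
    # Predict class probabilities for unlabeled samples and keep only
    # those where the model is sufficiently confident.
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_batch), dim=1)
        confidence, pseudo_labels = probs.max(dim=1)
    mask = confidence >= threshold
    # The selected samples and their pseudo-labels can then be mixed
    # into the labeled training set.
    return unlabeled_batch[mask], pseudo_labels[mask]
\end{verbatim}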
\subsubsection { ROC and AUC}
A receiver operating characteristic (ROC) curve can be used to measure the performance of a classifier on a binary classification task.
Accuracy alone does not reveal much about the balance of the predictions: a classifier may produce many true positives but hardly any true negatives and still reach a high accuracy.
The ROC curve addresses this problem by plotting the true-positive rate against the false-positive rate over all classification thresholds.
The closer the curve bends towards the upper-left corner, the better the classifier, while a curve along the diagonal corresponds to random guessing.
\ref{fig:roc-example} shows an example of such a curve.
\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../rsc/Roc_curve.svg}
    \caption{Example of a receiver operating characteristic (ROC) curve.}
    \label{fig:roc-example}
\end{figure}
Furthermore, the area under this curve, called AUC, is a useful scalar metric to measure the performance of a binary classifier.
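As a small illustration of how the ROC curve and AUC can be computed (assuming scikit-learn is available; the labels and scores below are made up):
\begin{verbatim}
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# made-up ground-truth labels and predicted scores of a binary classifier
y_true  = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under the curve
\end{verbatim}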
\subsubsection{ResNet}
\subsubsection { CNN}
Convolutional neural networks are especially well-suited model architectures for processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer consists of a set of learnable kernels (filters).
Each filter slides over the input and computes a dot product at every position; the resulting values form a feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps against overfitting.
Common pooling layers include average and max pooling.
Finally, after several convolutional layers the feature maps are flattened and passed to a network of fully connected layers to perform a classification or regression task.
\ref{fig:cnn-architecture} shows a typical architecture for a binary classification task.
\begin{figure}
    \centering
    \includegraphics[width=\linewidth]{../rsc/cnn_architecture}
    \caption{Architecture of a convolutional neural network. Image by \href{https://cointelegraph.com/explained/what-are-convolutional-neural-networks}{SKY ENGINE AI}}
    \label{fig:cnn-architecture}
\end{figure}
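A minimal PyTorch sketch of such an architecture is shown below; the layer sizes and input resolution are chosen arbitrarily for illustration and do not correspond to the network used in this PW.
\begin{verbatim}
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learnable filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # flatten the feature maps
            nn.Linear(32 * 56 * 56, num_classes),        # fully connected head
        )

    def forward(self, x):  # x: (batch, 3, 224, 224)
        return self.classifier(self.features(x))
\end{verbatim}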
\subsubsection { Softmax}
The softmax function converts a vector of $K$ real numbers into a probability distribution.
It is a generalization of the sigmoid function and is often used as the last activation layer in neural networks.
\begin{equation}\label{eq:softmax}
    \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \quad \text{for } j = 1,\dots,K
\end{equation}
The softmax function is closely related to the Boltzmann distribution, which was first introduced in the 19$^{\textrm{th}}$ century~\cite{Boltzmann}.
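For illustration, a small numeric example using PyTorch (the input values are arbitrary):
\begin{verbatim}
import torch

z = torch.tensor([2.0, 1.0, 0.1])   # raw scores (logits)
probs = torch.softmax(z, dim=0)     # tensor([0.6590, 0.2424, 0.0986])
total = probs.sum()                 # 1.0, a valid probability distribution
\end{verbatim}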
\subsubsection { Cross Entropy Loss}
Cross Entropy Loss is a well-established loss function in machine learning.
\eqref{eq:crelformal} shows the general definition of the Cross Entropy Loss, and~\eqref{eq:crelbinary} is its special case for binary classification tasks.
\begin{align}
    H(p,q) &= -\sum_{x\in\mathcal{X}} p(x)\,\log q(x)\label{eq:crelformal}\\
    H(p,q) &= -\bigl(p \log q + (1-p) \log(1-q)\bigr)\label{eq:crelbinary}\\
    \mathcal{L}(p,q) &= -\frac{1}{\mathcal{B}} \sum_{i=1}^{\mathcal{B}} \bigl(p_i \log q_i + (1-p_i) \log(1-q_i)\bigr)\label{eq:crelbinarybatch}
\end{align}
$\mathcal{L}(p,q)$ in~\eqref{eq:crelbinarybatch} is the Binary Cross Entropy Loss for a batch of size $\mathcal{B}$ and is the loss used for model training in this PW.
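In PyTorch the batched binary cross entropy of~\eqref{eq:crelbinarybatch} corresponds to the built-in \texttt{BCELoss}; a small illustrative example with made-up values:
\begin{verbatim}
import torch
import torch.nn as nn

criterion = nn.BCELoss()              # expects probabilities in [0, 1]
q = torch.tensor([0.9, 0.2, 0.7])     # predicted probabilities
p = torch.tensor([1.0, 0.0, 1.0])     # ground-truth labels
loss = criterion(q, p)                # mean over the batch, ~0.2284
\end{verbatim}
In practice \texttt{BCEWithLogitsLoss} is often preferred, since it combines the sigmoid with the loss in a numerically more stable way.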
\subsubsection { Adam}