\section{Experimental Results}\label{sec:experimental-results}

\subsection{Does Active Learning benefit the learning process?}\label{subsec:does-active-learning-benefit-the-learning-process?}

With the test setup described in Section~\ref{sec:implementation}, a series of test runs was performed. Batch sizes $\mathcal{B} = \left\{ 2, 4, 6, 8 \right\}$ and, depending on the selected batch size $\mathcal{B}_i$, sample sizes $\mathcal{S} = \left\{ 2\mathcal{B}_i, 4\mathcal{B}_i, 5\mathcal{B}_i, 10\mathcal{B}_i \right\}$ were evaluated. We define the baseline (passive-learning) AUC curve as the supervised learning process without any active learning. The following graphs show only the subset of the test series that provides the most insight.

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_2_10}
    \caption{AUC with $\mathcal{B} = 2$ and $\mathcal{S}=10$}
    \label{fig:auc_normal_lowcer_2_10}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_2_20}
    \caption{AUC with $\mathcal{B} = 2$ and $\mathcal{S}=20$}
    \label{fig:auc_normal_lowcer_2_20}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_2_50}
    \caption{AUC with $\mathcal{B} = 2$ and $\mathcal{S}=50$}
    \label{fig:auc_normal_lowcer_2_50}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_4_16}
    \caption{AUC with $\mathcal{B} = 4$ and $\mathcal{S}=16$}
    \label{fig:auc_normal_lowcer_4_16}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_4_24}
    \caption{AUC with $\mathcal{B} = 4$ and $\mathcal{S}=24$}
    \label{fig:auc_normal_lowcer_4_24}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_8_16}
    \caption{AUC with $\mathcal{B} = 8$ and $\mathcal{S}=16$}
    \label{fig:auc_normal_lowcer_8_16}
\end{figure}

\begin{figure}
    \centering
    \hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_8_32}
    \caption{AUC with $\mathcal{B} = 8$ and $\mathcal{S}=32$}
    \label{fig:auc_normal_lowcer_8_32}
\end{figure}

A general pattern emerges: the lower the batch size $\mathcal{B}$, the larger the benefit gained from active learning. This may be caused by the fast convergence of the model: the lower $\mathcal{B}$, the more pre-prediction decision points occur, which helps to direct the learning towards better samples with respect to the selected metric. With a higher batch size, the model already converges to a good AUC value before the same number of pre-predictions is reached.

Moreover, increasing the sample space $\mathcal{S}$ from which the pre-predictions are drawn generally improves performance, because the selected subset $\pmb{x} \sim \mathcal{X}_U$ then has a higher chance of containing elements that are relevant with respect to the selected metric. Keep in mind, however, that this improvement comes with a runtime penalty, because more model evaluations are required to predict the ranking scores.
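To make the selection step concrete, the following is a minimal sketch of the low-certainty-first ranking, assuming a binary classifier with a scikit-learn-style \texttt{predict\_proba} interface; the function and variable names are illustrative and not the exact implementation used for these experiments. The sketch also makes the runtime penalty explicit: one model evaluation per drawn candidate, i.e.\ $\mathcal{S}$ forward passes per selected batch.

\begin{verbatim}
# Illustrative sketch of the "low certainty first" selection step.
# Names are placeholders, not the implementation used in this work.
import numpy as np

def select_batch(model, pool, batch_size=2, sample_size=10, seed=0):
    rng = np.random.default_rng(seed)
    # Draw the candidate subset x ~ X_U of size S from the unlabeled pool.
    idx = rng.choice(len(pool), size=sample_size, replace=False)
    # Pre-predict: one model evaluation per candidate, so the runtime
    # cost of this step grows linearly with S.
    proba = model.predict_proba(pool[idx])[:, 1]
    # Certainty as distance from the decision boundary; rank ascending
    # so the least certain candidates come first.
    certainty = np.abs(proba - 0.5)
    ranked = np.argsort(certainty)
    # Return pool indices of the B least certain samples for labeling.
    return idx[ranked[:batch_size]]
\end{verbatim}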
Figures~\ref{fig:auc_normal_lowcer_2_10}, \ref{fig:auc_normal_lowcer_2_20} and~\ref{fig:auc_normal_lowcer_2_50} show the AUC curves for a batch size of 2 and sample sizes of 10, 20 and 50, respectively. In all three graphs the active-learning curves outperform the passive-learning curve in all four scenarios, and the larger the sample space $\mathcal{S}$, the better the performance generally is.

Figures~\ref{fig:auc_normal_lowcer_4_16} and~\ref{fig:auc_normal_lowcer_4_24} show the AUC curves for a batch size of 4 and sample sizes of 16 and 24, respectively. The performance is already much worse than with a batch size of 2: only the low-certainty-first approach outperforms passive learning in both cases, while the other methods are as good as or worse than the passive-learning curve.

Figures~\ref{fig:auc_normal_lowcer_8_16} and~\ref{fig:auc_normal_lowcer_8_32} show the AUC curves for a batch size of 8 and sample sizes of 16 and 32, respectively. The performance degrades even further compared to a batch size of 4. This is likely because the model already converges to a good AUC value before the same number of pre-predictions is reached.

\subsection{Are Dagster and Label-Studio proper tooling to build an AL loop?}\label{subsec:is-dagster-and-label-studio-a-proper-tooling-to-build-an-al loop?}

The combination of Dagster and Label-Studio is a good choice for building an active-learning loop. Dagster provides a clean way to build pipelines and to keep track of the data in its web UI\@. Label-Studio provides a convenient API that can be used to update the model's predictions from the Dagster pipeline; a minimal sketch of this interaction is given at the end of this section.
% todo write stuff here

\subsection{Does balancing the learning samples improve performance?}\label{subsec:does-balancing-the-learning-samples-improve-performance?}

In our experiments, balancing the learning samples did not lead to a noticeable improvement.
% todo add img and add stuff
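Returning to the tooling question above, the following is a minimal, hypothetical sketch of the pipeline step that pushes pre-predictions from Dagster into Label-Studio via its REST prediction endpoint. The URL, API token, and the structure of the \texttt{predictions} input are assumptions made for the sketch, not the configuration used in this work.

\begin{verbatim}
# Hypothetical sketch: a Dagster op that writes model pre-predictions
# into Label-Studio through its REST API. URL, token and payload
# structure are placeholders, not the setup used in this work.
import requests
from dagster import op

LABEL_STUDIO_URL = "http://localhost:8080"   # assumed local instance
API_KEY = "<label-studio-api-token>"         # placeholder token

@op
def push_predictions(predictions: list) -> None:
    """Create one Label-Studio prediction per pre-predicted task."""
    headers = {"Authorization": f"Token {API_KEY}"}
    for pred in predictions:
        payload = {
            "task": pred["task_id"],             # Label-Studio task id
            "result": pred["result"],            # annotation payload
            "score": pred["score"],              # model certainty (ranking)
            "model_version": pred["model_version"],
        }
        requests.post(f"{LABEL_STUDIO_URL}/api/predictions/",
                      json=payload, headers=headers, timeout=10)
\end{verbatim}

In a loop like the one evaluated here, such an op would typically run directly after the pre-prediction step, so that the annotators in Label-Studio see the ranked model suggestions for the next batch.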