add imgs and text to impl

This commit is contained in:
lukas-heiligenbrunner 2024-04-29 21:54:43 +02:00
parent 1133a5bcbd
commit 74ea31cf9c
13 changed files with 153 additions and 21 deletions


BIN
rsc/dagster/assets.png Normal file


BIN
rsc/dagster/train_model.png Normal file



@@ -2,6 +2,12 @@
\subsection{Conclusion}\label{subsec:conclusion}
Active learning can hugely benefit the learning process when applied correctly.
The lower the batch size $\mathcal{B}$, the more improvement one can expect.
The larger the sample space $\mathcal{S}$, the higher the gains, but the more compute is required.
\subsection{Outlook}\label{subsec:outlook}
Results might differ for multiclass classification and segmentation tasks.
A good point to continue from here is to implement the active learning loop in a real-world scenario where the model is more complex.
Moreover, testing active learning performance on segmentation tasks might be interesting.


@@ -2,32 +2,81 @@
\subsection{Does active learning benefit the learning process?}
With the test setup described in~\ref{sec:implementation}, a test series was performed.
Several batch sizes $\mathcal{B} = \left\{ 2,4,6,8 \right\}$ and sample sizes $\mathcal{S} = \left\{ 2\mathcal{B}_i,4\mathcal{B}_i,5\mathcal{B}_i,10\mathcal{B}_i \right\}$,
chosen relative to the selected batch size, were evaluated.
We define the baseline (passive learning) AUC curve as the supervised learning process without any active learning.
The following graphs are a subset of the test series that provides the most insight.
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_2_10}
\caption{AUC with $\mathcal{B} = 2$ and $\mathcal{S}=10$}
\label{fig:auc_normal_lowcer_2_10}
\end{figure}
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_2_20}
\caption{AUC with $\mathcal{B} = 2$ and $\mathcal{S}=20$}
\label{fig:auc_normal_lowcer_2_20}
\end{figure}
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_2_50}
\caption{AUC with $\mathcal{B} = 2$ and $\mathcal{S}=50$}
\label{fig:auc_normal_lowcer_2_50}
\end{figure}
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_4_16}
\caption{AUC with $\mathcal{B} = 4$ and $\mathcal{S}=16$}
\label{fig:auc_normal_lowcer_4_16}
\end{figure}
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_4_24}
\caption{AUC with $\mathcal{B} = 4$ and $\mathcal{S}=24$}
\label{fig:auc_normal_lowcer_4_24}
\end{figure}
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_8_16}
\caption{AUC with $\mathcal{B} = 8$ and $\mathcal{S}=16$}
\label{fig:auc_normal_lowcer_8_16}
\end{figure}
\begin{figure}
\centering
\hspace*{-0.1\linewidth}\includegraphics[width=1.2\linewidth]{../rsc/AUC_normal_lowcer_8_32}
\caption{AUC with $\mathcal{B} = 8$ and $\mathcal{S}=32$}
\label{fig:auc_normal_lowcer_8_32}
\end{figure}
Generally, a pattern can be seen: the lower the batch size, the more active learning benefits the learning process.
This may be caused by fast model convergence.
The lower the batch size, the more pre-prediction decision points are reached, which helps to steer the learning with better samples according to the selected metric.
With a higher batch size the model already converges to a good AUC value before the same number of pre-predictions is reached.
Moreover, increasing the sample space $\mathcal{S}$ from which the pre-predictions are drawn generally improves performance.
This is because the selected subset $\pmb{x} \sim \mathcal{X}_U$ then has a higher chance of containing relevant elements with respect to the selected metric.
Keep in mind, however, that this improvement comes with a performance penalty, because more model evaluations are required to predict the ranking scores.
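As a rough estimate (assuming one forward pass per candidate sample), each active learning iteration requires $\mathcal{S}$ additional model evaluations for scoring, so labeling $n$ samples in batches of size $\mathcal{B}$ adds roughly
\[
\frac{n}{\mathcal{B}} \cdot \mathcal{S}
\]
extra inferences on top of the training itself.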
\ref{fig:auc_normal_lowcer_2_10} shows the AUC curve with a batch size of 2 and a sample size of 10, while~\ref{fig:auc_normal_lowcer_2_20} and~\ref{fig:auc_normal_lowcer_2_50} show the same batch size with sample sizes of 20 and 50.
\ref{fig:auc_normal_lowcer_4_16} and~\ref{fig:auc_normal_lowcer_4_24} show the runs with $\mathcal{B} = 4$, while~\ref{fig:auc_normal_lowcer_8_16} and~\ref{fig:auc_normal_lowcer_8_32} show those with $\mathcal{B} = 8$.
\subsection{Are Dagster and Label-Studio proper tooling to build an AL loop?}\label{subsec:is-dagster-and-label-studio-a-proper-tooling-to-build-an-al-loop}
The combination of Dagster and Label-Studio is a good choice for building an active-learning loop.
\subsection{Does balancing the learning samples improve performance?}\label{subsec:does-balancing-the-learning-samples-improve-performance?}
Not really; in the experiments performed, balancing the selection did not noticeably improve the results.


@@ -1,18 +1,60 @@
\section{Implementation}\label{sec:implementation}
\subsection{Dagster with Label-Studio}\label{subsec:dagster-with-label-studio}
The main goal is to implement an active learning loop with the help of Dagster and Label-Studio.
The active learning loop was split as much as possible into assets and graph assets.
This helps to build reusable building blocks and to keep the code clean.
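As an illustration, a single step of the loop could be defined roughly as follows. This is a minimal sketch: the asset, op and sample names are placeholders and not the exact definitions used in the pipeline.
\begin{lstlisting}[language=Python, caption={Sketch of a Dagster asset and graph asset (illustrative names)}]
from dagster import asset, graph_asset, op

@asset
def unlabeled_pool() -> list:
    # placeholder asset: collect the paths of all not yet labeled images
    return ["img_001.png", "img_002.png"]

@op
def predict_scores(pool: list) -> dict:
    # placeholder op: score every candidate with the current model
    return {path: 0.5 for path in pool}

@op
def select_batch(scores: dict) -> list:
    # placeholder op: pick the lowest-certainty samples for labeling
    return sorted(scores, key=scores.get)[:4]

@graph_asset
def labeling_batch(unlabeled_pool):
    # graph asset composing the two ops above into one step of the loop
    return select_batch(predict_scores(unlabeled_pool))
\end{lstlisting}
The actual asset graph and the two graph assets of the pipeline are shown in the following figures.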
\begin{figure}
\centering
\includegraphics[width=\linewidth]{../rsc/dagster/assets}
\caption{Dagster asset graph}
\end{figure}
\begin{figure}
\centering
\subfloat[Score prediction graph asset]{
\includegraphics[width=0.45\linewidth]{../rsc/dagster/predict_scores}
}
\hfill
\subfloat[Model training graph asset]{
\includegraphics[width=0.45\linewidth]{../rsc/dagster/train_model}
}
\caption{Dagster graph assets}
\end{figure}
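To hand the selected samples over for annotation, they can be registered as labeling tasks in Label-Studio.
The following is a minimal sketch of this hand-over, assuming the \texttt{Client} interface of the label-studio-sdk package; the URL, API key, project id and sample list are placeholders.
\begin{lstlisting}[language=Python, caption={Sketch of creating Label-Studio tasks for selected samples (placeholder values)}]
from label_studio_sdk import Client

# samples selected by the active learning step (placeholder values)
train_samples = ["img_001.png", "img_002.png"]

ls = Client(url="http://localhost:8080", api_key="<api-key>")
project = ls.get_project(1)
# create one labeling task per selected sample
project.import_tasks([{"data": {"image": path}} for path in train_samples])
\end{lstlisting}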
\subsection{Jupyter}\label{subsec:jupyter}
To get accurate performance measures, the active-learning process was first implemented in a Jupyter notebook.
This helps to choose which of the methods performs best and which one to use in the final Dagster pipeline.
A straightforward machine-learning pipeline was implemented with the help of PyTorch and a ResNet.
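A minimal sketch of such a pipeline setup could look as follows; the choice of ResNet-18, the pretrained weights and the hyperparameters are illustrative assumptions, not necessarily the exact configuration used.
\begin{lstlisting}[language=Python, caption={Sketch of the model setup (assumed configuration)}]
import torch
from torchvision import models

# pretrained ResNet with a new two-class head for the binary task
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
\end{lstlisting}
The following listing shows how the next training samples are then selected according to the currently active sampling metric.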
\begin{lstlisting}[language=Python, caption=Certainty sampling process of selected metric]
df = df.sort_values(by=['score'])
# match the currently active sampling metric
match predict_mode:
    case PredictMode.HIGHANDLOWCERTAINTY:
        # half of the batch from the lowest and half from the highest scores
        train_samples = pd.concat([df[:int(batch_size/2)], df[-int(batch_size/2 + batch_size%2):]])["sample"].values.tolist()
        unlabeled_samples += df[int(batch_size/2):-int(batch_size/2 + batch_size%2)]["sample"].values.tolist()
    case PredictMode.LOWCERTAINTY:
        # the batch_size samples with the lowest certainty scores
        train_samples = df[:batch_size]["sample"].values.tolist()
        unlabeled_samples += df[batch_size:]["sample"].values.tolist()
    case PredictMode.HIGHCERTAINTY:
        # the batch_size samples with the highest certainty scores
        train_samples = df[-batch_size:]["sample"].values.tolist()
        unlabeled_samples += df[:-batch_size]["sample"].values.tolist()
    case PredictMode.MIDCERTAINTY:
        # the batch_size samples around the middle of the ranking
        lower = int(sample_size/2 - batch_size/2)
        upper = int(sample_size/2 + batch_size/2)
        train_samples = df[lower:upper]["sample"].values.tolist()
        unlabeled_samples += pd.concat([df[:lower], df[upper:]])["sample"].values.tolist()
    case PredictMode.NONE:
        # no ranking: keep the candidate list s in its original order
        train_samples = s[:batch_size]
        unlabeled_samples.extend(s[batch_size:])
\end{lstlisting}
Moreover, the dataset was imported manually and preprocessed with random augmentations.
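A sketch of such a preprocessing setup is shown below; the concrete transforms and the image size are assumptions for illustration.
\begin{lstlisting}[language=Python, caption={Sketch of the preprocessing and random augmentations (assumed transforms)}]
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
\end{lstlisting}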


@@ -105,12 +105,12 @@ This is expected to perform the worst but might still be better than random samp
\subsubsection{Model training}
Now we have defined the samples we want to label, $\mathcal{X}_t$, and the user starts labeling these samples.
After labeling, the model $g(\pmb{x};\pmb{w})$ is trained and its weights $\pmb{w}$ are updated with the newly labeled samples $\mathcal{X}_t$.
The loop starts again with the new model and draws new unlabeled samples from $\mathcal{X}_U$ as in~\eqref{eq:batchdef}.
\subsubsection{Further improvement by class balancing}
An intuitive improvement step might be balancing the class predictions.
The samples selected in the active learning step above from $\mathcal{X}_t$ might all come from one class.
This is bad for the learning process because the model might overfit to one class if always the same class is selected.
Since nobody knows the true label during the sample selection process, we cannot simply sort by the true label and balance the samples.
The simplest solution is to use the model's predicted class and balance the selection by taking half of the samples from one predicted class and the other half from the other, as sketched below.
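A sketch of this balanced selection, using illustrative column names and toy data:
\begin{lstlisting}[language=Python, caption={Sketch of balancing the selection by predicted class (illustrative data)}]
import pandas as pd

df = pd.DataFrame({
    "sample": [f"img_{i}.png" for i in range(8)],
    "score": [0.10, 0.92, 0.33, 0.81, 0.05, 0.67, 0.48, 0.59],
    "pred_class": [0, 1, 0, 1, 0, 1, 0, 1],
})
batch_size = 4

df = df.sort_values(by="score")
half = batch_size // 2
# half of the batch from each predicted class, lowest certainty first
pos = df[df["pred_class"] == 1][:half]
neg = df[df["pred_class"] == 0][:batch_size - half]
train_samples = pd.concat([pos, neg])["sample"].tolist()
\end{lstlisting}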


@@ -1,4 +1,4 @@
\def\ieee{0}
\if\ieee1
\documentclass[sigconf]{acmart}
@@ -10,7 +10,40 @@
\usepackage{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
\lstdefinestyle{mystyle}{
backgroundcolor=\color{backcolour},
commentstyle=\color{codegreen},
keywordstyle=\color{magenta},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\ttfamily\scriptsize,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
keepspaces=true,
numbers=left,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=2
}
\lstset{style=mystyle}
%\lstset{basicstyle=\ttfamily, keywordstyle=\bfseries}
\usepackage{subfig}
\usepackage[inline]{enumitem}
\usepackage{color}
\if\ieee1
@@ -36,7 +69,7 @@
%%
%% The "title" command has an optional parameter,
%% allowing the author to define a "short title" to be used in page headers.
\title{Minimize labeling effort of binary classification tasks with active learning}
%%
%% The "author" command and its associated commands are used to define
@@ -55,6 +88,8 @@
\country{Austria}
\postcode{4020}
}
\else
\institute{Johannes Kepler University Linz}
\fi


@@ -55,7 +55,7 @@ The more the curve ascends toward the upper-left or bottom-right corner the better the
\begin{figure}
\centering
\includegraphics[width=0.5\linewidth]{../rsc/Roc_curve.svg}
\caption{Example of a receiver operating characteristic (ROC) curve. Image by \href{https://cointelegraph.com/explained/what-are-convolutional-neural-networks}{SKY ENGINE AI}}
\label{fig:roc-example}
\end{figure}