add balanced stuff

lukas-heilgenbrunner 2024-05-16 12:36:30 +02:00
parent 841c8deb6d
commit 2ff58491b0
3 changed files with 12 additions and 1 deletion


@@ -92,6 +92,15 @@ Label-Studio provides a great API which can be used to update the predictions of
\subsection{Does balancing the learning samples improve performance?}\label{subsec:does-balancing-the-learning-samples-improve-performance?}
The previous process was improved by balancing the classes of the samples handed to the oracle for labelling.
The idea is that the low-certainty samples might predominantly belong to a single class and thus lead to an imbalanced learning process.
The sample selection was modified as described in~\ref{par:furtherimprovements}.
Unfortunately, this did not improve the convergence speed; the results are essentially indistinguishable from those without balancing.
This might be because the uncertainty-sampling process already balances the draws fairly well on its own.
% todo insert imgs and elaborate on the results
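A minimal sketch of what such a balanced selection step could look like, assuming a softmax classifier; the helper name and signature are illustrative, not the project's actual code:

import numpy as np

def balanced_low_certainty(probs: np.ndarray, batch_size: int) -> np.ndarray:
    """Pick the least certain samples, drawn evenly across predicted classes.

    probs: (n_samples, n_classes) softmax outputs of the current model.
    Returns indices into the unlabeled pool. Hypothetical sketch, not the
    project's actual implementation.
    """
    certainty = probs.max(axis=1)        # confidence of the top prediction
    pred = probs.argmax(axis=1)          # predicted class per sample
    n_classes = probs.shape[1]
    per_class = batch_size // n_classes  # equal share per predicted class

    chosen = []
    for c in range(n_classes):
        idx = np.where(pred == c)[0]
        # least certain samples currently predicted as class c
        idx = idx[np.argsort(certainty[idx])][:per_class]
        chosen.extend(idx.tolist())
    # top up with the globally least certain samples if a class had too few
    if len(chosen) < batch_size:
        rest = [i for i in np.argsort(certainty) if i not in set(chosen)]
        chosen.extend(rest[: batch_size - len(chosen)])
    return np.array(chosen[:batch_size])

The per-class quota keeps a single dominant class from filling the whole batch, while the top-up step keeps the batch size constant when some class has too few low-certainty candidates.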


@@ -35,6 +35,8 @@ match predict_mode:
Moreover, the dataset was manually imported and preprocessed with random augmentations.
\subsection{Balanced sample selection}
\subsection{Dagster with Label-Studio}\label{subsec:dagster-with-label-studio}
The main goal is to implement an active learning loop with the help of Dagster and Label-Studio.
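A minimal sketch of how such a loop could be wired up as a Dagster job; the op names and bodies below are placeholders for illustration, not the actual pipeline:

from dagster import job, op

@op
def draw_unlabeled_batch():
    # pick the next low-certainty batch from the unlabeled pool
    # (e.g. with the balanced selection sketched above)
    ...

@op
def push_to_label_studio(batch):
    # create Label-Studio tasks for the batch, attaching the model
    # predictions so annotators only need to correct them
    ...

@op
def retrain_model(annotations):
    # fine-tune the model on the newly labeled samples
    ...

@job
def active_learning_iteration():
    retrain_model(push_to_label_studio(draw_unlabeled_batch()))

In practice the annotation step is asynchronous, so a Dagster sensor watching Label-Studio for completed annotations could trigger the retraining job rather than running everything in one linear pass.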


@@ -260,7 +260,7 @@ So now we have defined the samples we want to label with $\mathcal{X}_t$ and the
After labelling, the model $g(\pmb{x};\pmb{w})$ is trained on the newly labeled samples $\mathcal{X}_t$ and the weights $\pmb{w}$ are updated accordingly.
The loop starts again with the new model and draws new unlabeled samples from $\mathcal{X}_U$ as in~\eqref{eq:batchdef}.
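One iteration of this loop might look as follows in a rough Python sketch; \texttt{model}, \texttt{oracle\_label}, and \texttt{train} are hypothetical stand-ins, and the selection criterion is plain least-confidence in the spirit of~\eqref{eq:batchdef}:

import numpy as np

def select_batch(model, X_unlabeled, batch_size):
    # score the unlabeled pool with the current model g(x; w)
    probs = model.predict_proba(X_unlabeled)
    certainty = probs.max(axis=1)   # confidence of the top prediction
    # X_t: indices of the batch_size least certain samples
    return np.argsort(certainty)[:batch_size]

# one iteration of the loop described above:
# idx = select_batch(model, X_unlabeled, B)
# y_t = oracle_label(X_unlabeled[idx])          # human annotation
# model = train(model, X_unlabeled[idx], y_t)   # update weights w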
\paragraph{Further improvement by class balancing} \label{par:furtherimprovements}
An intuitive improvement is to balance the selected samples across the predicted classes.
The samples $\mathcal{X}_t$ selected in the active-learning step above might all stem from a single class.
This is bad for the learning process because the model might overfit to that class if the same class is selected over and over.