add balanced stuff

lukas-heilgenbrunner 2024-05-16 12:36:30 +02:00
parent 841c8deb6d
commit 2ff58491b0
3 changed files with 12 additions and 1 deletion


@@ -92,6 +92,15 @@ Label-Studio provides a great API which can be used to update the predictions of
\subsection{Does balancing the learning samples improve performance?}\label{subsec:does-balancing-the-learning-samples-improve-performance?}
The previous process was improved by balancing the classes of the samples handed to the oracle for labelling.
The idea is that the low-certainty samples might predominantly belong to a single class and thus lead to an imbalanced learning process.
The sample selection was modified as described in~\ref{par:furtherimprovements}.
Unfortunately, this did not improve the convergence speed; the results are essentially indistinguishable from those without balancing.
This might be because the uncertainty-sampling process already balances the draws fairly well on its own.
% todo insert imgs and elaborate on the results
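A minimal sketch of what such a balanced selection step could look like, assuming a softmax classifier; the helper name and signature are illustrative, not the project's actual code:

import numpy as np

def balanced_low_certainty(probs: np.ndarray, batch_size: int) -> np.ndarray:
    """Pick the least certain samples, drawn evenly across predicted classes.

    probs: (n_samples, n_classes) softmax outputs of the current model.
    Returns indices into the unlabeled pool. Hypothetical sketch, not the
    project's actual implementation.
    """
    certainty = probs.max(axis=1)        # confidence of the top prediction
    pred = probs.argmax(axis=1)          # predicted class per sample
    n_classes = probs.shape[1]
    per_class = batch_size // n_classes  # equal share per predicted class

    chosen = []
    for c in range(n_classes):
        idx = np.where(pred == c)[0]
        # least certain samples currently predicted as class c
        idx = idx[np.argsort(certainty[idx])][:per_class]
        chosen.extend(idx.tolist())
    # top up with the globally least certain samples if a class had too few
    if len(chosen) < batch_size:
        rest = [i for i in np.argsort(certainty) if i not in set(chosen)]
        chosen.extend(rest[: batch_size - len(chosen)])
    return np.array(chosen[:batch_size])

The per-class quota keeps a single dominant class from filling the whole batch, while the top-up step keeps the batch size constant when some class has too few low-certainty candidates.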


@@ -35,6 +35,8 @@ match predict_mode:
Moreover, the dataset was manually imported and preprocessed with random augmentations.
\subsection{Balanced sample selection}
\subsection{Dagster with Label-Studio}\label{subsec:dagster-with-label-studio}
The main goal is to implement an active learning loop with the help of Dagster and Label-Studio.
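A minimal sketch of how such a loop could be wired up as a Dagster job; the op names and bodies below are placeholders for illustration, not the actual pipeline:

from dagster import job, op

@op
def draw_unlabeled_batch():
    # pick the next low-certainty batch from the unlabeled pool
    # (e.g. with the balanced selection sketched above)
    ...

@op
def push_to_label_studio(batch):
    # create Label-Studio tasks for the batch, attaching the model
    # predictions so annotators only need to correct them
    ...

@op
def retrain_model(annotations):
    # fine-tune the model on the newly labeled samples
    ...

@job
def active_learning_iteration():
    retrain_model(push_to_label_studio(draw_unlabeled_batch()))

In practice the annotation step is asynchronous, so a Dagster sensor watching Label-Studio for completed annotations could trigger the retraining job rather than running everything in one linear pass.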


@@ -260,7 +260,7 @@ So now we have defined the samples we want to label with $\mathcal{X}_t$ and the
After labelling, the model $g(\pmb{x};\pmb{w})$ is trained on the newly labeled samples $\mathcal{X}_t$ and the weights $\pmb{w}$ are updated accordingly.
The loop starts again with the new model and draws new unlabeled samples from $\mathcal{X}_U$ as in~\eqref{eq:batchdef}.
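One iteration of this loop might look as follows in a rough Python sketch; \texttt{model}, \texttt{oracle\_label}, and \texttt{train} are hypothetical stand-ins, and the selection criterion is plain least-confidence in the spirit of~\eqref{eq:batchdef}:

import numpy as np

def select_batch(model, X_unlabeled, batch_size):
    # score the unlabeled pool with the current model g(x; w)
    probs = model.predict_proba(X_unlabeled)
    certainty = probs.max(axis=1)   # confidence of the top prediction
    # X_t: indices of the batch_size least certain samples
    return np.argsort(certainty)[:batch_size]

# one iteration of the loop described above:
# idx = select_batch(model, X_unlabeled, B)
# y_t = oracle_label(X_unlabeled[idx])          # human annotation
# model = train(model, X_unlabeled[idx], y_t)   # update weights w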
\paragraph{Further improvement by class balancing} \label{par:furtherimprovements}
An intuitive improvement is to balance the selected samples across the predicted classes.
The samples $\mathcal{X}_t$ selected in the active-learning step above might all stem from a single class.
This is bad for the learning process because the model might overfit to that class if the same class is selected over and over.