add text for balancing classes

lukas-heiligenbrunner 2024-04-24 15:27:49 +02:00
parent ba105985b3
commit 89870b040b
2 changed files with 16 additions and 1 deletion

@ -106,3 +106,18 @@ This is expected to perform the worst but might still be better than random sampling.
So now we have defined the samples $\mathcal{X}_t$ that we want to label, and the user starts labeling these samples.
After labeling, the model $g(\pmb{x};\pmb{w})$ is trained on the newly labeled samples $\mathcal{X}_t$ and its weights $\pmb{w}$ are updated.
The loop then starts again: the updated model draws new unlabeled samples from $\mathcal{X}_U$.
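To make this loop concrete, the following minimal Python sketch mirrors the steps above; the \texttt{score}, \texttt{select} and \texttt{oracle} callables and the \texttt{fit}/\texttt{predict\_proba} interface are assumptions for illustration, not part of this work.
\begin{verbatim}
def active_learning_loop(model, pool, oracle, score, select,
                         batch_size, rounds):
    # pool: the unlabeled set X_U; oracle: the human labeler
    labeled_x, labeled_y = [], []
    for _ in range(rounds):
        # score every unlabeled sample with the current model g(x; w)
        scores = [score(model.predict_proba(x)) for x in pool]
        # draw the batch X_t according to the chosen sampling strategy
        batch = select(pool, scores, batch_size)
        # the user labels X_t, then the weights w are updated
        # by retraining on all samples labeled so far
        labeled_x.extend(batch)
        labeled_y.extend(oracle(x) for x in batch)
        pool = [x for x in pool if x not in batch]
        model.fit(labeled_x, labeled_y)
    return model
\end{verbatim}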
\subsubsection{Further improvement by class balancing}
An intuitive improvement is to balance the classes within the selection.
The samples $\mathcal{X}_t$ selected by the active learning step above might all stem from a single class.
This harms the learning process: if the same class is selected over and over, the model overfits to that class.
Since the true labels are unknown during sample selection, we cannot simply sort by the true label to balance the samples.
The simplest solution is to use the model's predicted class instead and to balance the selection by drawing half of the samples from one predicted class and the other half from the other.
Afterwards, the scoring metric chosen above (low-certainty sampling or similar) is applied within each predicted class.
For low-certainty sampling this process can be expressed mathematically as in~\eqref{eq:balancedlowcertainty}.
\begin{equation}\label{eq:balancedlowcertainty}
\mathcal{X}_t = \min_{\mathcal{B}/2}\left(\left\{\alpha \in S(z) : \alpha_0 < 0.5\right\}\right) \cup \min_{\mathcal{B}/2}\left(\left\{\alpha \in S(z) : \alpha_1 < 0.5\right\}\right)
\end{equation}
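A minimal Python sketch of this balanced selection, assuming binary softmax scores stored in a NumPy array; the function name and interface are illustrative only:
\begin{verbatim}
import numpy as np

def balanced_low_certainty(probs, budget):
    # probs: (n, 2) softmax scores S(z), one row (alpha_0, alpha_1)
    # per sample; budget: the labeling budget B
    predicted = probs.argmax(axis=1)   # model's predicted class
    certainty = probs.max(axis=1)      # certainty of that prediction
    selected = []
    for cls in (0, 1):
        # pool restricted to one predicted class
        # (equivalently: alpha of the other class < 0.5)
        idx = np.where(predicted == cls)[0]
        # min_{B/2}: keep the B/2 least certain samples of this class
        order = np.argsort(certainty[idx])
        selected.extend(idx[order[: budget // 2]].tolist())
    return selected
\end{verbatim}
The returned indices form the balanced batch $\mathcal{X}_t$ for the next labeling round.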

@ -1,4 +1,4 @@
\def\ieee{1}
\def\ieee{0}
\if\ieee1
\documentclass[sigconf]{acmart}