add remaining loss formulas
fix some typos
parent 8d76a7fef4
commit c0c51f8ecf
@@ -89,14 +89,14 @@
%% information and builds the first part of the formatted document.
\maketitle
\section{Introduction}
\section{Introduction}\label{sec:introduction}
For most supervised learning tasks, a large number of training samples is essential.
With too little training data the model will not generalize well and will not fit a real-world task.
Labeling datasets is commonly seen as an expensive task, and one wants to avoid it as much as possible.
That is why there is a machine-learning field called semi-supervised learning.
The general approach is to train a model that predicts pseudo-labels, which can then be used to train the main model.
\section{Semi-Supervised learning}
\section{Semi-Supervised learning}\label{sec:semi-supervised-learning}
In traditional supervised learning we have a labeled dataset.
Each datapoint is associated with a corresponding target label.
The goal is to fit a model to predict the labels from datapoints.
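To make this setup concrete, it can be written with the same symbols that the loss equations below use ($v_i$ for a labeled sample, $y_i$ for its label, $\mathcal{H}$ for a loss such as cross-entropy, and $F$ for the model); the set notation and the size $N_l$ are only an illustrative sketch and not taken from the paper:
\begin{equation*}
    \mathcal{D}_l = \{(v_i, y_i)\}_{i=1}^{N_l}, \qquad \min_F \frac{1}{N_l} \sum_{i=1}^{N_l} \mathcal{H}(y_i, F(v_i))
\end{equation*}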
@@ -122,7 +122,7 @@ The quantity and quality of the obtained labels is crucial and they have an sign
This means improving the pseudo-label framework as much as possible is important.
\subsection{Math of FixMatch}\label{subsec:math-of-fixmatch}
The equation~\ref{eq:fixmatch} defines the loss-function that trains the model.
Equation~\ref{eq:fixmatch} defines the loss-function that trains the model.
The sum over the batch of size $B_u$, together with the factor $\frac{1}{B_u}$, takes the average loss over this batch and should be straightforward.
The input data is augmented in two different ways.
First, there is a weak augmentation $\mathcal{T}_{\text{weak}}(\cdot)$, which only applies basic transformations such as filtering and blurring.
@@ -136,14 +136,15 @@ Moreover, there is the strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$
The interesting part is the indicator function $\mathbbm{1}(\cdot)$, which applies a principle called ``confidence-based masking''.
It retains a label only if its largest probability is above a threshold $\tau$.
Here $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ denotes the model evaluation of a weakly augmented input.
The second part $\mathcal{H}(\cdot, \cdot)$ is a standard Cross-entropy loss function which takes two inputs.
The second part $\mathcal{H}(\cdot, \cdot)$ is a standard cross-entropy loss function which takes two inputs: the predicted and the true label.
These are $\hat{y}_i$, the obtained pseudo-label, and $F(\mathcal{T}_{\text{strong}}(u_i))$, the model evaluation of a strongly augmented input.
The indicator function evaluates to $0$ if the pseudo prediction is not confident, and the current loss term is dropped.
Otherwise it is kept and contributes to training the model further.
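As a purely illustrative example with made-up numbers (not taken from the paper): for a threshold $\tau = 0.95$, a weak-augmentation prediction $p_i = (0.97, 0.02, 0.01)$ satisfies $\max(p_i) \geq \tau$, so the pseudo-label $\hat{y}_i$ is the first class and the cross-entropy term against $F(\mathcal{T}_{\text{strong}}(u_i))$ is kept; a prediction $p_i = (0.60, 0.30, 0.10)$ fails the check, the indicator evaluates to $0$, and that sample contributes nothing to the loss.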
\section{Cross-Model Pseudo-Labeling}
The newly invented approach of this paper is called Cross-Model Pseudo-Labeling (CMPL).\cite{Xu_2022_CVPR}
\section{Cross-Model Pseudo-Labeling}\label{sec:cross-model-pseudo-labeling}
The approach newly introduced in this paper is called Cross-Model Pseudo-Labeling (CMPL)~\cite{Xu_2022_CVPR}.
In Figure~\ref{fig:cmpl-structure} one can see its structure.
We define two different models, a smaller and a larger one.
\begin{figure}[h]
\centering
@@ -153,12 +154,41 @@ In Figure~\ref{fig:cmpl-structure} one can see its structure.
\end{figure}
\subsection{Math of CMPL}\label{subsec:math}
The loss function of CMPL is similar to the one explained above.
However, we have to distinguish between the supervised loss, computed on samples whose labels are known, and the unsupervised loss, for which no labels are known.
Equations~\ref{eq:cmpl-losses1} and~\ref{eq:cmpl-losses2} are standard cross-entropy loss functions computed with the supervised labels for the two separate models.
\begin{align}
\label{eq:cmpl-losses1}
\mathcal{L}_s^F &= \frac{1}{B_l} \sum_{i=1}^{B_l} \mathcal{H}(y_i,F(\mathcal{T}^F_{\text{standard}}(v_i)))\\
\label{eq:cmpl-losses2}
    \mathcal{L}_s^A &= \frac{1}{B_l} \sum_{i=1}^{B_l} \mathcal{H}(y_i,A(\mathcal{T}^A_{\text{standard}}(v_i)))
\end{align}
Equations~\ref{eq:cmpl-loss3} and~\ref{eq:cmpl-loss4} are the unsupervised losses.
They are very similar to the FixMatch loss, but the pseudo-label for each model is produced by the respective other model: $A$'s predictions supervise $F$, and $F$'s predictions supervise $A$.
\begin{align}
\label{eq:cmpl-loss3}
\mathcal{L}_u^F &= \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i^A) \geq \tau) \mathcal{H}(\hat{y}_i^A,F(\mathcal{T}_{\text{strong}}(u_i)))\\
\label{eq:cmpl-loss4}
\mathcal{L}_u^A &= \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i^F) \geq \tau) \mathcal{H}(\hat{y}_i^F,A(\mathcal{T}_{\text{strong}}(u_i)))
\end{align}
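To make the cross-model supervision and the masking explicit, the following is a minimal PyTorch-style sketch of how equations~\ref{eq:cmpl-loss3} and~\ref{eq:cmpl-loss4} could be computed; the function name, variable names and the threshold value are our own illustration and not taken from the paper:
\begin{verbatim}
# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn.functional as nnF

def cmpl_unsupervised_losses(f_weak, f_strong, a_weak, a_strong, tau=0.95):
    # f_weak / f_strong: logits of model F on weakly / strongly augmented u_i
    # a_weak / a_strong: logits of model A on the same unlabeled batch
    p_f = torch.softmax(f_weak.detach(), dim=-1)  # pseudo-label source for A
    p_a = torch.softmax(a_weak.detach(), dim=-1)  # pseudo-label source for F
    conf_f, y_hat_f = p_f.max(dim=-1)
    conf_a, y_hat_a = p_a.max(dim=-1)
    # confidence-based masking with threshold tau
    mask_f = (conf_f >= tau).float()
    mask_a = (conf_a >= tau).float()
    # cross supervision: A's pseudo-labels train F, F's pseudo-labels train A
    loss_u_f = (nnF.cross_entropy(f_strong, y_hat_a, reduction="none") * mask_a).mean()
    loss_u_a = (nnF.cross_entropy(a_strong, y_hat_f, reduction="none") * mask_f).mean()
    return loss_u_f, loss_u_a
\end{verbatim}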
Finally, to train the main objective, an overall loss is calculated by simply summing all the losses.
The unsupervised part is weighted by a hyperparameter $\lambda$, which balances its contribution against the supervised loss.
\begin{equation}
\label{eq:equation}
\mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
\label{eq:loss-main-obj}
\mathcal{L} = (\mathcal{L}_s^F + \mathcal{L}_s^A) + \lambda(\mathcal{L}_u^F + \mathcal{L}_u^A)
\end{equation}
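As a purely illustrative calculation with invented values (not results from the paper): if the supervised losses evaluate to $\mathcal{L}_s^F = 0.4$ and $\mathcal{L}_s^A = 0.6$, the unsupervised losses to $\mathcal{L}_u^F = 1.0$ and $\mathcal{L}_u^A = 1.5$, and the weight is $\lambda = 2$, then the overall loss is $\mathcal{L} = (0.4 + 0.6) + 2 \cdot (1.0 + 1.5) = 6.0$.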
\section{Performance}
\section{Performance}\label{sec:performance}
In Figure~\ref{fig:results} a performance comparison is shown between training with only the supervised samples and training with several different pseudo-label frameworks.
One can clearly see that the performance gain with the new CMPL framework is quite significant.
\begin{figure}[h]
\centering
@@ -178,35 +208,7 @@ In Figure~\ref{fig:cmpl-structure} one can see its structure.
%% If your work has an appendix, this is the place to put it.
\appendix
\section{Research Methods}
\subsection{Part One}
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi
malesuada, quam in pulvinar varius, metus nunc fermentum urna, id
sollicitudin purus odio sit amet enim. Aliquam ullamcorper eu ipsum
vel mollis. Curabitur quis dictum nisl. Phasellus vel semper risus, et
lacinia dolor. Integer ultricies commodo sem nec semper.
\subsection{Part Two}
Etiam commodo feugiat nisl pulvinar pellentesque. Etiam auctor sodales
ligula, non varius nibh pulvinar semper. Suspendisse nec lectus non
ipsum convallis congue hendrerit vitae sapien. Donec at laoreet
eros. Vivamus non purus placerat, scelerisque diam eu, cursus
ante. Etiam aliquam tortor auctor efficitur mattis.
\section{Online Resources}
Nam id fermentum dui. Suspendisse sagittis tortor a nulla mollis, in
pulvinar ex pretium. Sed interdum orci quis metus euismod, et sagittis
enim maximus. Vestibulum gravida massa ut felis suscipit
congue. Quisque mattis elit a risus ultrices commodo venenatis eget
dui. Etiam sagittis eleifend elementum.
Nam interdum magna at lectus dignissim, ac dignissim lorem
rhoncus. Maecenas eu arcu ac neque placerat aliquam. Nunc pulvinar
massa et mattis lacinia.
% appendix
\end{document}
\endinput