\section{Introduction}\label{sec:introduction}
For most supervised learning tasks, a large number of training samples is essential.
With too little training data the model will not generalize well and will not fit a real-world task.
Labeling datasets is commonly seen as an expensive task that should be avoided as much as possible.
That is why there is a machine-learning field called semi-supervised learning.
The general approach is to train a model that predicts pseudo-labels, which can then be used to train the main model.

The goal of this paper is video action recognition.

Some of the labels are known, but for most of the data we have only the raw data.
The basic idea is that the unlabeled data can significantly improve the model performance when used in combination with the labeled data.
\section{FixMatch}\label{sec:fixmatch}
There is an existing approach called FixMatch.
This was introduced in a Google Research paper from 2020~\cite{fixmatch}.
The key idea of FixMatch is to leverage the unlabeled data by predicting pseudo-labels for it based on the known labels.
Then both the known labels and the predicted ones are used side by side to train the model.
The labeled samples guide the learning process and the unlabeled samples gain additional information.
Not every pseudo-label prediction is kept to train the model further.
A confidence threshold is defined to evaluate how \glqq confident\grqq{} the model is about its prediction.
The prediction is dropped if the model is not confident enough.
The quantity and quality of the obtained labels are crucial, and they have a significant impact on the overall accuracy.
This means improving the pseudo-label framework as much as possible is essential.
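The confidence filtering described above can be sketched as follows. This is a minimal NumPy illustration with made-up probabilities and a made-up threshold value, not the paper's implementation:

```python
import numpy as np

# Softmax probabilities of the model on weakly augmented unlabeled samples
# (values are made up for illustration; 3 samples, 4 classes).
p_weak = np.array([
    [0.90, 0.05, 0.03, 0.02],   # confident -> kept
    [0.40, 0.30, 0.20, 0.10],   # not confident -> dropped
    [0.10, 0.05, 0.80, 0.05],   # confident -> kept
])

tau = 0.7  # confidence threshold (an illustrative value)

confidence = p_weak.max(axis=1)        # how confident the model is
pseudo_labels = p_weak.argmax(axis=1)  # hard pseudo-labels
keep = confidence >= tau               # drop predictions below the threshold

print(pseudo_labels[keep])  # pseudo-labels that survive: [0 2]
```

Only the surviving pseudo-labels would then enter the unsupervised loss on the strongly augmented versions of the same samples.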

FixMatch comes with some major limitations.

Two different models, a smaller auxiliary model and a larger model, are defined.
They provide pseudo-labels for each other.
The two models have different structural biases, which leads to complementary representations.
This symmetric design yields a boost in performance.
The SG label means \glqq Stop Gradient\grqq.
The loss function evaluations are fed into the opposite model as its loss.
The two models train each other.
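This mutual training step can be sketched as a toy example. The probability values are made up, and the `cross_entropy` and `hard_label` helpers are hypothetical stand-ins for the real losses; in an actual framework, the stop gradient corresponds to detaching the pseudo-label so no gradient flows back into the model that produced it:

```python
import numpy as np

def cross_entropy(onehot, p):
    # CE between a hard pseudo-label (one-hot) and predicted probabilities
    return -np.sum(onehot * np.log(p + 1e-12), axis=1)

def hard_label(p):
    # argmax -> one-hot pseudo-label; treated as a constant ("stop gradient"),
    # so it only supervises the *other* model
    onehot = np.zeros_like(p)
    onehot[np.arange(len(p)), p.argmax(axis=1)] = 1.0
    return onehot

# Made-up probabilities of the two models on the same unlabeled clip:
p_F = np.array([[0.85, 0.10, 0.05]])  # larger primary model F
p_A = np.array([[0.20, 0.75, 0.05]])  # smaller auxiliary model A

loss_A = cross_entropy(hard_label(p_F), p_A)  # F's pseudo-label teaches A
loss_F = cross_entropy(hard_label(p_A), p_F)  # A's pseudo-label teaches F
```

Each model is updated only by the loss computed against the other model's pseudo-label, which is the cross-model exchange described above.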
\subsection{Math of CMPL}\label{subsec:math}
The loss function of CMPL is similar to the one explained above.
However, we have to distinguish between the supervised loss generated from samples where the labels are known and the unsupervised loss where no labels are available.

The two equations~\ref{eq:cmpl-losses1} and~\ref{eq:cmpl-losses2} are standard cross-entropy loss functions computed with the supervised labels of the two separate models.

They are very similar to FixMatch, but it is important to note that the confidence threshold is applied to the predictions of the opposite model.
\begin{align}
    \mathcal{L}_u^A &= \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i^F) \geq \tau) \, \mathcal{H}\left(\hat{y}_i^F, A(\mathcal{T}_{\text{strong}}(u_i))\right)
\end{align}
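Evaluated term by term, this unsupervised loss can be sketched on made-up numbers; the probabilities, batch size, and threshold below are purely illustrative:

```python
import numpy as np

tau = 0.8  # confidence threshold (illustrative value)

# Made-up probabilities for a batch of B_u = 2 unlabeled samples, 3 classes.
p_F = np.array([[0.90, 0.05, 0.05],   # primary model F on weak augmentation
                [0.50, 0.30, 0.20]])
p_A_strong = np.array([[0.70, 0.20, 0.10],   # auxiliary model A on strong aug.
                       [0.10, 0.80, 0.10]])

B_u = len(p_F)
indicator = p_F.max(axis=1) >= tau    # 1(max(p_i^F) >= tau)
y_hat_F = p_F.argmax(axis=1)          # hard pseudo-labels of model F
# H(y_hat_i^F, A(T_strong(u_i))): CE of A's prediction against F's pseudo-label
ce = -np.log(p_A_strong[np.arange(B_u), y_hat_F])

L_u_A = (indicator * ce).sum() / B_u  # second sample is masked out
```

Only the first sample passes the threshold, so the batch loss is its cross-entropy divided by $B_u$.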

Finally, to train the main objective, an overall loss is calculated by simply summing all the losses.
The loss is weighted by a hyperparameter $\lambda$ to enhance the importance of the supervised loss.

\begin{equation}
    \mathcal{L} = \mathcal{L}_s^A + \mathcal{L}_s^F + \lambda \left( \mathcal{L}_u^A + \mathcal{L}_u^F \right)
\end{equation}

Even when only 1\% of the true labels are known for the UCF-101 dataset, 25.1\% of the videos are still classified correctly.
\section{Further schemes}\label{sec:further-schemes}
How the pseudo-labels are generated may impact the overall performance.
In this paper the pseudo-labels are obtained by the cross-model approach.
But there might be other strategies as well.
For example:
\begin{enumerate*}
\item Self-First: Each network uses just its own prediction if it is confident enough.