add stuff about semi-supervised learning and fixmatch

lukas-heiligenbrunner 2023-05-19 17:11:47 +02:00
parent 0acd8ff84a
commit 1690b3740b


@@ -1,6 +1,7 @@
\documentclass[sigconf]{acmart}
\usepackage{amsmath}
\usepackage{bbm}
\usepackage{mathtools}
%%
%% \BibTeX command to typeset BibTeX logo in the docs
@@ -69,7 +70,7 @@
%% article.
\begin{abstract}
Cross-Model Pseudo-Labeling is a new framework for generating pseudo-labels
for supervised learning tasks where only a subset of true labels is known.
It builds upon the existing approach of FixMatch and improves it further by
using two differently sized models that complement each other.
\end{abstract}
@@ -90,17 +91,54 @@
\section{Introduction}
Most supervised learning tasks require large amounts of training samples.
With too little training data the model will generalize poorly and not fit a real-world task.
Labeling datasets is commonly seen as an expensive task and should be avoided as much as possible.
That is why there is a machine-learning field called semi-supervised learning.
The general approach is to train a model that predicts pseudo-labels, which can then be used to train the main model.
\section{Semi-Supervised Learning}
In traditional supervised learning we have a labeled dataset.
Each datapoint is associated with a corresponding target label.
The goal is to fit a model to predict the labels from datapoints.
In traditional unsupervised learning no labels are known.
The goal is to find patterns and structures in the data.
Those two techniques combined yield semi-supervised learning.
Some of the labels are known, but for most of the data we have only the raw datapoints.
The basic idea is that the unlabeled data can significantly improve the model performance when used in combination with the labeled data.
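To make this setting concrete, the following sketch outlines how such a partially labeled dataset could be set up in PyTorch.
It is only an illustration and not part of any referenced implementation; the dataset (CIFAR-10), the labeled fraction of $10\%$ and the weighting factor \texttt{lambda\_u} are assumptions chosen for the example.
\begin{verbatim}
from torch.utils.data import random_split
from torchvision import datasets, transforms

# CIFAR-10 is only a stand-in; any classification
# dataset works the same way.
data = datasets.CIFAR10("./data", train=True,
                        download=True,
                        transform=transforms.ToTensor())

# Keep labels for 10% of the samples and treat the
# remaining 90% as unlabeled.
n_labeled = len(data) // 10
labeled_set, unlabeled_set = random_split(
    data, [n_labeled, len(data) - n_labeled])

# Semi-supervised training then minimizes
#   L = L_labeled + lambda_u * L_unlabeled,
# where L_unlabeled is built from pseudo-labels as
# described in the FixMatch section below.
lambda_u = 1.0
\end{verbatim}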
\section{FixMatch}\label{sec:fixmatch}
An existing approach to this problem is called FixMatch.
It was introduced in a Google Research paper from 2020~\cite{fixmatch}.
The key idea of FixMatch is to leverage the unlabeled data by predicting pseudo-labels with a model trained on the known labels.
Then both the known labels and the predicted ones are used side by side to train the model.
The labeled samples guide the learning process, while the unlabeled samples provide additional information.
Not every pseudo prediction is kept to train the model further.
A confidence threshold is defined to evaluate how `confident' the model is of its prediction.
The prediction is dropped if the model is not confident enough.
The quantity and quality of the obtained labels are crucial and have a significant impact on the overall accuracy.
This makes it important to improve the pseudo-labeling framework as much as possible.
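As a rough sketch of this confidence filtering (the helper name \texttt{pseudo\_label} is made up for illustration; a PyTorch classifier \texttt{model} and an example threshold of $0.95$ are assumed), a pseudo-label is only kept when its largest softmax probability reaches the threshold:
\begin{verbatim}
import torch
import torch.nn.functional as F

tau = 0.95  # example confidence threshold

@torch.no_grad()
def pseudo_label(model, weak_batch):
    # Class probabilities for the weakly augmented batch.
    probs = F.softmax(model(weak_batch), dim=-1)
    conf, labels = probs.max(dim=-1)
    # Keep only predictions the model is confident about.
    keep = conf >= tau
    return labels, keep
\end{verbatim}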
\subsection{Math of FixMatch}\label{subsec:math-of-fixmatch}
$\mathcal{L}_u$ in Equation~\ref{eq:equation2} defines the loss function that trains the model on the unlabeled data.
The sum over a batch of size $B_u$, together with the $\frac{1}{B_u}$ factor, averages the loss over the batch and should be straightforward.
The input data is augmented in two different ways.
First, there is a weak augmentation $\mathcal{T}_{\text{weak}}(\cdot)$ which only applies basic transformations such as filtering and blurring.
Moreover, there is a strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$ which applies cutouts and edge detection.
The interesting part is the indicator function $\mathbbm{1}(\cdot)$, which applies a principle called `confidence-based masking'.
It retains a pseudo-label only if its largest predicted probability is above a threshold $\tau$.
Here $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ is the model prediction for a weakly augmented input, and $\hat{y}_i \coloneqq \arg\max(p_i)$ is the pseudo-label derived from it.
The second part, $\mathcal{H}(\cdot, \cdot)$, is a standard cross-entropy loss function which takes two inputs:
$\hat{y}_i$, the obtained pseudo-label, and $F(\mathcal{T}_{\text{strong}}(u_i))$, the model prediction for the strongly augmented input.
The indicator function evaluates to $0$ if the pseudo prediction is not confident enough, and the corresponding loss term is dropped.
Otherwise it is kept and trains the model further.
\begin{equation}
\label{eq:equation2}
\mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
\end{equation}
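The following sketch translates Equation~\ref{eq:equation2} into PyTorch code.
It is not the reference implementation of FixMatch; the classifier \texttt{model}, the two augmentation callables \texttt{weak\_aug} and \texttt{strong\_aug}, and the threshold value are assumptions for illustration.
\begin{verbatim}
import torch
import torch.nn.functional as F

def unsupervised_loss(model, u_batch,
                      weak_aug, strong_aug, tau=0.95):
    # p_i = F(T_weak(u_i)); no gradient flows through
    # the pseudo-label.
    with torch.no_grad():
        p = F.softmax(model(weak_aug(u_batch)), dim=-1)
        conf, y_hat = p.max(dim=-1)
        # 1(max(p_i) >= tau), the confidence mask.
        mask = (conf >= tau).float()

    # H(y_hat_i, F(T_strong(u_i))): cross-entropy of the
    # strong-view prediction against the pseudo-label.
    logits_strong = model(strong_aug(u_batch))
    loss = F.cross_entropy(logits_strong, y_hat,
                           reduction="none")

    # Average over the unlabeled batch of size B_u.
    return (mask * loss).mean()
\end{verbatim}
In the full training objective this term is added to the supervised cross-entropy loss on the labeled batch, weighted by a factor $\lambda_u$.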
\section{Cross-Model Pseudo-Labeling}
todo write stuff \cite{Xu_2022_CVPR}
@@ -111,7 +149,7 @@ todo write stuff \cite{Xu_2022_CVPR}
\mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
\end{equation}
\section{Performance}
\begin{figure}[h]
\centering