\documentclass[sigconf]{acmart}
\usepackage{amsmath}
\usepackage{bbm}
\usepackage{mathtools}

%%
%% \BibTeX command to typeset BibTeX logo in the docs
%% article.
\begin{abstract}
  Cross-Model Pseudo-Labeling is a new framework for generating pseudo-labels
  for supervised learning tasks where only a subset of true labels is known.
  It builds upon the existing approach of FixMatch and improves it further by
  using two differently sized models that complement each other.
\end{abstract}
\section{Introduction}
For most supervised learning tasks, a large number of training samples is essential.
With too little training data, the model will generalize poorly and not fit the real-world task.
Labeling datasets is commonly seen as an expensive task, and one to be avoided as much as possible.
That is why there is a machine-learning field called semi-supervised learning.
The general approach is to train a model that predicts pseudo-labels, which can then be used to train the main model.

\section{Semi-Supervised Learning}
In traditional supervised learning we have a labeled dataset.
Each datapoint is associated with a corresponding target label.
The goal is to fit a model that predicts the labels from the datapoints.

In traditional unsupervised learning no labels are known.
The goal is to find patterns and structures in the data.

Those two techniques combined yield semi-supervised learning.
Some of the labels are known, but for most of the data we have only the raw datapoints.
The basic idea is that the unlabeled data can significantly improve model performance when used in combination with the labeled data.
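
As a minimal sketch of this setting, the following Python snippet splits a synthetic dataset into a small labeled part and a large unlabeled part; the array shapes and the labeled fraction of 10\% are assumptions chosen purely for illustration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.normal(size=(1000, 32))     # 1000 datapoints, 32 features
y = rng.integers(0, 10, size=1000)  # ground-truth labels

# Assume only 10% of the labels are actually available.
labeled_idx = rng.choice(1000, size=100, replace=False)
unlabeled = np.ones(1000, dtype=bool)
unlabeled[labeled_idx] = False

x_l, y_l = x[labeled_idx], y[labeled_idx]  # labeled subset
x_u = x[unlabeled]                         # unlabeled subset, labels unused
\end{verbatim}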
\section{FixMatch}\label{sec:fixmatch}
One existing approach is called FixMatch.
It was introduced in a Google Research paper from 2020~\cite{fixmatch}.
The key idea of FixMatch is to leverage the unlabeled data by predicting pseudo-labels for it with a model trained on the known labels.
Then both the known labels and the predicted ones are used side by side to train the model.
The labeled samples guide the learning process, and the unlabeled samples contribute additional information.

Not every pseudo prediction is kept for further training.
A confidence threshold is defined to evaluate how `confident' the model is in its prediction.
The prediction is dropped if the model is not confident enough.
For example, with a threshold of $0.95$, a prediction whose largest class probability is $0.97$ is kept, while one with $0.80$ is discarded.
The quantity and quality of the obtained labels are crucial, and they have a significant impact on the overall accuracy.
This makes it important to improve the pseudo-labeling framework as much as possible.
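
The following sketch isolates this filtering step; the function name, the NumPy representation of the softmax outputs, and the default threshold of $0.95$ are assumptions chosen for illustration.

\begin{verbatim}
import numpy as np

def filter_pseudo_labels(probs, tau=0.95):
    # probs: (n_samples, n_classes) softmax outputs of the model.
    confidence = probs.max(axis=1)   # largest class probability
    keep = confidence >= tau         # confidence-based mask
    pseudo = probs.argmax(axis=1)    # hard pseudo-labels
    return np.flatnonzero(keep), pseudo[keep]

probs = np.array([[0.97, 0.02, 0.01],   # kept:    0.97 >= 0.95
                  [0.50, 0.30, 0.20]])  # dropped: 0.50 <  0.95
idx, labels = filter_pseudo_labels(probs)
\end{verbatim}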

\subsection{Math of FixMatch}\label{subsec:math-of-fixmatch}
Equation~\ref{eq:equation2} below defines $\mathcal{L}_u$, the loss function that trains the model on the unlabeled data.
The sum over a batch of size $B_u$ takes the average loss of this batch.
The input data is augmented in two different ways.
First, there is a weak augmentation $\mathcal{T}_{\text{weak}}(\cdot)$, which only applies basic transformations such as filtering and blurring.
Moreover, there is the strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$, which applies heavier transformations such as cutouts and edge detection.
The interesting part is the indicator function $\mathbbm{1}(\cdot)$, which applies a principle called `confidence-based masking'.
It retains a label only if its largest class probability is above a threshold $\tau$.
Here $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ is the model evaluation of a weakly augmented input.
The second part, $\mathcal{H}(\cdot, \cdot)$, is a standard cross-entropy loss function which takes two inputs:
$\hat{y}_i$, the obtained pseudo-label, and $F(\mathcal{T}_{\text{strong}}(u_i))$, the model evaluation of a strongly augmented input.
The indicator function evaluates to $0$ if the pseudo prediction is not confident, and the corresponding loss term is dropped.
Otherwise it is kept and trains the model further.

\begin{equation}
  \label{eq:equation2}
  \mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
\end{equation}
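
As a rough translation of Equation~\ref{eq:equation2} into code, the following NumPy sketch computes $\mathcal{L}_u$ for one batch; the model $F$, the two augmentations, and all function names are assumptions of the example, not part of the original FixMatch implementation.

\begin{verbatim}
import numpy as np

def unlabeled_loss(model, weak_aug, strong_aug, batch, tau=0.95):
    # model maps a batch of inputs to softmax probabilities.
    p = model(weak_aug(batch))       # p_i = F(T_weak(u_i))
    mask = p.max(axis=1) >= tau      # 1(max(p_i) >= tau)
    pseudo = p.argmax(axis=1)        # pseudo-labels y_i
    q = model(strong_aug(batch))     # F(T_strong(u_i))
    # cross-entropy H(y_i, q_i) = -log q_i[y_i]
    ce = -np.log(q[np.arange(len(batch)), pseudo] + 1e-12)
    return (mask * ce).mean()        # average over the batch B_u
\end{verbatim}

Note that masked-out samples still count towards the $\frac{1}{B_u}$ normalization, matching the averaging in Equation~\ref{eq:equation2}.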
\section{Cross-Model Pseudo-Labeling}
todo write stuff \cite{Xu_2022_CVPR}

\begin{equation}
  \mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
\end{equation}

\section{Performance}
\begin{figure}[h]
  \centering