add stuff about semi-supervised learning and fixmatch
		| @@ -1,6 +1,7 @@ | ||||
| \documentclass[sigconf]{acmart} | ||||
| \usepackage{amsmath} | ||||
| \usepackage{bbm} | ||||
| \usepackage{mathtools} | ||||
|  | ||||
| %% | ||||
| %% \BibTeX command to typeset BibTeX logo in the docs | ||||
| @@ -69,7 +70,7 @@ | ||||
| %% article. | ||||
| \begin{abstract} | ||||
|   Cross-Model Pseudo-Labeling is a new framework for generating pseudo-labels | ||||
|   for supervised learning tasks where only a subset of the true labels is known. | ||||
|   It builds upon the existing approach of FixMatch and improves it further by | ||||
|   using two differently sized models that complement each other. | ||||
| \end{abstract} | ||||
| @@ -90,17 +91,54 @@ | ||||
|  | ||||
| \section{Introduction} | ||||
| For most supervised learning tasks, a large number of training samples is essential. | ||||
| With too little training data, the model will not generalize well and will not fit a real-world task. | ||||
| Labeling datasets is commonly seen as an expensive task that should be avoided as much as possible. | ||||
| That is why there is a machine-learning field called semi-supervised learning. | ||||
| The general approach is to train a model that predicts pseudo-labels, which can then be used to train the main model. | ||||
|  | ||||
| \section{Semi-Supervised learning} | ||||
| In traditional supervised learning we have a labeled dataset. | ||||
| Each datapoint is associated with a corresponding target label. | ||||
| The goal is to fit a model to predict the labels from datapoints. | ||||
|  | ||||
| In traditional unsupervised learning no labels are known. | ||||
| The goal is to find patterns and structures in the data. | ||||
|  | ||||
| Those two techniques combined yield semi-supervised learning. | ||||
| Some of the labels are known, but for most of the data we have only the raw datapoints. | ||||
| The basic idea is that the unlabeled data can significantly improve the model performance when used in combination with the labeled data. | ||||
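|  | ||||
| As a sketch of this setting in symbols (the notation $\mathcal{D}_l$, $\mathcal{D}_u$, $N_l$ and $N_u$ is introduced here for illustration only), the training data consists of a labeled set and an unlabeled set: | ||||
| \begin{equation*} | ||||
|   \mathcal{D}_l = \{(x_i, y_i)\}_{i=1}^{N_l}, \qquad \mathcal{D}_u = \{u_i\}_{i=1}^{N_u}, \qquad \text{typically with } N_u \gg N_l. | ||||
| \end{equation*} | ||||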
|  | ||||
| \section{FixMatch}\label{sec:fixmatch} | ||||
| FixMatch is an existing semi-supervised approach. | ||||
| It was introduced in a Google Research paper from 2020~\cite{fixmatch}. | ||||
| The key idea of FixMatch is to leverage the unlabeled data by predicting pseudo-labels for it with a model trained on the known labels. | ||||
| Then both the known labels and the predicted ones are used side by side to train the model. | ||||
| The labeled samples guide the learning process and the unlabeled samples contribute additional information. | ||||
|  | ||||
| Not every pseudo-label prediction is kept to train the model further. | ||||
| A confidence threshold is defined to evaluate how `confident' the model is in its prediction. | ||||
| The prediction is dropped if the model is not confident enough. | ||||
| The quantity and quality of the obtained labels are crucial and have a significant impact on the overall accuracy. | ||||
| This means that improving the pseudo-label framework as much as possible is important. | ||||
|  | ||||
| \subsection{Math of FixMatch}\label{subsec:math-of-fixmatch} | ||||
| Equation~\ref{eq:equation2} defines the unsupervised loss $\mathcal{L}_u$, which trains the model on the unlabeled data. | ||||
| The sum over the $B_u$ unlabeled samples of a batch, divided by $B_u$, averages the per-sample loss. | ||||
| The input data is augmented in two different ways. | ||||
| First, there is a weak augmentation $\mathcal{T}_{\text{weak}}(\cdot)$, which only applies basic transformations such as random flipping and shifting. | ||||
| Second, there is a strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$, which distorts the input heavily, for example with RandAugment-style transformations followed by Cutout. | ||||
| The interesting part is the indicator function $\mathbbm{1}(\cdot)$, which applies a principle called `confidence-based masking'. | ||||
| It retains a pseudo-label only if its largest class probability is at least a threshold $\tau$. | ||||
| Here, $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ is the model prediction for the weakly augmented input $u_i$. | ||||
| The second part, $\mathcal{H}(\cdot, \cdot)$, is a standard cross-entropy loss function, which takes two inputs: | ||||
| the pseudo-label $\hat{y}_i \coloneqq \arg\max(p_i)$ and $F(\mathcal{T}_{\text{strong}}(u_i))$, the model prediction for the strongly augmented input. | ||||
| The indicator function evaluates to $0$ if the pseudo-label prediction is not confident, so the corresponding loss term is dropped. | ||||
| Otherwise, the term is kept and trains the model further. | ||||
|  | ||||
| \begin{equation} | ||||
|   \label{eq:equation2} | ||||
|   \mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i))) | ||||
| \end{equation} | ||||
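|  | ||||
| As an illustrative example with assumed values (chosen here for clarity, not taken from any experiment), let $\tau = 0.95$ and consider three classes. | ||||
| A weakly augmented sample with $p_1 = (0.97, 0.02, 0.01)$ passes the mask and keeps its most probable class as pseudo-label, $\hat{y}_1 = 1$, | ||||
| whereas a sample with $p_2 = (0.60, 0.30, 0.10)$ is masked out: | ||||
| \begin{equation*} | ||||
|   \mathbbm{1}(\max(p_1) \geq \tau) = \mathbbm{1}(0.97 \geq 0.95) = 1, \qquad | ||||
|   \mathbbm{1}(\max(p_2) \geq \tau) = \mathbbm{1}(0.60 \geq 0.95) = 0. | ||||
| \end{equation*} | ||||
| For a batch of $B_u = 2$ containing only these two samples, the loss reduces to $\mathcal{L}_u = \frac{1}{2} \mathcal{H}(\hat{y}_1, F(\mathcal{T}_{\text{strong}}(u_1)))$. | ||||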
|  | ||||
| \section{Cross-Model Pseudo-Labeling} | ||||
| todo write stuff \cite{Xu_2022_CVPR} | ||||
| @@ -111,7 +149,7 @@ todo write stuff \cite{Xu_2022_CVPR} | ||||
|   \mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} \mathbbm{1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i))) | ||||
| \end{equation} | ||||
|  | ||||
| \section{Performance} | ||||
|  | ||||
| \begin{figure}[h] | ||||
|   \centering | ||||
|   | ||||