remove cmpl stuff
This commit is contained in:

src/main.tex: 146 lines changed

@@ -14,7 +14,7 @@
    \providecommand\BibTeX{{%
        \normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}

\acmConference{Cross-Model Pseudo-Labeling}{2023}{Linz}
\acmConference{Minimize labeling effort of Binary classification Tasks with Active learning}{2023}{Linz}

%%
%% end of the preamble, start of the body of the document source.
@@ -74,150 +74,6 @@
    \input{implementation}
    \input{experimentalresults}
    \input{conclusionandoutlook}

    \section{FixMatch}\label{sec:fixmatch}
    FixMatch is an existing approach, introduced in a Google Research paper from 2020~\cite{fixmatch}.
    Its key idea is to leverage the unlabeled data by predicting pseudo-labels with a model trained on the known labels.
    Both the known labels and the predicted ones are then used side by side to train the model.
    The labeled samples guide the learning process, while the unlabeled samples contribute additional information.

    Not every pseudo prediction is kept to train the model further.
    A confidence threshold is defined to evaluate how ``confident'' the model is about its prediction.
    A prediction is dropped if the model is not confident enough.
    The quantity and quality of the obtained labels are crucial, as they have a significant impact on the overall accuracy.
    This means improving the pseudo-label framework as much as possible is essential.

    FixMatch comes with some major limitations.
    It relies on a single model for generating pseudo-labels, which can introduce errors and uncertainty into the labels.
    Incorrect pseudo-labels may affect the learning process negatively.
    Furthermore, FixMatch uses a comparably small model for label prediction, which has limited capacity.
    This can negatively affect the learning process as well.
%There is no measure defined how certain the model is about its prediction.
%Such a measure improves overall performance by filtering noisy and unsure predictions.
    Cross-Model Pseudo-Labeling tries to address all of these limitations.

    \subsection{Math of FixMatch}\label{subsec:math-of-fixmatch}
    Equation~\ref{eq:fixmatch} defines the loss function that trains the model on the unlabeled data.
    The sum over the unlabeled batch of size $B_u$ averages the loss over that batch and should look familiar.
    The input data is augmented in two different ways.
    First, there is a weak augmentation $\mathcal{T}_{\text{weak}}(\cdot)$ which only applies basic transformations such as filtering and blurring.
    In addition, there is a strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$ which applies cutouts and random augmentations.

    \begin{equation}
        \label{eq:fixmatch}
        \mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} {1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
    \end{equation}

    The indicator function ${1}(\cdot)$ applies a principle called ``confidence-based masking''.
    It retains a label only if its largest class probability is above a threshold $\tau$,
    where $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ is the model prediction for a weakly augmented input.

    \begin{equation}
        \label{eq:crossentropy}
        \mathcal{H}(\hat{y}, q) = -\sum_{c} \hat{y}_c \cdot \log(q_c)
    \end{equation}

    The second part $\mathcal{H}(\cdot, \cdot)$ is a standard cross-entropy loss function which takes two inputs:
    $\hat{y}_i$, the obtained pseudo-label, and $F(\mathcal{T}_{\text{strong}}(u_i))$, the model prediction for a strongly augmented input.
    The indicator function evaluates to $0$ if the pseudo prediction is not confident enough, in which case the current loss term is dropped.
    Otherwise it evaluates to $1$, the term is kept, and it trains the model further.
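    As a quick sanity check with illustrative numbers (not taken from the paper): for a one-hot pseudo-label $\hat{y} = (0, 1, 0)$ and a model prediction $q = (0.2, 0.7, 0.1)$, only the true class survives the sum, so $\mathcal{H}(\hat{y}, q) = -\log(0.7) \approx 0.36$; the loss shrinks toward $0$ as the prediction approaches the pseudo-label.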
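    To make the masking and the two augmentations concrete, the following is a minimal PyTorch-style sketch of Equation~\ref{eq:fixmatch}; the classifier \texttt{model}, the two augmentation callables and the threshold \texttt{tau} are assumptions, not code from the paper.
\begin{verbatim}
import torch
import torch.nn.functional as nnF  # aliased so it does not clash with
                                   # the model F used in the text

def fixmatch_unsup_loss(model, u, weak_aug, strong_aug, tau=0.95):
    # Pseudo-labels come from the weakly augmented view (no gradient).
    with torch.no_grad():
        p = torch.softmax(model(weak_aug(u)), dim=1)
        conf, pseudo = p.max(dim=1)     # max(p_i) and its arg max
        mask = (conf >= tau).float()    # confidence-based masking 1(.)
    # Cross-entropy against the strongly augmented view, per sample.
    ce = nnF.cross_entropy(model(strong_aug(u)), pseudo, reduction="none")
    return (mask * ce).mean()           # average over the batch B_u
\end{verbatim}
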
    \section{Cross-Model Pseudo-Labeling}\label{sec:cross-model-pseudo-labeling}
    The approach newly introduced by this paper is called Cross-Model Pseudo-Labeling (CMPL)~\cite{Xu_2022_CVPR}.
    Figure~\ref{fig:cmpl-structure} visualizes the structure of CMPL\@.
    Two different models are defined: a smaller auxiliary model and a larger primary model.
    They provide pseudo-labels for each other.
    The two models have different structural biases, which leads to complementary representations.
    This symmetric design yields a boost in performance.
    The SG label stands for ``stop gradient''.
    The loss function evaluations are fed into the opposite model as its loss, so the two models train each other.

    \begin{figure}[h]
        \centering
        \includegraphics[width=\linewidth]{../rsc/structure}
        \caption{Architecture of Cross-Model Pseudo-Labeling}
        \label{fig:cmpl-structure}
    \end{figure}

    \subsection{Math of CMPL}\label{subsec:math}
    The loss function of CMPL is similar to the one explained above.
    However, we have to distinguish between the supervised loss, generated from samples whose labels are known, and the unsupervised loss, where no labels are available.

    Equations~\ref{eq:cmpl-losses1} and~\ref{eq:cmpl-losses2} are normal cross-entropy loss functions computed with the supervised labels for the two separate models.

    \begin{align}
        \label{eq:cmpl-losses1}
        \mathcal{L}_s^F &= \frac{1}{B_l} \sum_{i=1}^{B_l} \mathcal{H}(y_i,F(\mathcal{T}^F_{\text{standard}}(v_i)))\\
        \label{eq:cmpl-losses2}
        \mathcal{L}_s^A &= \frac{1}{B_l} \sum_{i=1}^{B_l} \mathcal{H}(y_i,A(\mathcal{T}^A_{\text{standard}}(v_i)))
    \end{align}

    Equations~\ref{eq:cmpl-loss3} and~\ref{eq:cmpl-loss4} are the unsupervised losses.
    They are very similar to FixMatch, but it is important to note that the confidence-based masking is applied using the opposite model's prediction.

    \begin{align}
        \label{eq:cmpl-loss3}
        \mathcal{L}_u^F &= \frac{1}{B_u} \sum_{i=1}^{B_u} {1}(\max(p_i^A) \geq \tau) \mathcal{H}(\hat{y}_i^A,F(\mathcal{T}_{\text{strong}}(u_i)))\\
        \label{eq:cmpl-loss4}
        \mathcal{L}_u^A &= \frac{1}{B_u} \sum_{i=1}^{B_u} {1}(\max(p_i^F) \geq \tau) \mathcal{H}(\hat{y}_i^F,A(\mathcal{T}_{\text{strong}}(u_i)))
    \end{align}

    Finally, to train the main objective, an overall loss is calculated by simply summing all the losses.
    The loss is balanced by a hyperparameter $\lambda$, which weights the unsupervised part relative to the supervised one.

    \begin{equation}
        \label{eq:loss-main-obj}
        \mathcal{L} = (\mathcal{L}_s^F + \mathcal{L}_s^A) + \lambda(\mathcal{L}_u^F + \mathcal{L}_u^A)
    \end{equation}

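    Put together, one training step can be sketched in a few lines, reusing the imports and assumptions of the FixMatch snippet above; the standard augmentation of the supervised samples is omitted for brevity, and $\tau$ and $\lambda$ are free parameters, so this is a sketch rather than the authors' implementation.
\begin{verbatim}
def cmpl_unsup_loss(teacher, student, u, weak_aug, strong_aug, tau):
    # Pseudo-labels come from the *opposite* model; no_grad realizes
    # the SG (stop gradient) connection from the architecture figure.
    with torch.no_grad():
        p = torch.softmax(teacher(weak_aug(u)), dim=1)
        conf, pseudo = p.max(dim=1)
        mask = (conf >= tau).float()
    ce = nnF.cross_entropy(student(strong_aug(u)), pseudo,
                           reduction="none")
    return (mask * ce).mean()

def cmpl_total_loss(F_net, A_net, v, y, u, weak_aug, strong_aug,
                    tau=0.8, lam=1.0):
    # Supervised losses L_s^F and L_s^A on the labeled batch (v, y).
    ls = nnF.cross_entropy(F_net(v), y) + nnF.cross_entropy(A_net(v), y)
    # Cross-model unsupervised losses L_u^F and L_u^A.
    lu = (cmpl_unsup_loss(A_net, F_net, u, weak_aug, strong_aug, tau)
          + cmpl_unsup_loss(F_net, A_net, u, weak_aug, strong_aug, tau))
    return ls + lam * lu   # overall loss of the main objective
\end{verbatim}
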
    \section{Architecture}\label{sec:Architecture}
    The model architectures used depend highly on the task to be performed.
    In this case the task is video action recognition.
    A 3D-ResNet50 was chosen for the main model and a smaller 3D-ResNet18 for the auxiliary model.
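    As a pointer to how such a pair could be instantiated (a sketch, not the authors' code): torchvision ships a 3D-ResNet18 for video, while the 3D-ResNet50 constructor \texttt{build\_r3d\_50} below is hypothetical.
\begin{verbatim}
from torchvision.models.video import r3d_18

aux_net = r3d_18(num_classes=400)           # auxiliary 3D-ResNet18
# main_net = build_r3d_50(num_classes=400)  # hypothetical constructor;
#                                           # torchvision has no stock
#                                           # 3D-ResNet50
\end{verbatim}
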
    \section{Performance}\label{sec:performance}

    Figure~\ref{fig:results} shows a performance comparison between training on the supervised samples alone and several different pseudo-label frameworks.
    One can clearly see that the performance gain with the new CMPL framework is quite significant.
    The Kinetics-400 and UCF-101 datasets are used for evaluation, with a 3D-ResNet18 and a 3D-ResNet50 as backbone models.
    Even when only 1\% of the true labels of the UCF-101 dataset are known, 25.1\% of the labels could be predicted correctly.

    \begin{figure}[h]
        \centering
        \includegraphics[width=\linewidth]{../rsc/results}
        \caption{Performance comparisons between CMPL, FixMatch and supervised learning only}
        \label{fig:results}
    \end{figure}

    \section{Further schemes}\label{sec:further-schemes}
    How the pseudo-labels are generated may impact the overall performance.
    In this paper the pseudo-labels are obtained by the cross-model approach.
    But there are other conceivable strategies as well, for example (sketched in the code below):
    \begin{enumerate*}
        \item Self-First: Each network uses its own prediction if it is confident enough.
        If not, it uses the prediction of its sibling network.
        \item Opposite-First: Each network prioritizes the prediction of its sibling network.
        \item Maximum: The most confident prediction is leveraged.
        \item Average: The two predictions are averaged before deriving the pseudo-label
    \end{enumerate*}.
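    A minimal sketch of these four selection rules for a single sample; the function name and signature are assumptions, with \texttt{p\_self} and \texttt{p\_sib} being the two models' predicted probability vectors.
\begin{verbatim}
def select_pseudo_label(p_self, p_sib, tau, scheme="self-first"):
    if scheme == "self-first":        # own prediction, sibling fallback
        p = p_self if p_self.max() >= tau else p_sib
    elif scheme == "opposite-first":  # sibling first, own as fallback
        p = p_sib if p_sib.max() >= tau else p_self
    elif scheme == "maximum":         # the more confident of the two
        p = p_self if p_self.max() >= p_sib.max() else p_sib
    else:                             # "average": mean of the two
        p = (p_self + p_sib) / 2
    return int(p.argmax())            # derived pseudo-label (class index)
\end{verbatim}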

    These are merely alternative approaches to keep in mind.
    That does not mean they are better; in fact, they performed even worse in this study.

    \section{Conclusion}\label{sec:conclusion}
    In conclusion, Cross-Model Pseudo-Labeling demonstrates the potential to significantly advance the field of semi-supervised action recognition.
    Across several experiments, Cross-Model Pseudo-Labeling outperforms the supervised-only approach by a multiple.
    It also surpasses most of the other existing pseudo-labeling frameworks.
    Through the integration of main and auxiliary models, consistency regularization, and uncertainty estimation, CMPL offers a powerful framework for leveraging unlabeled data and improving model performance.
    It paves the way for more accurate and efficient action recognition systems.

%%
%% The next two lines define the bibliography style to be used, and
%% the bibliography file.
    \bibliographystyle{ACM-Reference-Format}