diff --git a/summary/main.tex b/summary/main.tex
index 1a5620e..ba464f6 100644
--- a/summary/main.tex
+++ b/summary/main.tex
@@ -85,14 +85,13 @@
The goal of this paper is video action recognition.
The inputs are videos of approximately 10 seconds in length, which are to be classified.
In this paper, datasets with 400 and 101 different classes are used.
The proposed approach is tested with labels available for 1\% and 10\% of all data points.
-The used model depends on the exact usecase but in this case a 3D-ResNet50 and 3D-ResNet18 are used.

\section{Semi-Supervised Learning}\label{sec:semi-supervised-learning}
In traditional supervised learning we have a labeled dataset.
Each data point is associated with a corresponding target label.
The goal is to fit a model that predicts the labels from the data points.
-In traditional unsupervised learning no labels are known.
+In traditional unsupervised learning there are also data points, but no labels are known.
The goal is to find patterns or structures in the data.
Moreover, it can be used for clustering or dimensionality reduction.
@@ -118,8 +117,8 @@
It relies on a single model for generating pseudo-labels, which can introduce errors.
Incorrect pseudo-labels may affect the learning process negatively.
Furthermore, FixMatch uses a comparatively small model for label prediction, which has a limited capacity.
This can negatively affect the learning process as well.
-There is no measure defined how certain the model is about its prediction.
-Such a measure improves overall performance by filtering noisy and unsure predictions.
+%There is no measure defined how certain the model is about its prediction.
+%Such a measure improves overall performance by filtering noisy and unsure predictions.
Cross-Model Pseudo-Labeling tries to address all of these limitations.

\subsection{Math of FixMatch}\label{subsec:math-of-fixmatch}
@@ -137,6 +136,12 @@
Moreover, there is the strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$.
The indicator function $\mathbbm{1}(\cdot)$ applies a principle called ``confidence-based masking''.
It retains a label only if its largest predicted probability is above a threshold $\tau$, where $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ is a model evaluation on a weakly augmented input.
+
+\begin{equation}
+    \label{eq:crossentropy}
+    \mathcal{H}(\hat{y}_i, y_i) = -\sum_{c} \hat{y}_{i,c} \log(y_{i,c})
+\end{equation}
+
The second part, $\mathcal{H}(\cdot, \cdot)$, is a standard cross-entropy loss, shown in Equation~\ref{eq:crossentropy} with the sum running over the classes $c$.
It takes two inputs: $\hat{y}_i$, the obtained pseudo-label, and $y_i = F(\mathcal{T}_{\text{strong}}(u_i))$, a model evaluation with strong augmentation.
The indicator function evaluates to $0$ if the pseudo prediction is not confident, in which case the current loss evaluation is dropped.
@@ -145,7 +150,10 @@
Otherwise it evaluates to $1$, the term is kept, and it trains the model further.

\section{Cross-Model Pseudo-Labeling}\label{sec:cross-model-pseudo-labeling}
The newly introduced approach of this paper is called Cross-Model Pseudo-Labeling (CMPL)~\cite{Xu_2022_CVPR}.
Figure~\ref{fig:cmpl-structure} visualizes the structure of CMPL\@.
-We define two different models, a smaller auxiliary model and a larger model.
+Two different models are defined, a smaller auxiliary model and a larger main model.
+They provide pseudo-labels for each other.
+The two models have different structural biases, which leads to complementary representations.
+This symmetric design yields a boost in performance.
The SG label means stop gradient.
The loss function evaluations are fed into the opposite model as a loss.
The two models thus train each other.
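+As an illustration, the masked loss from Section~\ref{subsec:math-of-fixmatch} can be written as a minimal PyTorch-style sketch; this is not the authors' implementation, and the confidence threshold $0.95$ is only a placeholder value.
+The call to \texttt{detach()} implements the stop gradient (SG).
+\begin{verbatim}
+import torch
+import torch.nn.functional as F
+
+def pseudo_label_loss(weak_logits, strong_logits, tau=0.95):
+    # Cross-entropy under confidence-based masking (FixMatch-style).
+    probs = torch.softmax(weak_logits.detach(), dim=-1)  # p_i, with SG
+    conf, pseudo = probs.max(dim=-1)  # confidence and hard pseudo-label
+    mask = (conf >= tau).float()      # indicator: 1 if max p_i >= tau
+    ce = F.cross_entropy(strong_logits, pseudo, reduction="none")
+    return (mask * ce).mean()
+\end{verbatim}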
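+Under the same assumptions, CMPL's cross-supervision on an unlabeled batch can then be sketched as below; \texttt{main\_model}, \texttt{aux\_model}, \texttt{weak\_aug}, \texttt{strong\_aug}, and the batch \texttt{u} are hypothetical names standing in for the paper's models and augmentation pipelines.
+\begin{verbatim}
+# Each model learns from pseudo-labels produced by the *other*
+# model on the weakly augmented view of the same unlabeled batch.
+loss_main = pseudo_label_loss(aux_model(weak_aug(u)),
+                              main_model(strong_aug(u)))
+loss_aux = pseudo_label_loss(main_model(weak_aug(u)),
+                             aux_model(strong_aug(u)))
+(loss_main + loss_aux).backward()
+\end{verbatim}
+Note that pseudo-labels always come from the weakly augmented view, while gradients flow only through the strongly augmented view.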
@@ -225,6 +233,14 @@
For example:
Those are just other approaches one can keep in mind.
This doesn't mean they are better; in fact, they performed even worse in this study.
+
+\section{Conclusion}\label{sec:conclusion}
+In conclusion, Cross-Model Pseudo-Labeling demonstrates the potential to significantly advance the field of semi-supervised action recognition.
+CMPL outperforms the supervised-only approach by a multiple across several experiments.
+It also surpasses most of the other existing pseudo-labeling frameworks.
+Through the integration of a main and an auxiliary model, consistency regularization, and confidence-based masking, CMPL offers a powerful framework for leveraging unlabeled data and improving model performance.
+It paves the way for more accurate and efficient action recognition systems.
+
%%
%% The next two lines define the bibliography style to be used, and
%% the bibliography file.