diff --git a/src/main.tex b/src/main.tex
index 142955e..7081620 100644
--- a/src/main.tex
+++ b/src/main.tex
@@ -14,7 +14,7 @@
 \providecommand\BibTeX{{%
 \normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}
-\acmConference{Cross-Model Pseudo-Labeling}{2023}{Linz}
+\acmConference{Minimizing Labeling Effort for Binary Classification Tasks with Active Learning}{2023}{Linz}
 %%
 %% end of the preamble, start of the body of the document source.
@@ -74,150 +74,6 @@
 \input{implementation}
 \input{experimentalresults}
 \input{conclusionandoutlook}
-
-    \section{FixMatch}\label{sec:fixmatch}
-    FixMatch is an existing approach, introduced in a Google Research paper from 2020~\cite{fixmatch}.
-    Its key idea is to leverage the unlabeled data by predicting pseudo-labels with a model trained on the known labels.
-    Both the known labels and the predicted ones are then used side by side to train the model.
-    The labeled samples guide the learning process, while the unlabeled samples contribute additional information.
-
-    Not every pseudo prediction is kept for further training.
-    A confidence threshold is defined to evaluate how ``confident'' the model is about its prediction.
-    A prediction is dropped if the model is not confident enough.
-    The quantity and quality of the obtained labels are crucial, as they have a significant impact on the overall accuracy.
-    This means improving the pseudo-label framework as much as possible is essential.
-
-    FixMatch has some major limitations.
-    It relies on a single model for generating pseudo-labels, which can introduce errors and uncertainty into the labels.
-    Incorrect pseudo-labels may affect the learning process negatively.
-    Furthermore, FixMatch uses a comparably small model for label prediction, which has limited capacity.
-    This can negatively affect the learning process as well.
-%There is no measure defined how certain the model is about its prediction.
-%Such a measure improves overall performance by filtering noisy and unsure predictions.
-    Cross-Model Pseudo-Labeling tries to address all of these limitations.
-
-    \subsection{Math of FixMatch}\label{subsec:math-of-fixmatch}
-    Equation~\ref{eq:fixmatch} defines the loss function that trains the model on the unlabeled data.
-    The sum over the unlabeled batch size $B_u$ averages the loss over this batch.
-    The input data is augmented in two different ways.
-    First, there is a weak augmentation $\mathcal{T}_{\text{weak}}(\cdot)$ which only applies basic transformations such as flipping and shifting.
-    In addition, there is the strong augmentation $\mathcal{T}_{\text{strong}}(\cdot)$ which applies cutouts and random augmentations.
-
-    \begin{equation}
-        \label{eq:fixmatch}
-        \mathcal{L}_u = \frac{1}{B_u} \sum_{i=1}^{B_u} {1}(\max(p_i) \geq \tau) \mathcal{H}(\hat{y}_i,F(\mathcal{T}_{\text{strong}}(u_i)))
-    \end{equation}
-
-    The indicator function ${1}(\cdot)$ applies a principle called ``confidence-based masking''.
-    It retains a label only if its largest class probability exceeds a threshold $\tau$,
-    where $p_i \coloneqq F(\mathcal{T}_{\text{weak}}(u_i))$ is a model evaluation with a weakly augmented input
-    and $\hat{y}_i \coloneqq \arg\max(p_i)$ is the pseudo-label derived from it.
-
-    \begin{equation}
-        \label{eq:crossentropy}
-        \mathcal{H}(y, \hat{y}) = -\sum_{c=1}^{C} y_c \cdot \log(\hat{y}_c)
-    \end{equation}
-
-    The second part, $\mathcal{H}(\cdot, \cdot)$, is a standard cross-entropy loss over the $C$ classes.
-    Its first argument is the (one-hot encoded) target and its second the predicted class distribution;
-    here these are $\hat{y}_i$, the obtained pseudo-label, and $F(\mathcal{T}_{\text{strong}}(u_i))$, a model evaluation with strong augmentation.
-    The indicator function evaluates to $0$ if the pseudo prediction is not confident enough, and the corresponding loss term is dropped.
-    Otherwise it evaluates to $1$, and the term is kept and trains the model further.
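-
-    To make the masking concrete, the following PyTorch-style sketch evaluates this unlabeled loss for one batch.
-    The \texttt{model}, \texttt{weak\_aug} and \texttt{strong\_aug} callables as well as the threshold value are illustrative assumptions, not the original implementation.
-
-    \begin{verbatim}
-import torch
-import torch.nn.functional as nnf
-
-def fixmatch_unsupervised_loss(model, u, weak_aug, strong_aug, tau=0.95):
-    with torch.no_grad():
-        # p_i = F(T_weak(u_i)): class probabilities on the weak view
-        p = torch.softmax(model(weak_aug(u)), dim=-1)
-        conf, pseudo = p.max(dim=-1)  # max(p_i) and pseudo-label y_hat_i
-        mask = (conf >= tau).float()  # indicator 1(max(p_i) >= tau)
-    # H(y_hat_i, F(T_strong(u_i))): cross-entropy on the strong view
-    ce = nnf.cross_entropy(model(strong_aug(u)), pseudo, reduction="none")
-    return (mask * ce).mean()         # average over the batch B_u
-    \end{verbatim}
-
-    Masked-out samples contribute a zero term, so the mean still divides by the full batch size $B_u$, exactly as in Equation~\ref{eq:fixmatch}.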
-
-    \section{Cross-Model Pseudo-Labeling}\label{sec:cross-model-pseudo-labeling}
-    The approach newly introduced in this paper is called Cross-Model Pseudo-Labeling (CMPL)~\cite{Xu_2022_CVPR}.
-    Figure~\ref{fig:cmpl-structure} visualizes the structure of CMPL\@.
-    Two different models are defined: a smaller auxiliary model and a larger primary model.
-    They provide pseudo-labels for each other.
-    Because the two models have different structural biases, they learn complementary representations.
-    This symmetric design yields a boost in performance.
-    The SG label stands for ``Stop Gradient''.
-    The pseudo-labels of each model are fed into the loss of the opposite model.
-    In this way the two models train each other.
-
-    \begin{figure}[h]
-        \centering
-        \includegraphics[width=\linewidth]{../rsc/structure}
-        \caption{Architecture of Cross-Model Pseudo-Labeling}
-        \label{fig:cmpl-structure}
-    \end{figure}
-
-    \subsection{Math of CMPL}\label{subsec:math}
-    The loss function of CMPL is similar to the one explained above.
-    However, we have to distinguish between the supervised loss, generated from the samples whose labels are known, and the unsupervised loss, where no labels are available.
-
-    Equations~\ref{eq:cmpl-losses1} and~\ref{eq:cmpl-losses2} are standard cross-entropy loss functions computed with the supervised labels for the two separate models.
-
-    \begin{align}
-        \label{eq:cmpl-losses1}
-        \mathcal{L}_s^F &= \frac{1}{B_l} \sum_{i=1}^{B_l} \mathcal{H}(y_i,F(\mathcal{T}^F_{\text{standard}}(v_i)))\\
-        \label{eq:cmpl-losses2}
-        \mathcal{L}_s^A &= \frac{1}{B_l} \sum_{i=1}^{B_l} \mathcal{H}(y_i,A(\mathcal{T}^A_{\text{standard}}(v_i)))
-    \end{align}
-
-    Equations~\ref{eq:cmpl-loss3} and~\ref{eq:cmpl-loss4} are the unsupervised losses.
-    They are very similar to FixMatch, but it is important to note that the pseudo-labels and the confidence-based masking come from the opposite model.
-
-    \begin{align}
-        \label{eq:cmpl-loss3}
-        \mathcal{L}_u^F &= \frac{1}{B_u} \sum_{i=1}^{B_u} {1}(\max(p_i^A) \geq \tau) \mathcal{H}(\hat{y}_i^A,F(\mathcal{T}_{\text{strong}}(u_i)))\\
-        \label{eq:cmpl-loss4}
-        \mathcal{L}_u^A &= \frac{1}{B_u} \sum_{i=1}^{B_u} {1}(\max(p_i^F) \geq \tau) \mathcal{H}(\hat{y}_i^F,A(\mathcal{T}_{\text{strong}}(u_i)))
-    \end{align}
-
-    Finally, to train the main objective, an overall loss is calculated by summing all the losses.
-    A hyperparameter $\lambda$ regulates the weight of the unsupervised losses relative to the supervised ones.
-
-    \begin{equation}
-        \label{eq:loss-main-obj}
-        \mathcal{L} = (\mathcal{L}_s^F + \mathcal{L}_s^A) + \lambda(\mathcal{L}_u^F + \mathcal{L}_u^A)
-    \end{equation}
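-
-    To make the interplay of the four losses concrete, the following PyTorch-style sketch evaluates the overall objective for one labeled and one unlabeled batch.
-    The model and augmentation callables and the threshold value are illustrative assumptions, not the original implementation; a single standard augmentation is shared by both models for brevity.
-
-    \begin{verbatim}
-import torch
-import torch.nn.functional as nnf
-
-def cmpl_loss(f_net, a_net, v, y, u, std_aug, weak_aug, strong_aug,
-              lam=1.0, tau=0.8):
-    # L_s^F and L_s^A: supervised cross-entropy on the labeled batch
-    loss_s_f = nnf.cross_entropy(f_net(std_aug(v)), y)
-    loss_s_a = nnf.cross_entropy(a_net(std_aug(v)), y)
-
-    # Pseudo-labels from the weak views; no_grad realizes "Stop Gradient"
-    with torch.no_grad():
-        conf_f, y_f = torch.softmax(f_net(weak_aug(u)), dim=-1).max(dim=-1)
-        conf_a, y_a = torch.softmax(a_net(weak_aug(u)), dim=-1).max(dim=-1)
-
-    # L_u^F and L_u^A: each model learns from the *other* model's labels,
-    # masked by the other model's confidence
-    u_strong = strong_aug(u)
-    ce_f = nnf.cross_entropy(f_net(u_strong), y_a, reduction="none")
-    ce_a = nnf.cross_entropy(a_net(u_strong), y_f, reduction="none")
-    loss_u_f = ((conf_a >= tau).float() * ce_f).mean()
-    loss_u_a = ((conf_f >= tau).float() * ce_a).mean()
-
-    return (loss_s_f + loss_s_a) + lam * (loss_u_f + loss_u_a)
-    \end{verbatim}
-
-    The \texttt{torch.no\_grad} block corresponds to the stop-gradient (SG) step of Figure~\ref{fig:cmpl-structure}: the pseudo-labels act as fixed targets, and gradients only flow through the strongly augmented branch.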
-
-    \section{Architecture}\label{sec:Architecture}
-    The model architectures used depend highly on the task to be performed.
-    In this case the task is video action recognition.
-    A 3D-ResNet50 was chosen for the primary model and a smaller 3D-ResNet18 for the auxiliary model.
-
-    \section{Performance}\label{sec:performance}
-    Figure~\ref{fig:results} shows a performance comparison between training with the supervised samples only and several different pseudo-label frameworks.
-    One can clearly see that the performance gain with the new CMPL framework is significant.
-    For evaluation, the Kinetics-400 and UCF-101 datasets are used.
-    As backbone models, a 3D-ResNet18 and a 3D-ResNet50 are used.
-    Even when only 1\% of the true labels of the UCF-101 dataset are known, an accuracy of 25.1\% is still reached.
-
-    \begin{figure}[h]
-        \centering
-        \includegraphics[width=\linewidth]{../rsc/results}
-        \caption{Performance comparisons between CMPL, FixMatch and supervised learning only}
-        \label{fig:results}
-    \end{figure}
-
-    \section{Further schemes}\label{sec:further-schemes}
-    How the pseudo-labels are generated may impact the overall performance.
-    In this paper the pseudo-labels are obtained by the cross-model approach, but there are other possible strategies as well.
-    For example:
-    \begin{enumerate*}
-        \item Self-First: Each network uses its own prediction if it is confident enough.
-        If not, it uses the prediction of its sibling network.
-        \item Opposite-First: Each network prioritizes the prediction of its sibling network.
-        \item Maximum: The most confident of the two predictions is used.
-        \item Average: The two predictions are averaged before deriving the pseudo-label.
-    \end{enumerate*}
-
-    These are alternative approaches to keep in mind.
-    That does not mean they are better; in fact, they performed worse in this study.
-
-    \section{Conclusion}\label{sec:conclusion}
-    In conclusion, Cross-Model Pseudo-Labeling demonstrates the potential to significantly advance the field of semi-supervised action recognition.
-    Cross-Model Pseudo-Labeling outperforms the supervised-only approach by a multiple across several experiments.
-    It surpasses most of the other existing pseudo-labeling frameworks.
-    Through the integration of main and auxiliary models, consistency regularization, and uncertainty estimation, CMPL offers a powerful framework for leveraging unlabeled data and improving model performance.
-    It paves the way for more accurate and efficient action recognition systems.
-
-%%
 %% The next two lines define the bibliography style to be used, and
 %% the bibliography file.
 \bibliographystyle{ACM-Reference-Format}