%% The abstract is a short summary of the work to be presented in the
%% article.
\begin{abstract}
Active learning can lead to faster model convergence, so fewer labeled samples are required. This approach is beneficial in domains where labeling datasets is demanding and reducing computational effort is not the main objective.
\end{abstract}
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
\caption{Architecture of Cross-Model Pseudo-Labeling}
\label{fig:cmpl-structure}
\end{figure}
\subsection{Math of CMPL}\label{subsec:math}
The loss function of CMPL is similar to the one explained above.
However, we have to distinguish between the supervised loss, computed on samples whose labels are known, and the unsupervised loss, computed on samples for which no labels are available.
Equations~\ref{eq:cmpl-losses1} and~\ref{eq:cmpl-losses2} are standard cross-entropy loss functions computed with the supervised labels for the two separate models.
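The combination of these terms can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the confidence threshold, and the loss weight are illustrative assumptions, and the unsupervised term simply lets each model learn from the other's confident pseudo-label.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Standard cross-entropy loss for one sample with a hard label."""
    return -math.log(probs[label])

def cmpl_loss(main_logits_sup, aux_logits_sup, label,
              main_logits_unsup, aux_logits_unsup,
              threshold=0.95, weight=1.0):
    """Sketch of the CMPL objective for one labeled and one unlabeled sample:
    a supervised cross-entropy term for each model, plus unsupervised terms
    in which each model is trained on the other's confident pseudo-label.
    threshold and weight are illustrative hyperparameters."""
    # Supervised part: both models use the known label.
    p_main_sup = softmax(main_logits_sup)
    p_aux_sup = softmax(aux_logits_sup)
    loss = cross_entropy(p_main_sup, label) + cross_entropy(p_aux_sup, label)

    # Unsupervised part: cross-model pseudo-labels above the threshold.
    p_main = softmax(main_logits_unsup)
    p_aux = softmax(aux_logits_unsup)
    # The auxiliary model's confident prediction supervises the main model ...
    if max(p_aux) >= threshold:
        loss += weight * cross_entropy(p_main, p_aux.index(max(p_aux)))
    # ... and vice versa (the "cross-model" part).
    if max(p_main) >= threshold:
        loss += weight * cross_entropy(p_aux, p_main.index(max(p_main)))
    return loss
```

If neither model is confident on an unlabeled sample, the unsupervised terms are skipped and only the supervised cross-entropy remains.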
The choice of model architecture depends heavily on the task to be performed.
In this case the task is video action recognition.
A 3D-ResNet50 was chosen for the main model and a smaller 3D-ResNet18 for the auxiliary model.
\section{Performance}\label{sec:performance}
Figure~\ref{fig:results} shows a performance comparison between training only on the supervised samples and several pseudo-labeling frameworks.
The performance gain of the new CMPL framework is clearly significant.
For evaluation, the Kinetics-400 and UCF-101 datasets are used, with a 3D-ResNet18 and a 3D-ResNet50 as backbone models.
Even when only 1\% of the true labels are known for the UCF-101 dataset, 25.1\% of the labels could be predicted correctly.
How the pseudo-labels are generated may impact the overall performance.
In this paper, the pseudo-labels are obtained by the cross-model approach, but other strategies are conceivable as well, for example:
\begin{enumerate*}
\item Self-First: Each network uses its own prediction if it is confident enough; otherwise, it uses the prediction of its sibling network.
\item Opposite-First: Each network prioritizes the prediction of the sibling network.
\item Maximum: The more confident of the two predictions is used.
\item Average: The two predictions are averaged before deriving the pseudo-label.
\end{enumerate*}
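The four strategies above can be sketched in a few lines. The function name, the confidence threshold, and the fallback behavior are illustrative assumptions, not taken from the paper.

```python
def pseudo_label(p_self, p_sibling, strategy, threshold=0.8):
    """Derive a pseudo-label from two probability distributions, one from
    the network itself and one from its sibling, using one of the four
    alternative strategies described above. The threshold is illustrative."""
    if strategy == "self-first":
        # Use the own prediction if it is confident enough, else the sibling's.
        probs = p_self if max(p_self) >= threshold else p_sibling
    elif strategy == "opposite-first":
        # Prefer the sibling's prediction, fall back to the own one.
        probs = p_sibling if max(p_sibling) >= threshold else p_self
    elif strategy == "maximum":
        # Take whichever prediction is more confident.
        probs = p_self if max(p_self) >= max(p_sibling) else p_sibling
    elif strategy == "average":
        # Average the two distributions before taking the argmax.
        probs = [(a + b) / 2 for a, b in zip(p_self, p_sibling)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return probs.index(max(probs))
```

For example, with `p_self = [0.7, 0.2, 0.1]` and `p_sibling = [0.1, 0.85, 0.05]`, Self-First falls back to the sibling (own confidence 0.7 is below the threshold) and yields class 1.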
These alternative strategies are worth keeping in mind, but they are not necessarily better; in fact, they performed worse than the cross-model approach in this study.
\section{Conclusion}\label{sec:conclusion}
In conclusion, Cross-Model Pseudo-Labeling demonstrates the potential to significantly advance the field of semi-supervised action recognition.
Across several experiments, Cross-Model Pseudo-Labeling outperforms the supervised-only approach severalfold.
It surpasses most of the other existing pseudo-labeling frameworks.
Through the integration of main and auxiliary models, consistency regularization, and uncertainty estimation, CMPL offers a powerful framework for leveraging unlabeled data and improving model performance.
It paves the way for more accurate and efficient action recognition systems.
%%
%% The next two lines define the bibliography style to be used, and
%% the bibliography file.
\bibliographystyle{ACM-Reference-Format}
\bibliography{sources}
%%
%% If your work has an appendix, this is the place to put it.