add more sources
parent 2bc8c45f9d
commit 9d2534deba
@@ -5,8 +5,8 @@
 A test series was performed inside a Jupyter notebook.
 The active learning loop starts with an untrained ResNet-18 model and a random selection of samples.
 The muffin and chihuahua dataset was used for this binary classification task.
-The dataset is split into training and test set which contains $\sim4750$ train- and $\sim1250$ test-images.
-(see subsection~\ref{subsec:material-and-methods} for more infos)
+The dataset is split into a training and a test set, containing $\sim4750$ training and $\sim1250$ test images.\cite{muffinsvschiuahuakaggle}
+(see subsection~\ref{subsubsec:muffinvschihuahua} for more info)

 As a loss function, CrossEntropyLoss was used together with the Adam optimizer and a learning rate of $0.0001$.
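Roughly, such a training setup could be expressed in PyTorch as sketched below; the notebook code itself is not shown in this commit, so the variable names and the omitted data-loading step are assumptions.

\begin{verbatim}
import torch
import torch.nn as nn
from torchvision import models

# untrained ResNet-18 with a two-class head (muffin vs chihuahua)
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
\end{verbatim}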
@@ -2,7 +2,7 @@

 \subsection{Material}\label{subsec:material}

-\subsubsection{Muffin vs chihuahua}
+\subsubsection{Muffin vs chihuahua}\label{subsubsec:muffinvschihuahua}
 Muffin vs chihuahua is a free dataset available on Kaggle.
 It consists of $\sim6000$ images of the two classes, muffins and chihuahuas.
 The source data is scraped from Google Images and is split into a training and a validation set.
@@ -98,11 +98,11 @@ The loop starts again with the new model and draws new samples from the unlabeled
 \subsubsection{Semi-Supervised learning}
 In traditional supervised learning we have a labeled dataset.
 Each datapoint is associated with a corresponding target label.
-The goal is to fit a model to predict the labels from datapoints.
+The goal is to fit a model that predicts the labels from the datapoints.\cite{suptechniques}

 In traditional unsupervised learning there are also datapoints, but no labels are known.
 The goal is to find patterns or structures in the data.
-Moreover, it can be used for clustering or downprojection.
+Moreover, it can be used for clustering or dimensionality reduction.\cite{unsuptechlecture}

 Those two techniques combined yield semi-supervised learning.
 Some of the labels are known, but for most of the data only the raw datapoints are available.
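As a rough, purely illustrative sketch (not taken from the notebook), such a partly labeled pool can be represented by two index sets; the initial labeled-set size of 100 is an arbitrary assumption.

\begin{verbatim}
import random

indices = list(range(4750))        # ~4750 training images
random.shuffle(indices)

labeled_idx   = indices[:100]      # small labeled subset (assumed size)
unlabeled_idx = indices[100:]      # remaining datapoints without labels
\end{verbatim}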
@@ -160,7 +160,7 @@ Figure~\ref{fig:cnn-architecture} shows a typical binary classification task.

 \subsubsection{Softmax}

-The Softmax function converts $n$ numbers of a vector into a probability distribution.
+The Softmax function~\eqref{eq:softmax}\cite{liang2017soft} converts the $K$ entries of a vector into a probability distribution.
 It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.
 \begin{equation}\label{eq:softmax}
     \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j \in \{1,\dots,K\}
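A small numerical sketch of equation~\eqref{eq:softmax}, for illustration only:

\begin{verbatim}
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by the maximum for numerical stability
    return e / e.sum()

# two logits, e.g. for the classes muffin and chihuahua
print(softmax(np.array([2.0, 0.5])))   # -> [0.818, 0.182]
\end{verbatim}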
@@ -186,9 +186,9 @@ $\mathcal{L}(p,q)$~\eqref{eq:crelbinarybatch} is the Binary Cross Entropy Loss f

 Here the task is modeled as a mathematical problem to get a better understanding of how it is solved.

-The model is defined as $g(\pmb{x};\pmb{w})$ where $\pmb{w}$ are the model weights and $\mathcal{X}$ the input samples.
+The model is defined as $g(\pmb{x};\pmb{w})$, where $\pmb{w}$ are the model weights and $\mathcal{X}$ is the set of input samples.\cite{suptechniques}
 We define two hyperparameters, the batch size $\mathcal{B}$ and the sample size $\mathcal{S}$, where $\mathcal{B} < \mathcal{S}$.
-In every active learning loop iteration we sample $\mathcal{S}$ random samples~\eqref{eq:batchdef} from our total unlabeled sample set $\mathcal{X}_U \subset \mathcal{X}$.
+In every active learning loop iteration we draw $\mathcal{S}$ random samples~\eqref{eq:batchdef}\cite{suptechniques} from our total unlabeled sample set $\mathcal{X}_U \subset \mathcal{X}$.

 \begin{equation}
 \label{eq:batchdef}
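A possible implementation of this random draw~\eqref{eq:batchdef}, sketched with assumed names (the index pool \texttt{unlabeled\_idx} and the value of $\mathcal{S}$ are illustrative only):

\begin{verbatim}
import random

S = 1000                               # sample size S (assumed value)

# draw S random samples X_t from the unlabeled pool X_U
X_t_idx = random.sample(unlabeled_idx, k=S)
\end{verbatim}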
@@ -196,6 +196,7 @@ In every active learning loop iteration we sample $\mathcal{S}$ random samples~\
 \end{equation}

 The model with the weights of the current loop iteration produces pseudo-predictions.
+Equation~\eqref{eq:equation2}\cite{generalAI} shows the definition of the model output $z$ when applied to input samples $\pmb{x}$.

 \begin{equation}\label{eq:equation2}
     z = g(\pmb{x};\pmb{w})
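Continuing the sketch, the pseudo-predictions of equation~\eqref{eq:equation2} and their certainty scores could be computed as follows; the batch tensor \texttt{x\_batch} and the omitted device handling are assumptions.

\begin{verbatim}
import torch

model.eval()
with torch.no_grad():
    z = model(x_batch)                   # z = g(x; w), raw logits
    probs = torch.softmax(z, dim=1)      # softmax over the two classes
    certainty, pred_class = probs.max(dim=1)
\end{verbatim}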
@@ -229,7 +230,7 @@ We define $\text{min}_n(S)$ and $\text{max}_n(S)$ respectively in equation~\ref{

 This notation helps to define which subsets of samples to give the user for labeling.
 There are different ways in which this subset can be chosen.
-In this PW we do the obvious experiments with High-Certainty first in paragraph~\ref{par:low-certainty-first}, Low-Certainty first in paragraph~\ref{par:high-certainty-first}.
+In this PW we run the obvious experiments: Low-Certainty first\cite{certainty-based-al} in paragraph~\ref{par:low-certainty-first} and High-Certainty first in paragraph~\ref{par:high-certainty-first}.
 Furthermore, we evaluate two mixtures between them: half high- and half low-certainty samples, and only the middle section of the sorted certainty scores.

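The following sketch shows one way such subsets could be taken from the sorted certainty scores; the labeling-batch size $\mathcal{B}$ and the tensor names are assumptions, not the actual notebook code.

\begin{verbatim}
import torch

B = 32                                   # labeling batch size B (assumed)
order = torch.argsort(certainty)         # indices sorted by ascending certainty

low_certainty_first  = order[:B]         # min_B(S): least certain samples
high_certainty_first = order[-B:]        # max_B(S): most certain samples
half_half = torch.cat([order[:B // 2], order[-(B - B // 2):]])
mid = len(order) // 2
middle_section = order[mid - B // 2 : mid + (B - B // 2)]
\end{verbatim}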
\paragraph{Low certainty first}\label{par:low-certainty-first}
@@ -274,7 +275,7 @@ After labelling the model $g(\pmb{x};\pmb{w})$ is trained with the new samples a
 The loop starts again with the new model and draws new unlabeled samples from $\mathcal{X}_U$ as in~\eqref{eq:batchdef}.

 \paragraph{Further improvement by class balancing}\label{par:furtherimprovements}
-An intuitive improvement step might be the balancing of the class predictions.
+An intuitive improvement step might be the balancing of the class predictions.\cite{certainty-based-al}
 The samples selected from $\mathcal{X}_t$ in the active learning step above might all belong to one class.
 This is bad for the learning process because the model might overfit to one class if the same class is always selected.
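One possible way to balance the selected batch across the predicted classes is sketched below; it reuses the assumed tensors from the sketches above and is not the implementation used in this PW.

\begin{verbatim}
# keep at most B/2 samples per predicted class so one class cannot dominate
per_class_limit = B // 2
balanced, counts = [], {0: 0, 1: 0}

for idx in order.tolist():               # walk from least to most certain
    c = int(pred_class[idx])
    if counts[c] < per_class_limit:
        balanced.append(idx)
        counts[c] += 1
    if len(balanced) == B:
        break
\end{verbatim}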
@@ -126,6 +126,14 @@ doi = {10.1007/978-0-387-85820-3_23}
 publisher = {Johannes Kepler Universität Linz}
 }

+@misc{unsuptechlecture,
+author = {Andreas Radler and Markus Holzleitner},
+title = {Lecture notes in Machine Learning: Unsupervised Techniques},
+month = {July},
+year = {2023},
+publisher = {Johannes Kepler Universität Linz}
+}
+
 @online{ROCWikipedia,
 author = {Wikimedia Commons},
 title = {Receiver operating characteristic},
@@ -175,3 +183,27 @@ doi = {10.1007/978-0-387-85820-3_23}
 volume = {14},
 year = {1952}
 }
+
+@article{certainty-based-al,
+title = {Certainty-based active learning for sampling imbalanced datasets},
+journal = {Neurocomputing},
+volume = {119},
+pages = {350--358},
+year = {2013},
+note = {Intelligent Processing Techniques for Semantic-based Image and Video Retrieval},
+issn = {0925-2312},
+doi = {10.1016/j.neucom.2013.03.023},
+url = {https://www.sciencedirect.com/science/article/pii/S0925231213004803},
+author = {JuiHsi Fu and SingLing Lee},
+keywords = {Active learning, Imbalanced data classification, Neighborhood exploration, Certainty-based neighborhood, Local classification behavior},
+abstract = {Active learning is to learn an accurate classifier within as few queried labels as possible. For practical applications, we propose a Certainty-Based Active Learning (CBAL) algorithm to solve the imbalanced data classification problem in active learning. Without being affected by irrelevant samples which might overwhelm the minority class, the importance of each unlabeled sample is carefully measured within an explored neighborhood. For handling the agnostic case, IWAL-ERM is integrated into our approach without costs. Thus our CBAL is designed to determine the query probability within an explored neighborhood for each unlabeled sample. The potential neighborhood is incrementally explored, and there is no need to define the neighborhood size in advance. In our theoretical analysis, it is presented that CBAL has a polynomial label query improvement over passive learning. And the experimental results on synthetic and real-world datasets show that, CBAL has the ability of identifying informative samples and dealing with the imbalanced data classification problem in active learning.}
+}
+
+@inproceedings{liang2017soft,
+title = {Soft-margin softmax for deep classification},
+author = {Liang, Xuezhi and Wang, Xiaobo and Lei, Zhen and Liao, Shengcai and Li, Stan Z},
+booktitle = {International Conference on Neural Information Processing},
+pages = {413--421},
+year = {2017},
+organization = {Springer}
+}