Describe the muffin dataset in more detail
Experimental results: describe the setup in more detail
parent
5d6e8177da
commit
ef23935c93
BIN
rsc/muffin_chiauaua_poster.jpg
Normal file
Binary file not shown.
After: Size 154 KiB
@@ -2,9 +2,21 @@
|
|||||||
|
|
||||||
\subsection{Does Active-Learning benefit the learning process?}\label{subsec:does-active-learning-benefit-the-learning-process?}
|
\subsection{Does Active-Learning benefit the learning process?}\label{subsec:does-active-learning-benefit-the-learning-process?}
|
||||||
|
|
||||||
With the test setup described in section~\ref{sec:implementation} a test series was performed.
|
A test series was performed inside a Jupyter notebook.
|
||||||
|
The active learning loop starts with an untrained ResNet-18 model and a random selection of samples.
|
||||||
|
The Muffin vs chihuahua dataset was used for this binary classification task.
|
||||||
|
The dataset is split into a training set of $\sim4750$ images and a test set of $\sim1250$ images.
|
||||||
|
(See Section~\ref{subsec:material-and-methods} for more information.)
|
||||||
|
|
||||||
|
CrossEntropyLoss was used as the loss function, together with the Adam optimizer with a learning rate of $0.0001$.
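As a sketch, assuming CrossEntropyLoss denotes the usual softmax cross-entropy averaged over a mini-batch (the symbols $N$, $z_{i,c}$ and $y_i$ are introduced here for illustration only and are not part of the original setup description), the loss for $N$ samples with logits $z_{i,c}$ for the classes $c \in \{0,1\}$ and true labels $y_i$ would read
\[
  \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\left( z_{i,y_i} \right)}{\sum_{c \in \{0,1\}} \exp\left( z_{i,c} \right)} .
\]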
|
||||||
|
|
||||||
|
In each iteration, $\mathcal{B}$ samples are selected from the $\mathcal{S}$ drawn samples and labeled by an oracle.
|
||||||
|
Here the oracle simply assigns the correct class to each sample, because the labels of the dataset are already known.
|
||||||
|
No human annotator was used, because manual labeling is very time-consuming and the goal is to benchmark the active learning process itself.
|
||||||
|
Afterwards, the model is trained on these labeled samples, and the loop starts again with the prediction and selection of $\mathcal{B}$ samples from the $\mathcal{S}$ drawn samples.
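The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the names \texttt{select\_batch}, \texttt{active\_learning\_loop}, \texttt{predict\_proba}, \texttt{oracle} and \texttt{train} are hypothetical, and the selection rule (keeping the samples whose predicted probability is closest to $0.5$) is an assumed uncertainty criterion, since the text only states that $\mathcal{B}$ out of $\mathcal{S}$ samples are selected.

```python
import random
from typing import Callable, Dict, List, Sequence


def select_batch(
    pool: Sequence[int],
    predict_proba: Callable[[int], float],
    batch_size: int,
    sample_size: int,
    rng: random.Random,
) -> List[int]:
    """Draw `sample_size` random candidates and keep the `batch_size` least certain ones."""
    candidates = rng.sample(list(pool), min(sample_size, len(pool)))
    # ASSUMPTION: uncertainty sampling for a binary classifier; a predicted
    # probability close to 0.5 means the model is unsure about the sample.
    candidates.sort(key=lambda idx: abs(predict_proba(idx) - 0.5))
    return candidates[:batch_size]


def active_learning_loop(
    n_pool: int,
    predict_proba: Callable[[int], float],
    oracle: Callable[[int], int],
    train: Callable[[Dict[int, int]], None],
    batch_size: int,
    sample_size: int,
    n_rounds: int,
    seed: int = 0,
) -> Dict[int, int]:
    """Run `n_rounds` of draw -> select -> label -> train and return the labeled set."""
    rng = random.Random(seed)
    unlabeled = set(range(n_pool))
    labeled: Dict[int, int] = {}
    for _ in range(n_rounds):
        if not unlabeled:
            break  # nothing left to label
        batch = select_batch(sorted(unlabeled), predict_proba, batch_size, sample_size, rng)
        for idx in batch:
            labeled[idx] = oracle(idx)  # the oracle returns the known ground-truth label
            unlabeled.discard(idx)
        train(labeled)  # retrain (or fine-tune) the model on everything labeled so far
    return labeled
```

In an actual experiment, \texttt{predict\_proba} and \texttt{train} would wrap the ResNet-18 model, while \texttt{oracle} would look up the label stored in the dataset.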
|
||||||
|
|
||||||
Several different batch sizes $\mathcal{B} = \left\{ 2,4,6,8 \right\}$ and sample sizes $\mathcal{S} = \left\{ 2\mathcal{B}_i,4\mathcal{B}_i,5\mathcal{B}_i,10\mathcal{B}_i \right\}$
|
Several different batch sizes $\mathcal{B} = \left\{ 2,4,6,8 \right\}$ and sample sizes $\mathcal{S} = \left\{ 2\mathcal{B}_i,4\mathcal{B}_i,5\mathcal{B}_i,10\mathcal{B}_i \right\}$
|
||||||
dependent on the selected batch size were selected.
|
dependent on the selected batch size were used.
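For illustration, expanding the definition above with the stated factors $2$, $4$, $5$ and $10$ gives the following candidate sample sizes per batch size (plain arithmetic, no additional configurations are implied):
\[
\begin{array}{ll}
  \mathcal{B}_i = 2: & \mathcal{S} \in \left\{ 4, 8, 10, 20 \right\} \\
  \mathcal{B}_i = 4: & \mathcal{S} \in \left\{ 8, 16, 20, 40 \right\} \\
  \mathcal{B}_i = 6: & \mathcal{S} \in \left\{ 12, 24, 30, 60 \right\} \\
  \mathcal{B}_i = 8: & \mathcal{S} \in \left\{ 16, 32, 40, 80 \right\}
\end{array}
\]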
|
||||||
We define the baseline (passive learning) AUC curve as the supervised learning process without any active learning.
|
We define the baseline (passive learning) AUC curve as the supervised learning process without any active learning.
|
||||||
The following graphs are only a subselection of the test series which give the most insights.
|
The following graphs are only a subselection of the test series which give the most insights.
|
||||||
|
|
||||||
|
@@ -2,6 +2,24 @@
|
|||||||
|
|
||||||
\subsection{Material}\label{subsec:material}
|
\subsection{Material}\label{subsec:material}
|
||||||
|
|
||||||
|
\subsubsection{Muffin vs chihuahua}
|
||||||
|
Muffin vs chihuahua is a free dataset available on Kaggle.
|
||||||
|
It consists of $\sim6000$ images of the two classes muffins and chihuahuas.
|
||||||
|
The source data is scraped from Google Images and is split into a training and a test set.
|
||||||
|
The training set contains $\sim4750$ images and the test set $\sim1250$ images; overall, the two classes are almost balanced.
|
||||||
|
This is expected to be a relatively hard classification task because the eyes of chihuahuas and chocolate parts of muffins look very similar.
|
||||||
|
It is used in this practical work as a binary classification task to evaluate the performance of active learning~\cite{muffinsvschiuahuakaggle}.
|
||||||
|
|
||||||
|
\begin{figure}
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=0.5\linewidth]{../rsc/muffin_chiauaua_poster}
|
||||||
|
\caption{Sample images from the dataset~\cite{muffinsvschiuahuakaggle_poster}.}
|
||||||
|
\label{fig:roc-example}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
|
||||||
|
\subsection{Methods}\label{subsec:methods}
|
||||||
|
|
||||||
\subsubsection{Dagster}
|
\subsubsection{Dagster}
|
||||||
Dagster is an open-source data orchestrator for machine learning, analytics, and ETL workflows.
|
Dagster is an open-source data orchestrator for machine learning, analytics, and ETL workflows.
|
||||||
It lets you define pipelines in terms of the data flow between reusable, logical components.
|
It lets you define pipelines in terms of the data flow between reusable, logical components.
|
||||||
@@ -36,14 +54,6 @@ It is widely used in the data science, mathematics and machine learning communit
|
|||||||
|
|
||||||
In the case of this practical work it can be used to test and evaluate the active learning loop before implementing it in a Dagster pipeline. \cite{jupyter}
|
In the case of this practical work it can be used to test and evaluate the active learning loop before implementing it in a Dagster pipeline. \cite{jupyter}
|
||||||
|
|
||||||
\subsubsection{Muffin vs chihuahua}
|
|
||||||
Muffin vs chihuahua is a free dataset available on Kaggle.
|
|
||||||
It consists of $\sim6000$ images of muffins and chihuahuas.
|
|
||||||
This is expected to be a relatively hard classification task because the eyes of chihuahuas and chocolate parts of muffins look very similar.
|
|
||||||
It is used in this practical work for a binary classification task to evaluate the performance of active learning.
|
|
||||||
\cite{muffinsvschiuahuakaggle}
|
|
||||||
|
|
||||||
\subsection{Methods}\label{subsec:methods}
|
|
||||||
|
|
||||||
\subsubsection{Active-Learning}
|
\subsubsection{Active-Learning}
|
||||||
Active learning is a subfield of supervised learning.
|
Active learning is a subfield of supervised learning.
|
||||||
|
@@ -82,6 +82,14 @@ and Sardinha, Alberto",
|
|||||||
note = "[Online; accessed 12-April-2024]"
|
note = "[Online; accessed 12-April-2024]"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@misc{muffinsvschiuahuakaggle_poster,
|
||||||
|
author = {},
|
||||||
|
title = {{Muffin vs Chihuahua Kaggle Dataset Poster Image}},
|
||||||
|
howpublished = "\url{https://i.postimg.cc/2SXNWP7f/muffin-meme2.jpg}",
|
||||||
|
year = {2024},
|
||||||
|
note = "[Online; accessed 12-April-2024]"
|
||||||
|
}
|
||||||
|
|
||||||
@INCOLLECTION{RubensRecSysHB2010,
|
@INCOLLECTION{RubensRecSysHB2010,
|
||||||
author = {Neil Rubens and Dain Kaplan and Masashi Sugiyama},
|
author = {Neil Rubens and Dain Kaplan and Masashi Sugiyama},
|
||||||
title = {Active Learning in Recommender Systems},
|
title = {Active Learning in Recommender Systems},
|
||||||
|