Describe the muffin dataset in more detail
Experimental results: describe the setup in more detail
parent 5d6e8177da
commit ef23935c93
BIN rsc/muffin_chiauaua_poster.jpg (binary file, 154 KiB)
@ -2,9 +2,21 @@
\subsection{Does Active-Learning benefit the learning process?}\label{subsec:does-active-learning-benefit-the-learning-process?}

Using the test setup described in section~\ref{sec:implementation}, a test series was performed inside a Jupyter notebook.
The active learning loop starts with an untrained ResNet-18 model and a random selection of samples.
The Muffin vs chihuahua dataset was used for this binary classification task.
It is split into a training set of $\sim4750$ images and a test set of $\sim1250$ images
(see section~\ref{subsec:material} for more details).

CrossEntropyLoss was used as the loss function, together with the Adam optimizer with a learning rate of $0.0001$.
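
For illustration, the two training ingredients can be written out in plain Python: the cross-entropy of a single sample, computed from raw logits as PyTorch's CrossEntropyLoss does, and one Adam update with the learning rate of $0.0001$ used here. This is a minimal sketch, not the PyTorch implementation; the helper names \texttt{cross\_entropy} and \texttt{adam\_step} are hypothetical, and the Adam defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$ are assumed.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter (assumed defaults); returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction, t counts from 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Two classes; the index-to-class mapping (0 = chihuahua, 1 = muffin) is hypothetical.
loss_uniform = cross_entropy([0.0, 0.0], target=1)  # an undecided model
new_param, m, v = adam_step(param=0.5, grad=0.2, m=0.0, v=0.0, t=1)
print(loss_uniform, new_param)
```

An undecided model (equal logits) yields a loss of $\ln 2$, and the first Adam step moves a parameter by roughly the learning rate against the sign of the gradient, independent of the gradient's magnitude.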

In each iteration, $\mathcal{S}$ samples are drawn from the unlabeled pool, and $\mathcal{B}$ of them are selected and labeled by an oracle.
Here the oracle simply assigns the correct class to each sample, because the dataset is already fully labeled and the ground truth is known.
No human annotator was involved, since manual labeling would be very time-consuming and the goal is to benchmark the active learning process itself.
The model is then trained on the labeled samples, and the loop starts again by drawing $\mathcal{S}$ new samples and using the model's predictions to select the next $\mathcal{B}$ of them.
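
The loop can be sketched in plain Python as follows. This is a hypothetical simulation, not the notebook's code: the model is represented by a stand-in function \texttt{predict\_proba}, training by a placeholder \texttt{train}, and the oracle looks up the known label. The $\mathcal{B}$ samples are chosen by uncertainty sampling (predicted probability closest to $0.5$), which is an assumed selection strategy since the criterion is not specified here.

```python
import random


def oracle(sample_id, ground_truth):
    """Simulated annotator: returns the already known label."""
    return ground_truth[sample_id]


def select_batch(candidates, predict_proba, batch_size):
    """Uncertainty sampling (assumed strategy): keep the candidates whose
    predicted probability for class 1 is closest to 0.5."""
    return sorted(candidates, key=lambda s: abs(predict_proba(s) - 0.5))[:batch_size]


def active_learning_loop(pool, ground_truth, predict_proba, train,
                         batch_size, sample_size, iterations, seed=0):
    """Hypothetical loop: draw S samples, label B of them, retrain, repeat."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    labeled = {}  # sample id -> label returned by the oracle
    for _ in range(iterations):
        if len(unlabeled) < sample_size:
            break  # not enough unlabeled samples left to draw S
        # Draw S samples at random from the unlabeled pool.
        drawn = rng.sample(unlabeled, sample_size)
        # Let the current model pick B of them for labeling.
        chosen = select_batch(drawn, predict_proba, batch_size)
        for s in chosen:
            labeled[s] = oracle(s, ground_truth)
            unlabeled.remove(s)
        # Retrain on everything labeled so far (placeholder callback).
        train(labeled)
    return labeled
```

With, for example, $\mathcal{B} = 4$ and $\mathcal{S} = 16$, five iterations yield 20 labeled samples; in a real run \texttt{predict\_proba} would wrap the ResNet-18 and \texttt{train} would run the optimization step described above.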

Several batch sizes $\mathcal{B} \in \left\{ 2,4,6,8 \right\}$ were combined with sample sizes $\mathcal{S} \in \left\{ 2\mathcal{B},4\mathcal{B},5\mathcal{B},10\mathcal{B} \right\}$ that depend on the chosen batch size.
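
The resulting parameter grid can be enumerated as below; four batch sizes and four sample-size multipliers give $4 \times 4 = 16$ configurations. The variable names are illustrative only.

```python
from itertools import product

BATCH_SIZES = [2, 4, 6, 8]          # values of B
SAMPLE_MULTIPLIERS = [2, 4, 5, 10]  # S = multiplier * B

# Every (B, S) pair of the test series.
configs = [(b, k * b) for b, k in product(BATCH_SIZES, SAMPLE_MULTIPLIERS)]
for b, s in configs:
    print(f"B={b:2d}  S={s:3d}")
```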

We define the baseline (passive learning) AUC curve as the one obtained from the supervised learning process without any active learning.
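
Since the runs are compared via the AUC, a minimal sketch of how the ROC AUC of a binary classifier can be computed from predicted scores is given below. It uses the rank-based (Mann-Whitney) formulation: the AUC is the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function name and toy data are hypothetical; in practice a library function such as scikit-learn's \texttt{roc\_auc\_score} would typically be used.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney U statistic).

    labels: 0/1 ground truth, scores: predicted probability of class 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): one of the four positive/negative pairs is mis-ordered.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is still cheap for a test set of $\sim1250$ images.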
The following graphs show only the part of the test series that gives the most insight.
@ -2,6 +2,24 @@
\subsection{Material}\label{subsec:material}
\subsubsection{Muffin vs chihuahua}
Muffin vs chihuahua is a free dataset available on Kaggle.
It consists of $\sim6000$ images of the two classes, muffins and chihuahuas.
The images were scraped from Google Images and are split into a training and a test set.
The training set contains $\sim4750$ images and the test set $\sim1250$ images; overall, the two classes are almost balanced.
This is expected to be a relatively hard classification task, because the eyes of chihuahuas and the chocolate pieces in muffins look very similar.
It is used in this practical work as a binary classification task to evaluate the performance of active learning~\cite{muffinsvschiuahuakaggle}.
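
To check the split sizes and the class balance locally, the images per class can be counted with a short script. The directory layout (\texttt{train/muffin/}, \texttt{train/chihuahua/}, \texttt{test/muffin/}, \texttt{test/chihuahua/}) is an assumption about how the downloaded Kaggle archive is organized, and \texttt{count\_images} and \texttt{class\_fractions} are hypothetical helpers.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {split: {class: n_images}} for an assumed root/split/class/ layout."""
    counts = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            class_dir.name: sum(1 for f in class_dir.iterdir()
                                if f.suffix.lower() in IMAGE_SUFFIXES)
            for class_dir in sorted(c for c in split_dir.iterdir() if c.is_dir())
        }
    return counts


def class_fractions(split_counts):
    """Share of each class within one split, e.g. to check the balance."""
    total = sum(split_counts.values())
    return {name: n / total for name, n in split_counts.items()}
```

Calling \texttt{class\_fractions(count\_images(root)["train"])} on the extracted archive would show how close the training split is to a 50/50 class ratio.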
\begin{figure}
\centering
\includegraphics[width=0.5\linewidth]{../rsc/muffin_chiauaua_poster}
\caption{Sample images from the Muffin vs chihuahua dataset~\cite{muffinsvschiuahuakaggle_poster}.}
\label{fig:roc-example}
\end{figure}
\subsection{Methods}\label{subsec:methods}
\subsubsection{Dagster}
Dagster is an open-source data orchestrator for machine learning, analytics, and ETL workflows.
It lets you define pipelines in terms of the data flow between reusable, logical components.
@ -36,14 +54,6 @@ It is widely used in the data science, mathematics and machine learning communit

In this practical work, it is used to test and evaluate the active learning loop before implementing it in a Dagster pipeline~\cite{jupyter}.
\subsubsection{Active-Learning}
Active learning is a subfield of supervised learning.
@ -82,6 +82,14 @@ and Sardinha, Alberto",
note = "[Online; accessed 12-April-2024]"
}

@misc{muffinsvschiuahuakaggle_poster,
title = {{Muffin vs Chihuahua Kaggle Dataset Poster Image}},
howpublished = "\url{https://i.postimg.cc/2SXNWP7f/muffin-meme2.jpg}",
year = {2024},
note = "[Online; accessed 12-April-2024]"
}

@INCOLLECTION{RubensRecSysHB2010,
author = {Neil Rubens and Dain Kaplan and Masashi Sugiyama},
title = {Active Learning in Recommender Systems},