add balanced stuff and code snippet
This commit is contained in:
parent
2ff58491b0
commit
79d04ccef3
BIN rsc/AUC_balanced__4_24.png (new file, 52 KiB)
BIN (modified binary, 79 KiB before, 32 KiB after; filename not shown)
@@ -96,11 +96,14 @@ The previous process was improved by balancing the classes to give the oracle fo
The idea is that the low-certainty samples might always belong to one class and thus lead to an imbalanced learning process.
The sample selection was modified as described in~\ref{par:furtherimprovements}.

Unfortunately, it did not improve the convergence speed and seems to make no difference compared to not balancing.
\begin{figure}
\centering
\includegraphics[width=\linewidth]{../rsc/AUC_balanced__4_24}
\caption{AUC for balanced and unbalanced low-certainty sampling with $\mathcal{B}=4$ and $\mathcal{S}=24$}
\label{fig:balancedauc}
\end{figure}

Unfortunately, it did not improve the convergence speed; it makes no noticeable difference compared to not balancing and mostly performs even worse.
This might be because the uncertainty-sampling process already balances the draws fairly well on its own.

% todo insert imgs

% Not really.

% todo add img and add stuff
Figure~\ref{fig:balancedauc} shows the AUC curve with batch size $\mathcal{B}=4$ and sample size $\mathcal{S}=24$ for both balanced and unbalanced low-certainty sampling.
The result looks similar for the other batch sizes and sample sizes.
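The claim that uncertainty sampling already balances the draws can be checked directly. A minimal sketch (toy data; the column names `pseudolabel` and `score` match the listings in the implementation section, but the values are made up) that counts the classes among the $\mathcal{B}$ least certain samples:

```python
import pandas as pd

# toy pool of unlabeled samples: 'pseudolabel' is the predicted class,
# 'score' is the certainty of that prediction (lower = less certain)
df = pd.DataFrame({
    "sample": [f"img_{i}.png" for i in range(8)],
    "pseudolabel": [0, 1, 0, 1, 0, 1, 0, 1],
    "score": [0.51, 0.52, 0.93, 0.95, 0.55, 0.54, 0.97, 0.99],
})

batch_size = 4
# unbalanced selection: simply take the B least certain samples
draw = df.sort_values(by=["score"])[:batch_size]
per_class = draw["pseudolabel"].value_counts()
```

If the least certain predictions sit near the decision boundary for both classes, `per_class` comes out close to even without any explicit balancing.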
@@ -4,7 +4,7 @@
To get accurate performance measures, the active-learning process was implemented in a Jupyter notebook first.
This helps to choose which of the methods performs best and which one to use in the final Dagster pipeline.
A straightforward machine-learning pipeline was implemented with the help of PyTorch and ResNet.
A straightforward machine-learning pipeline was implemented with the help of PyTorch and ResNet-18.

\begin{lstlisting}[language=Python, caption=Certainty sampling process of selected metric]
df = df.sort_values(by=['score'])
@@ -34,9 +34,32 @@ match predict_mode:
\end{lstlisting}

Moreover, the dataset was manually imported and preprocessed with random augmentations.
After each loop iteration, the Area Under the Curve (AUC) was calculated over the validation set to get a performance measure.
All those AUC values were visualized in a line plot; see~\ref{sec:experimental-results} for the results.

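The per-iteration AUC bookkeeping can be sketched as follows: a minimal from-scratch implementation via the rank-sum formulation (the notebook presumably used a library routine instead, and `val_labels`/`val_scores` here are placeholder values):

```python
import numpy as np

def auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUC as the probability that a randomly chosen positive sample
    is scored above a randomly chosen negative one (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

auc_history = []
# after each loop iteration, score the validation set and record the AUC
val_labels = np.array([0, 0, 1, 1])
val_scores = np.array([0.1, 0.4, 0.35, 0.8])  # placeholder model outputs
auc_history.append(auc(val_labels, val_scores))
```

Appending one value per iteration yields exactly the data needed for the line plot mentioned above.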
\subsection{Balanced sample selection}

To prevent the model from learning only from one class, the sample selection process was balanced as mentioned in~\ref{par:furtherimprovements}.
The samples are simply sorted by predicted class first, and then the $\mathcal{B}/2$ least certain samples are selected per class.
This should help to balance the sample selection process.

\begin{lstlisting}[language=Python, caption=Certainty sampling process with class balancing]
# sort by pseudolabel (predicted class)
df = df.sort_values(by=['pseudolabel'])

# sort each class half by pseudolabel + certainty score
half = int(sample_size / 2)
df = pd.concat([
    df.iloc[:half].sort_values(by=['pseudolabel', 'score']),
    df.iloc[half:].sort_values(by=['pseudolabel', 'score']),
])

# the least certain half batch of each class goes to the oracle,
# the rest is returned to the unlabeled pool
halfbatchsize = int(batch_size / 2)
train_samples = pd.concat(
    [df.iloc[:halfbatchsize], df.iloc[half:half + halfbatchsize]]
)["sample"].values.tolist()
unlabeled_samples += pd.concat(
    [df.iloc[halfbatchsize:half], df.iloc[half + halfbatchsize:]]
)["sample"].values.tolist()
\end{lstlisting}

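As a sanity check, the balanced selection can be exercised on a toy pool (hypothetical sample names and scores; two pseudolabel classes and the same column names as in the listing):

```python
import pandas as pd

batch_size, sample_size = 4, 8
df = pd.DataFrame({
    "sample": [f"img_{i}.png" for i in range(sample_size)],
    "pseudolabel": [1, 0, 1, 0, 1, 0, 1, 0],
    "score": [0.70, 0.60, 0.90, 0.95, 0.55, 0.65, 0.80, 0.85],
})

# group by predicted class, then sort each class half by certainty
half = sample_size // 2
df = df.sort_values(by=["pseudolabel"])
df = pd.concat([
    df.iloc[:half].sort_values(by=["pseudolabel", "score"]),
    df.iloc[half:].sort_values(by=["pseudolabel", "score"]),
])

# the B/2 least certain samples of each class are sent to the oracle
halfbatchsize = batch_size // 2
train = pd.concat([df.iloc[:halfbatchsize], df.iloc[half:half + halfbatchsize]])
train_samples = train["sample"].values.tolist()
```

The selected batch contains exactly two samples per class: the two least certain ones of each.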
\subsection{Dagster with Label-Studio}\label{subsec:dagster-with-label-studio}

The main goal is to implement an active learning loop with the help of Dagster and Label-Studio.
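The loop that Dagster and Label-Studio orchestrate can be sketched in plain Python. This is a hedged skeleton: `train`, `predict`, `select`, and `ask_oracle` are hypothetical stand-ins for the Dagster assets and the Label-Studio annotation step, not the actual pipeline code.

```python
def active_learning_loop(labeled, unlabeled, batch_size, iterations,
                         train, predict, select, ask_oracle):
    """Generic active-learning loop: train on the labeled set, score the
    unlabeled pool, send the least certain batch to the oracle, repeat."""
    model = None
    for _ in range(iterations):
        model = train(labeled)
        scored = predict(model, unlabeled)   # list of (sample, score) pairs
        batch = select(scored, batch_size)   # e.g. the lowest-certainty samples
        labeled = labeled + ask_oracle(batch)
        unlabeled = [s for s in unlabeled if s not in batch]
    return model
```

In the actual pipeline, each of these steps is materialized as a Dagster asset and Label-Studio plays the role of the oracle.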
@@ -45,7 +68,6 @@ This helps building reusable building blocks and to keep the code clean.

Most of the Python routines implemented in section~\ref{subsec:jupyter} were reused here and only slightly modified to fit the Dagster pipeline.

% todo shorten this figure to half!
\begin{figure}
\centering
\includegraphics[width=\linewidth]{../rsc/dagster/assets}
@@ -21,11 +21,9 @@ The sample-selection metric might select samples just from one class by chance.
Does balancing this distribution help the model performance?
\subsection{Outline}\label{subsec:outline}

In section~\ref{sec:material-and-methods} we talk about general methods and materials we use.
First the problem is modeled mathematically in~\ref{subsubsec:mathematicalmodeling} and then implemented and benchmarked in a Jupyter notebook~\ref{subsubsec:jupyternb}.
Section~\ref{sec:implementation} gives deeper insights into the implementation for the interested reader.
In section~\ref{sec:material-and-methods} we talk about general methods and materials used.
First the problem is modeled mathematically in~\ref{subsubsec:mathematicalmodeling} and then implemented and benchmarked in a Jupyter notebook~\ref{subsubsec:jupyternb}.
Section~\ref{sec:implementation} gives deeper insights into the implementation for the interested reader, with some code snippets.
The experimental results~\ref{sec:experimental-results} are presented with figures illustrating the performance of active learning across different sample sizes and batch sizes.
The conclusion~\ref{subsec:conclusion} provides an overview of the findings, highlighting the benefits of active learning.
Additionally, the outlook section suggests avenues for future research that are not covered in this work.
The experimental results are presented with figures illustrating the performance of active learning across different sample sizes and batch sizes.

% todo proper linking to sections
Additionally, the outlook section~\ref{subsec:outlook} suggests avenues for future research that are not covered in this work.