diff --git a/src/implementation.tex b/src/implementation.tex
index ff59628..4f05bf7 100644
--- a/src/implementation.tex
+++ b/src/implementation.tex
@@ -13,4 +13,6 @@ A straight forward machine-learning pipeline was implemented with the help of Py
 \begin{lstlisting}
 # todo listing of the sample selection process
-\end{lstlisting}
\ No newline at end of file
+\end{lstlisting}
+
+Moreover, the dataset was manually imported and preprocessed with random augmentations.
diff --git a/src/main.tex b/src/main.tex
index 0c6090f..fdb9ac8 100644
--- a/src/main.tex
+++ b/src/main.tex
@@ -1,4 +1,4 @@
-\def\ieee{0}
+\def\ieee{1}
 \if\ieee1
 \documentclass[sigconf]{acmart}
diff --git a/src/materialandmethods.tex b/src/materialandmethods.tex
index 30ac3ba..3308a10 100644
--- a/src/materialandmethods.tex
+++ b/src/materialandmethods.tex
@@ -3,16 +3,31 @@ \subsection{Material}\label{subsec:material}
 \subsubsection{Dagster}
-Dagster is a data orchestrator for machine learning, analytics, and ETL workflows.
+Dagster is an open-source data orchestrator for machine learning, analytics, and ETL workflows.
 It lets you define pipelines in terms of the data flow between reusable, logical components.
-Dagster is a tool that helps to build scalable and reliable data workflows.
+With Dagster, scalable and reliable data workflows can be built.
+
+The most important building blocks in Dagster are Assets, Jobs and Ops.
+Assets are objects in persistent storage, each paired with a description in code of how to produce and update it.
+Whenever persistent storage is required, e.g.\ for storing a model, metadata, or configurations, an Asset should be used.
+Assets can be combined into an asset graph to model the dependencies of the data flow.
+Jobs are the main unit for executing a pipeline and can be triggered from the web UI, by fixed schedules, or by sensors that react to external changes.
+To perform the actual computation, an Asset consists of a graph of Ops.
+An Op is a function that performs a task and can be used to split the code into reusable components.
+
+Dagster also provides a web interface for monitoring jobs and pipelines.
 \subsubsection{Label-Studio}
 \subsubsection{Pytorch}
 \subsubsection{NVTec}
+
+
+
 \subsubsection{Imagenet}
-\subsubsection{Anomalib}
-% todo maybe remove?
+\subsubsection{Muffin vs chihuahua}
+Muffin vs chihuahua is a freely available dataset hosted on Kaggle.
+It consists of $\sim$1500 images of muffins and chihuahuas.
+
 \subsection{Methods}\label{subsec:methods}