add dagster stuff

lukas-heiligenbrunner 2024-04-27 09:53:59 +02:00
parent 0fc6898c70
commit 1133a5bcbd
3 changed files with 23 additions and 6 deletions


@ -13,4 +13,6 @@ A straight forward machine-learning pipeline was implemented with the help of Py
\begin{lstlisting}
# todo listing of the sample selection process
\end{lstlisting}
Moreover, the dataset was manually imported and preprocessed with random augmentations.


@ -1,4 +1,4 @@
\def\ieee{0}
\def\ieee{1}
\if\ieee1
\documentclass[sigconf]{acmart}


@ -3,16 +3,31 @@
\subsection{Material}\label{subsec:material}
\subsubsection{Dagster}
Dagster is a data orchestrator for machine learning, analytics, and ETL workflows.
Dagster is an open-source data orchestrator for machine learning, analytics, and ETL workflows.
It lets you define pipelines in terms of the data flow between reusable, logical components.
Dagster is a tool that helps to build scalable and reliable data workflows.
With Dagster, scalable and reliable data workflows can be built.
The most important building blocks in Dagster are Assets, Jobs, and Ops.
An Asset is an object in persistent storage together with a description, in code, of how to produce and update that object.
Whenever persistent storage is required, e.g.\ for storing a model, metadata, or configurations, an Asset should be used.
Assets can be combined into an asset graph that models the dependencies of the data flow.
Jobs are the main unit of execution of a pipeline and can be triggered from the web UI, by fixed schedules, or by sensors that react to changes.
To perform the actual work in code, an Asset is backed by a graph of Ops.
An Op is a function that performs a single task and can be used to split the code into reusable components.
Dagster also provides a web interface to monitor jobs and pipelines.
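As a minimal sketch of these building blocks, the following listing defines two Assets and one Job with Dagster's Python API.
The asset names \texttt{dataset\_labels} and \texttt{class\_counts}, the job name \texttt{all\_assets\_job}, and the label values are hypothetical and are not taken from the implemented pipeline.
\begin{lstlisting}
# Illustrative sketch only: all asset and job names are hypothetical.
from collections import Counter

from dagster import Definitions, asset, define_asset_job, materialize


@asset
def dataset_labels():
    # Stand-in for reading labels from persistent storage.
    return ["muffin", "chihuahua", "muffin"]


@asset
def class_counts(dataset_labels):
    # Naming the parameter after the upstream asset declares the
    # dependency, which adds an edge to the asset graph.
    return dict(Counter(dataset_labels))


# A job that targets the assets; by default it selects all of them.
all_assets_job = define_asset_job("all_assets_job")

# Bundle assets and jobs so that Dagster tooling can discover them.
defs = Definitions(
    assets=[dataset_labels, class_counts],
    jobs=[all_assets_job],
)

if __name__ == "__main__":
    # Materialize both assets in-process, without the web UI.
    result = materialize([dataset_labels, class_counts])
    print(result.success)
\end{lstlisting}
Declaring the dependency through the function parameter keeps the data flow visible in the code itself, so the asset graph does not have to be maintained separately.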
\subsubsection{Label-Studio}
\subsubsection{PyTorch}
\subsubsection{MVTec}
\subsubsection{ImageNet}
\subsubsection{Anomalib}
% todo maybe remove?
\subsubsection{Muffin vs chihuahua}
Muffin vs chihuahua is a free dataset available on Kaggle.
It consists of $\sim$1500 images of muffins and chihuahuas.
\subsection{Methods}\label{subsec:methods}