add abstract, finish the alternative methods, fix some todos and improve sources

2025-01-14 19:22:15 +01:00
parent 7c54e11238
commit 49d5e97417
6 changed files with 127 additions and 21 deletions


@ -73,24 +73,23 @@ So many more defect classes are already an indication that a classification task
=== Few-Shot Learning
Few-shot learning is a subfield of machine learning which aims to train a classification model with just a few samples, or even none at all.
This contrasts with traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
With only a few training samples available, the model is prone to overfitting.#cite(<parnami2022learningexamplessummaryapproaches>)
Typically a few-shot learning task consists of a support set and a query set.
The support set contains the training data, while the query set contains the evaluation data used to assess real-world performance.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, 3 target classes with 5 samples per class for training form a 3-way 5-shot classification problem.#cite(<snell2017prototypicalnetworksfewshotlearning>)#cite(<patchcorepaper>)
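The following sketch illustrates how such an n-way k-shot episode could be sampled from a labeled dataset; the function and variable names are illustrative and not taken from a specific library.
```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=3, k_shot=5, n_query=10):
    """Sample one n-way k-shot episode (support + query set).

    `dataset` is assumed to be a list of (image, label) pairs.
    """
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    # pick n_way classes, then k_shot support and n_query query samples per class
    classes = random.sample(list(by_class.keys()), n_way)
    support, query = [], []
    for cls in classes:
        samples = random.sample(by_class[cls], k_shot + n_query)
        support += [(img, cls) for img in samples[:k_shot]]
        query += [(img, cls) for img in samples[k_shot:]]
    return support, query
```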
A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.#cite(<snell2017prototypicalnetworksfewshotlearning>)
#figure(
image("rsc/prototype_fewshot_v3.png", width: 60%),
caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>
The first and simplest method of this bachelor thesis uses a plain ResNet to calculate those embeddings and is essentially a simple prototypical network.
See @resnet50impl.~#cite(<chowdhury2021fewshotimageclassificationjust>)
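A minimal sketch of such a nearest-prototype classification step, assuming a frozen pre-trained ResNet-50 from torchvision as the embedding function and Euclidean distance (the actual implementation in @resnet50impl may differ in detail), could look like this:
```python
import torch
import torchvision.models as models

# frozen pre-trained ResNet-50 backbone used as embedding function
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # drop the classification head, keep 2048-d features
backbone.eval()

@torch.no_grad()
def embed(images):            # images: (B, 3, H, W) tensor
    return backbone(images)   # -> (B, 2048) embeddings

@torch.no_grad()
def classify(support_images, support_labels, query_images):
    """Nearest-prototype classification as in a prototypical network."""
    emb_s = embed(support_images)
    emb_q = embed(query_images)
    classes = support_labels.unique()
    # prototype = mean embedding of each class' support samples
    prototypes = torch.stack([emb_s[support_labels == c].mean(dim=0) for c in classes])
    # assign each query image to the class with the closest prototype
    dists = torch.cdist(emb_q, prototypes)
    return classes[dists.argmin(dim=1)]
```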
=== Generalisation from few samples
@ -373,7 +372,7 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting
== Alternative Methods
There are several alternative methods to few-shot learning as well as to anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks than the methods used here, or they were released after my initial literature research.
=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)
@ -393,18 +392,58 @@ If the pre-trained model lacks relevant information for the task, SgVA-CLIP migh
This might be a no-go for anomaly detection tasks because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification)
// https://arxiv.org/abs/2208.10559v1
TRIDENT, a variational inference network, is a few-shot learning approach which decouples the image representation into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)
To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)
This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context aligns with the requirements of anomaly detection.
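A heavily simplified sketch of the decoupling idea, assuming plain linear heads and omitting the transductive AttFEX module entirely (layer sizes and names are illustrative, not the authors' exact architecture), could look like this:
```python
import torch
import torch.nn as nn

class DecoupledEncoder(nn.Module):
    """Encode an image feature into two separate latent variables:
    a semantic one (context/style) and a label-specific one."""

    def __init__(self, feat_dim=640, latent_dim=64):
        super().__init__()
        self.semantic_head = nn.Linear(feat_dim, 2 * latent_dim)  # mean and log-variance
        self.label_head = nn.Linear(feat_dim, 2 * latent_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        # standard VAE reparameterization trick
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, features):
        mu_s, logvar_s = self.semantic_head(features).chunk(2, dim=-1)
        mu_l, logvar_l = self.label_head(features).chunk(2, dim=-1)
        z_semantic = self.reparameterize(mu_s, logvar_s)
        z_label = self.reparameterize(mu_l, logvar_l)
        # only the label-specific latent would be used for few-shot classification,
        # e.g. via prototypes computed in z_label space
        return z_semantic, z_label
```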
=== SOT (Self-Optimal-Transport Feature Transform)
// https://arxiv.org/abs/2204.03065v1
The Self-Optimal-Transport (SOT) Feature Transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
This transform processes features as a set instead of using them individually.
This creates context-aware representations.
SOT captures direct as well as indirect similarities between features, which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
SOT uses a transport plan matrix derived from optimal transport theory to redefine feature relations.
This includes calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal matching between features.
This results in a doubly stochastic matrix where each row represents the re-embedding of the corresponding feature in context with the others.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
The transform is parameterless, which makes it easy to integrate into existing machine-learning pipelines.
It is differentiable, which allows for end-to-end training, for example (re-)training the hosting network to adapt to SOT.
SOT is equivariant, which means that permuting the input features permutes the re-embedded output features in the same way.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
The improvements of SOT over traditional feature transforms depend on the backbone network used and the task.
But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
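A simplified sketch of the re-embedding step, using entropic Sinkhorn iterations as an approximation of the optimal transport plan (the diagonal handling and hyperparameters are assumptions, not the paper's exact formulation), could look like this:
```python
import torch
import torch.nn.functional as F

def sot_transform(features, n_iters=10, reg=0.1):
    """Re-embed a feature set via an (approximate) optimal transport plan.
    Each row of the roughly doubly stochastic result is the context-aware
    re-embedding of the corresponding input feature."""
    f = F.normalize(features, dim=-1)                 # (N, D) unit-norm features
    sim = f @ f.t()                                   # pairwise cosine similarities
    cost = 1.0 - sim
    cost.fill_diagonal_(float(cost.max()) + 1.0)      # discourage trivial self-matching
    K = torch.exp(-cost / reg)                        # entropic OT kernel
    # Sinkhorn: alternately normalize rows and columns until roughly doubly stochastic
    for _ in range(n_iters):
        K = K / K.sum(dim=1, keepdim=True)
        K = K / K.sum(dim=0, keepdim=True)
    return K                                          # (N, N) re-embedded feature set
```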
// anomaly detect
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/abs/2407.09359v1
GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network which combines two different anomaly synthesis strategies.
The first one, Global Anomaly Synthesis (GAS), operates on the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of normal features.
This helps detect weak defects.
The second strategy, Local Anomaly Synthesis (LAS), operates on the image level.
It overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies near and far from the normal distribution.
Experiments show that GLASS is very effective and in some cases outperforms state-of-the-art methods such as PatchCore on the MVTec AD dataset.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
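To make the GAS idea more concrete, a rough sketch of the feature-level synthesis step, with a placeholder discriminator loss and assumed hyperparameters (the paper's exact formulation differs in detail), could look like this:
```python
import torch

def global_anomaly_synthesis(normal_features, loss_fn, step_size=0.1, sigma=0.015, radius=1.0):
    """Synthesize feature-level anomalies close to the normal distribution:
    Gaussian-perturb normal features, take one gradient-ascent step on a
    discriminator loss, then truncate the perturbation so the synthetic
    anomalies stay near the normal features. `loss_fn` is a placeholder."""
    noise = torch.randn_like(normal_features) * sigma
    perturbed = (normal_features + noise).requires_grad_(True)
    loss = loss_fn(perturbed)          # e.g. a discriminator's "normality" score
    loss.backward()
    with torch.no_grad():
        # gradient ascent: push features towards higher loss, i.e. away from normality
        perturbation = perturbed + step_size * perturbed.grad - normal_features
        # truncated projection: keep the synthetic anomaly within a bounded distance
        perturbation = perturbation.clamp(-radius, radius)
    return normal_features + perturbation
```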
//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/abs/2303.16191v5