add abstract, finish the alternative methods, fix some todos and improve sources
@ -73,24 +73,23 @@ So many more defect classes are already an indication that a classification task
=== Few-Shot Learning

Few-shot learning is a subfield of machine learning which aims to train a classification model with just a few samples, or none at all.
This is in contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
As a result, the model is prone to overfitting to the few training samples.#cite(<parnami2022learningexamplessummaryapproaches>)

Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data, while the query set contains the evaluation data representing the real-world use case.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, a task with 3 target classes and 5 training samples per class is a 3-way 5-shot classification problem.#cite(<snell2017prototypicalnetworksfewshotlearning>)#cite(<patchcorepaper>)

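To make this notation concrete, the following is a minimal sketch of how such an episode could be sampled from a labeled dataset (the `sample_episode` helper and the `(image, label)` structure are illustrative, not taken from any referenced implementation):

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=3, k_shot=5, n_query=10):
    # dataset: iterable of (image, label) pairs -- a placeholder structure.
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    # Pick n classes, then k support and n_query query samples per class.
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for cls in classes:
        images = random.sample(by_class[cls], k_shot + n_query)
        support += [(img, cls) for img in images[:k_shot]]
        query += [(img, cls) for img in images[k_shot:]]
    return support, query
```
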
A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.#cite(<snell2017prototypicalnetworksfewshotlearning>)

#figure(
  image("rsc/prototype_fewshot_v3.png", width: 60%),
  caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>

The first and easiest method of this bachelor thesis uses a simple ResNet to calculate those embeddings and is essentially a simple prototypical network.
See @resnet50impl.~#cite(<chowdhury2021fewshotimageclassificationjust>)

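A rough sketch of this classification step might look as follows, assuming `embed` is a frozen feature extractor such as a ResNet (the function name and tensor shapes are illustrative):

```python
import torch

def prototypical_classify(embed, support_x, support_y, query_x):
    # Embed support and query images with the (frozen) feature extractor.
    z_support = embed(support_x)              # (n_way * k_shot, d)
    z_query = embed(query_x)                  # (n_query, d)

    # A class prototype is the mean of its support embeddings.
    classes = support_y.unique()
    prototypes = torch.stack(
        [z_support[support_y == c].mean(dim=0) for c in classes]
    )                                         # (n_way, d)

    # Assign each query to the nearest prototype in embedding space.
    dists = torch.cdist(z_query, prototypes)  # (n_query, n_way)
    return classes[dists.argmin(dim=1)]
```
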
=== Generalisation from few samples

@ -373,7 +372,7 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting

== Alternative Methods

There are several alternative methods to few-shot learning as well as to anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks than the methods used here, or they were released after my initial literature research.

=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)

@ -393,18 +392,58 @@ If the pre-trained model lacks relevant information for the task, SgVA-CLIP migh

This might be a no-go for anomaly detection tasks because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)

=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification)
// https://arxiv.org/abs/2208.10559v1

TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representations into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)

To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)

This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context aligns with the requirements of anomaly detection.

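A heavily simplified sketch of the decoupling idea is shown below; it is not the actual TRIDENT architecture (AttFEX and the variational objective are omitted, and the two linear heads and all dimensions are made up):

```python
import torch
import torch.nn as nn

class DecoupledEncoder(nn.Module):
    # Toy encoder that splits an image feature vector into two latent
    # distributions: one for semantic/style context, one for
    # label-specific content. Layer sizes are illustrative.
    def __init__(self, feat_dim=512, latent_dim=64):
        super().__init__()
        self.semantic_head = nn.Linear(feat_dim, 2 * latent_dim)
        self.label_head = nn.Linear(feat_dim, 2 * latent_dim)

    @staticmethod
    def reparameterize(stats):
        mu, log_var = stats.chunk(2, dim=-1)
        # Sample z = mu + sigma * eps (VAE reparameterization trick).
        return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

    def forward(self, features):
        z_semantic = self.reparameterize(self.semantic_head(features))
        z_label = self.reparameterize(self.label_head(features))
        # Only z_label would feed the classifier; z_semantic absorbs
        # context and style so the label latent stays discriminative.
        return z_semantic, z_label
```
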
=== SOT (Self-Optimal-Transport Feature Transform)
// https://arxiv.org/abs/2204.03065v1

The Self-Optimal-Transport (SOT) feature transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
This transform processes features as a set instead of individually, creating context-aware representations.
SOT can capture direct as well as indirect similarities between features, which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

SOT uses a transport-plan matrix derived from optimal transport theory to redefine feature relations.
This involves calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal matching between features.
The result is a doubly stochastic matrix in which each row represents the re-embedding of the corresponding feature in the context of all others.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

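A minimal sketch of this computation, approximating the doubly stochastic plan with a few Sinkhorn normalization iterations (the paper solves the optimal-transport problem itself; the iteration count and temperature here are made-up values):

```python
import torch
import torch.nn.functional as F

def sot_transform(features, n_iters=10, temperature=0.1):
    # features: (n, d) set of feature vectors to re-embed jointly.
    f = F.normalize(features, dim=1)
    cost = 1.0 - f @ f.T          # pairwise cosine distances
    cost.fill_diagonal_(1e6)      # the paper handles self-matching
                                  # specially; a large cost is a crude
                                  # stand-in here
    plan = torch.exp(-cost / temperature)
    for _ in range(n_iters):
        # Sinkhorn: alternate row/column normalization towards a
        # doubly stochastic matrix.
        plan = plan / plan.sum(dim=1, keepdim=True)
        plan = plan / plan.sum(dim=0, keepdim=True)
    # Row i of the (near) doubly stochastic plan is the context-aware
    # re-embedding of feature i.
    return plan
```
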
The transform is parameterless, which makes it easy to integrate into existing machine-learning pipelines.
It is also differentiable, which allows for end-to-end training, for example to (re-)train the hosting network to adapt to SOT.
Finally, SOT is permutation-equivariant: reordering the input features reorders the re-embedded features in the same way.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

The improvements of SOT over traditional feature transforms depend on the backbone network and the task.
In most cases, however, it outperforms state-of-the-art methods and can be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

// anomaly detection
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/abs/2407.09359v1

GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network which uses two different strategies to synthesize anomalies, which are then combined.
The first one is Global Anomaly Synthesis (GAS), which operates at the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of the normal features.
This helps with the detection of weak defects.
The second strategy is Local Anomaly Synthesis (LAS), which operates at the image level.
This strategy overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)

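A rough sketch of the GAS idea at the feature level (the `anomaly_loss` callback and all hyperparameters are assumptions, not values from the paper):

```python
import torch

def gas_perturb(normal_feats, anomaly_loss, step=0.1, r_min=0.5, r_max=2.0):
    # Start from Gaussian noise around the normal features.
    pert = (normal_feats + torch.randn_like(normal_feats)).requires_grad_(True)
    # Gradient ascent on a scalar loss that rises for anomalous features,
    # pushing the synthetic sample away from the normal distribution.
    grad, = torch.autograd.grad(anomaly_loss(pert), pert)
    delta = (pert - normal_feats + step * grad.sign()).detach()
    # Truncated projection: keep the perturbation norm inside [r_min, r_max]
    # so the synthetic anomaly stays close to the normal feature manifold.
    norm = delta.norm(dim=-1, keepdim=True)
    delta = delta * norm.clamp(r_min, r_max) / norm.clamp(min=1e-8)
    return normal_feats + delta
```
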
GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies both near to and far from the normal distribution.
Experiments show that GLASS is very effective and in some cases outperforms state-of-the-art methods such as PatchCore on the MVTec AD dataset.~#cite(<chen2024unifiedanomalysynthesisstrategy>)

//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/abs/2303.16191v5