add abstract, finish the alternative methods, fix some todos and improve sources
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 21s
This commit is contained in:
parent 7c54e11238, commit 49d5e97417
@@ -64,7 +64,7 @@ Which is a result that is unexpected (since one can think more samples perform better)
Clearly all four graphs show that the performance decreases with an increasing number of good samples.
So the conclusion is that the Few-Shot learner should always be trained with classes that are as balanced as possible.

== How does the 3 (ResNet, CAML, pmf) methods perform in only detecting the anomaly class?
== How do the three methods (ResNet, CAML, P>M>F) perform in only detecting the anomaly class?
_How much does the performance improve if only detecting an anomaly or not?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_
@@ -7,15 +7,19 @@
The three methods described (ResNet50, CAML, P>M>F) were implemented in a Jupyter notebook and compared to each other.

== Experiments <experiments>
For all of the three methods we test the following use-cases:#todo[maybe write more to each test]
For all three methods we test the following use-cases:
- Detection of anomaly class (1,3,5 shots)
- Every faulty class and the good class is detected.
- 2-way classification (1,3,5 shots)
- Only faulty or not faulty is detected. All the samples of the faulty classes are treated as a single class.
- Detect only anomaly classes (1,3,5 shots)
- Similar to the first test but without the good class. Only faulty classes are detected.
- Imbalanced 2-way classification (5,10,15,30 good shots, 5 bad shots)
- Inbalanced target class prediction (5,10,15,30 good shots, 5 bad shots)

Those experiments were conducted on the MVTEC AD dataset on the bottle and cable classes.
- Similar to the 2-way classification but with an imbalanced number of good shots.
- Imbalanced target class prediction (5,10,15,30 good shots, 5 bad shots)#todo[Avoid bullet points and write flow text?]
- Detect only the faulty classes without the good class, with an imbalanced number of shots.

All those experiments were conducted on the MVTec AD dataset on the bottle and cable classes.

== Experiment Setup
All the experiments were done on the bottle and cable classes of the MVTec AD dataset.
@@ -23,20 +27,21 @@ The corresponding number of shots was randomly selected from the dataset.
The rest of the images were used to test the model and measure the accuracy.
#todo[Maybe add real number of samples per class]
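Illustratively, such a random split could look like the following minimal sketch (the folder layout, file pattern and function name are assumptions for illustration, not the thesis code):

```python
import random
from pathlib import Path

def support_query_split(class_dir: str, k_shot: int, seed: int = 0):
    """Randomly pick k_shot support images; the remaining images form the query (test) set."""
    images = sorted(Path(class_dir).glob("*.png"))
    rng = random.Random(seed)
    rng.shuffle(images)
    return images[:k_shot], images[k_shot:]  # (support, query)

# e.g. a 5-shot support set for one MVTec AD class (path assumed)
support, query = support_query_split("mvtec/bottle/test/broken_large", k_shot=5)
```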

== ResNet50
== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
From both the support and the query set, features are extracted to get a down-projected representation of the images.
The support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning.
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust, but with a simple distance metric instead of a neural net.
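As a minimal sketch (assuming torchvision's pre-trained weights and Euclidean distance; the helper names are illustrative, not the notebook code), the approach could look like this:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Pre-trained ResNet50 with the classification head cut off (see below);
# the output of the global average pooling serves as the embedding.
weights = ResNet50_Weights.IMAGENET1K_V2
backbone = torch.nn.Sequential(*list(resnet50(weights=weights).children())[:-1]).eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(pil_images):
    batch = torch.stack([preprocess(img) for img in pil_images])
    return backbone(batch).flatten(1)  # (N, 2048) embeddings

@torch.no_grad()
def classify(support_embs, support_labels, query_embs):
    classes = sorted(set(support_labels))
    # class center = mean of all support embeddings of that class
    centers = torch.stack([
        support_embs[torch.tensor([l == c for l in support_labels])].mean(dim=0)
        for c in classes
    ])
    # assign each query to the class with the nearest center (Euclidean distance assumed)
    nearest = torch.cdist(query_embs, centers).argmin(dim=1)
    return [classes[i] for i in nearest]
```

Using the mean of the support embeddings as class center is exactly the prototype idea from @snell2017prototypicalnetworksfewshotlearning.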

In this bachelor thesis a pre-trained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 layers with residual connections.

To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding output.
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.~@chowdhury2021fewshotimageclassificationjust

#diagram(
  spacing: (5mm, 5mm),
@@ -5,12 +5,13 @@
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
Many assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning has helped the field advance a lot in the past.
Most of the time the error rate is sub $.1%$ and therefore plenty of good data is available and the data is heavily unbalaned.
Most of the time the error rate is below $.1%$, therefore plenty of good data and almost no faulty data is available.
So the training data is heavily unbalanced.#cite(<parnami2022learningexamplessummaryapproaches>)

PatchCore and EfficientAD are state-of-the-art algorithms which are trained only on good data and then detect anomalies within unseen (but similar) data.
One of their problems is the need for lots of training data and time to train.
Moreover, a slight change of the camera position or the lighting conditions can lead to a complete retraining of the model.
Few-Shot learning might be a suitable alternative with hugely lowered train times and fast adaption to new conditions.
Few-Shot learning might be a suitable alternative with greatly lowered training times and fast adaptation to new conditions.~#cite(<efficientADpaper>)#cite(<patchcorepaper>)#cite(<parnami2022learningexamplessummaryapproaches>)

In this thesis the performance of three Few-Shot learning algorithms will be compared in the field of anomaly detection.
Moreover, few-shot learning might be able not only to detect anomalies but also to identify the anomaly class.

main.typ
@@ -52,7 +52,18 @@
place-of-submission: "Linz",
title: "Few shot learning for anomaly detection",
abstract-en: [//max. 250 words
#lorem(200) ],
This thesis explores the application of Few-Shot Learning (FSL) in anomaly detection, a critical area in industrial and automotive domains requiring robust and efficient algorithms for identifying defects.
Traditional methods, such as PatchCore and EfficientAD, achieve high accuracy but often demand extensive training data and are sensitive to environmental changes, necessitating frequent retraining.
FSL offers a promising alternative by enabling models to generalize effectively from minimal samples, thus reducing training time and adaptation overhead.

The study evaluates three FSL methods (ResNet50, P>M>F, and CAML) using the MVTec AD dataset.
Experiments focus on tasks such as anomaly detection, class imbalance handling, and comparison of distance metrics.
Results indicate that while FSL methods trail behind state-of-the-art algorithms in detecting anomalies, they excel in classifying anomaly types, showcasing potential in scenarios requiring detailed defect identification.
Among the tested approaches, P>M>F emerged as the most robust, demonstrating superior accuracy across various settings.

This research underscores the limitations and niche applicability of FSL in anomaly detection, advocating its integration with established algorithms for enhanced performance.
Future work should address the scalability and domain-specific adaptability of FSL techniques to broaden their utility in industrial applications.
],
abstract-de: none,// or specify the abstract-de in a container []
acknowledgements: none,//acknowledgements: none // if you are self-made
show-title-in-header: false,

@@ -73,24 +73,23 @@ So many more defect classes are already an indication that a classification task
=== Few-Shot Learning
Few-Shot learning is a subfield of machine learning which aims to train a classification model with just a few or even no samples at all.
This is in contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
So the model is prone to overfitting to the few training samples.
So the model is prone to overfitting to the few training samples.#cite(<parnami2022learningexamplessummaryapproaches>)

Typically a few-shot learning task consists of a support and a query set.
The support set contains the training data and the query set the evaluation data for real-world evaluation.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For Example 3 target classeas and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.
For example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem, as in the sketch below.#cite(<snell2017prototypicalnetworksfewshotlearning>)#cite(<patchcorepaper>)
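To make the n-way k-shot notation concrete, here is a minimal sketch of how such an episode could be sampled (the dict-based data layout is an assumption for illustration):

```python
import random

def sample_episode(dataset, n_way=3, k_shot=5, seed=0):
    """dataset: dict mapping class name -> list of samples."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)  # choose the n classes
    support = {c: rng.sample(dataset[c], k_shot) for c in classes}  # k shots each
    query = {c: [s for s in dataset[c] if s not in support[c]] for c in classes}
    return support, query  # train on support, evaluate on query
```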

A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.#cite(<snell2017prototypicalnetworksfewshotlearning>)
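Written compactly (following @snell2017prototypicalnetworksfewshotlearning), the prototype $c_k$ of class $k$ is the mean of the embedded support samples $S_k$, and a query $x$ is assigned to the class with the nearest prototype:

```latex
c_k = \frac{1}{|S_k|} \sum_{(x_i,\, y_i) \in S_k} f_\phi(x_i)
\qquad
\hat{y} = \operatorname*{arg\,min}_{k} \; d\bigl(f_\phi(x),\, c_k\bigr)
```

Here $f_\phi$ is the embedding network and $d$ a distance function, e.g. the Euclidean distance.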

#figure(
  image("rsc/prototype_fewshot_v3.png", width: 60%),
  caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>

The first and easiest method of this bachelor thesis uses a simple ResNet to calucalte those embeddings and is basically a simple prototypical netowrk.
See #todo[link to this section]
#todo[proper source]
The first and easiest method of this bachelor thesis uses a simple ResNet to calculate those embeddings and is basically a simple prototypical network.
See @resnet50impl.~#cite(<chowdhury2021fewshotimageclassificationjust>)

=== Generalisation from few samples

@@ -373,7 +372,7 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting

== Alternative Methods

There are several alternative methods to few-shot learning which are not used in this bachelor thesis.
There are several alternative methods for few-shot learning as well as for anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks compared to the used methods or they were released after my initial literature research.

=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)

@@ -393,18 +392,58 @@
This might be a no-go for anomaly detection tasks because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)

=== TRIDENT
=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification)
// https://arxiv.org/pdf/2208.10559v1
// https://arxiv.org/abs/2208.10559v1

== SOT
TRIDENT, a variational inference network, is a few-shot learning approach which decouples the image representation into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)

To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)

This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context aligns with the requirements of anomaly detection.

=== SOT (Self-Optimal-Transport Feature Transform)
// https://arxiv.org/pdf/2204.03065v1
// https://arxiv.org/abs/2204.03065v1

The Self-Optimal-Transport (SOT) Feature Transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
This transform processes features as a set instead of using them individually, which creates context-aware representations.
SOT can capture direct as well as indirect similarities between features, which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

SOT uses a transport plan matrix derived from optimal transport theory to redefine feature relations.
This includes calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal match between features.
This results in a doubly stochastic matrix where each row represents the re-embedding of the corresponding feature in context with the others, as sketched below.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
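A rough sketch of this idea (not the paper's exact algorithm; a Sinkhorn iteration stands in here for the optimal-transport solver):

```python
import torch

def sot_like_transform(features: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """features: (n, d) feature set; returns an (n, n) re-embedding matrix."""
    f = torch.nn.functional.normalize(features, dim=1)
    cost = 1.0 - f @ f.T                 # pairwise cosine similarity -> transport cost
    plan = torch.exp(-cost / 0.1)        # temperature-scaled kernel
    for _ in range(n_iters):             # Sinkhorn: alternating row/column scaling
        plan = plan / plan.sum(dim=1, keepdim=True)
        plan = plan / plan.sum(dim=0, keepdim=True)
    return plan  # approximately doubly stochastic; row i re-embeds feature i
```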

The transform is parameterless, which makes it easy to integrate into existing machine-learning pipelines.
It is differentiable, which allows for end-to-end training, for example to (re-)train the hosting network to adapt to SOT.
SOT is also permutation-equivariant, which means that reordering the input features reorders the re-embedded features in the same way.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
In most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

// anomaly detection
== GLASS
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/pdf/2407.09359v1
// https://arxiv.org/abs/2407.09359v1

GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network which uses two different strategies to synthesize anomalies, whose results are then combined.
The first one is Global Anomaly Synthesis (GAS), which operates on the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of the normal features.
This helps with the detection of weak defects.
The second strategy is Local Anomaly Synthesis (LAS), which operates on the image level.
This strategy overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
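To give an impression of the feature-level GAS step, here is a strongly simplified sketch (the anomaly score function and all hyperparameters are placeholders, not the paper's implementation):

```python
import torch

def gas_like_synthesis(normal_feats, score_fn, steps=3, lr=0.1, eps=0.5):
    """Perturb normal features with Gaussian noise, push them towards a higher
    anomaly score via gradient ascent, and truncate the perturbation so the
    result stays close to the normal feature distribution."""
    noise = 0.1 * torch.randn_like(normal_feats)
    anomalous = (normal_feats + noise).detach().requires_grad_(True)
    for _ in range(steps):
        score = score_fn(anomalous).sum()          # placeholder anomaly score
        (grad,) = torch.autograd.grad(score, anomalous)
        with torch.no_grad():
            anomalous += lr * grad                 # gradient ascent step
            delta = (anomalous - normal_feats).clamp(-eps, eps)
            anomalous.copy_(normal_feats + delta)  # truncated projection
    return anomalous.detach()
```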

GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies near and far from the normal distribution.
Experiments show that GLASS is very effective and in some cases outperforms state-of-the-art methods such as PatchCore on the MVTec AD dataset.~#cite(<chen2024unifiedanomalysynthesisstrategy>)

//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/pdf/2303.16191v5
// https://arxiv.org/abs/2303.16191v5
sources.bib
@@ -147,3 +147,53 @@
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2211.16191},
}

@misc{singh2022transductivedecoupledvariationalinference,
  title={Transductive Decoupled Variational Inference for Few-Shot Classification},
  author={Anuj Singh and Hadi Jamali-Rad},
  year={2022},
  eprint={2208.10559},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2208.10559},
}

@misc{chen2024unifiedanomalysynthesisstrategy,
  title={A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization},
  author={Qiyu Chen and Huiyuan Luo and Chengkan Lv and Zhengtao Zhang},
  year={2024},
  eprint={2407.09359},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.09359},
}

@misc{shalam2022selfoptimaltransportfeaturetransform,
  title={The Self-Optimal-Transport Feature Transform},
  author={Daniel Shalam and Simon Korman},
  year={2022},
  eprint={2204.03065},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2204.03065},
}

@misc{parnami2022learningexamplessummaryapproaches,
  title={Learning from Few Examples: A Summary of Approaches to Few-Shot Learning},
  author={Archit Parnami and Minwoo Lee},
  year={2022},
  eprint={2203.04291},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2203.04291},
}

@misc{chowdhury2021fewshotimageclassificationjust,
  title={Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier},
  author={Arkabandhu Chowdhury and Mingchao Jiang and Swarat Chaudhuri and Chris Jermaine},
  year={2021},
  eprint={2101.00562},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2101.00562},
}