add abstract, finish the alternative methods and fix some todos and improve sources
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 21s
This commit is contained in: parent 7c54e11238 · commit 49d5e97417

@@ -64,7 +64,7 @@ Which is a result that is unexpected (since one can think more samples perform
Clearly all four graphs show that the performance decreases with an increasing number of good samples.
So the conclusion is that the Few-Shot learner should always be trained with classes that are as balanced as possible.

== How do the 3 methods (ResNet, CAML, P>M>F) perform in only detecting the anomaly class?

_How much does the performance improve if only the presence of an anomaly has to be detected?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_

@@ -7,15 +7,19 @@
The three methods described (ResNet50, CAML, P>M>F) were implemented in a Jupyter notebook and compared to each other.

== Experiments <experiments>
For all three methods we test the following use cases:
- Detection of anomaly class (1, 3, 5 shots)
  - Every faulty class and the good class are detected.
- 2-Way classification (1, 3, 5 shots)
  - Only faulty or not faulty is detected. All samples of the faulty classes are treated as a single class (see the relabeling sketch below).
- Detect only anomaly classes (1, 3, 5 shots)
  - Similar to the first test, but without the good class. Only faulty classes are detected.
- Imbalanced 2-Way classification (5, 10, 15, 30 good shots, 5 bad shots)
  - Similar to the 2-Way classification, but with an imbalanced number of good shots.
- Imbalanced target class prediction (5, 10, 15, 30 good shots, 5 bad shots)#todo[Avoid bullet points and write flow text?]
  - Detect only the faulty classes without the good class, with an imbalanced number of shots.

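For the 2-Way cases, the relabeling can be illustrated with a tiny helper (the function name is hypothetical, not the notebook's actual code; MVTec AD labels its defect-free samples "good"):

```python
def to_two_way(label, good_class="good"):
    """Collapse all faulty classes into a single 'bad' class."""
    return "good" if label == good_class else "bad"
```
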
All those experiments were conducted on the MVTec AD dataset on the bottle and cable classes.

== Experiment Setup
All the experiments were done on the bottle and cable classes of the MVTec AD dataset.
@@ -23,20 +27,21 @@ The corresponding number of shots was randomly selected from the dataset.
The remaining images were used to test the model and measure the accuracy.
#todo[Maybe add real number of samples per class]
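
A minimal sketch of how such a support/query split can be drawn (assuming the images are grouped per class in a dict; names like `support_query_split` are illustrative, not the notebook's actual code):

```python
import random

def support_query_split(images_by_class, k_shots, seed=42):
    """Draw k random support samples per class; the rest become the query set."""
    rng = random.Random(seed)
    support, query = {}, {}
    for cls, images in images_by_class.items():
        shuffled = images[:]
        rng.shuffle(shuffled)
        support[cls] = shuffled[:k_shots]  # the k "shots" used for training
        query[cls] = shuffled[k_shots:]    # held out to measure accuracy
    return support, query
```
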
== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
From both the support and the query set, features are extracted to get a down-projected representation of the images.
The support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust, but with a simple distance metric instead of a neural net.

In this bachelor thesis a pre-trained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 residual layers.

To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding output.
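
A minimal sketch of this pipeline, assuming torchvision's pre-trained weights API; the helper names (`embed`, `classify`) are illustrative and not the notebook's exact code:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Pre-trained ResNet50 with the final fully connected layer cut off, so the
# 2048-dimensional output of the global average pooling acts as the embedding.
weights = ResNet50_Weights.IMAGENET1K_V2
backbone = torch.nn.Sequential(*list(resnet50(weights=weights).children())[:-1]).eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(images):
    """Embed a list of PIL images into (N, 2048) feature vectors."""
    batch = torch.stack([preprocess(img) for img in images])
    return backbone(batch).flatten(1)

def classify(support_embs, support_labels, query_embs):
    """Nearest class center: average the support embeddings per class,
    then assign each query to the class with the smallest distance."""
    classes = sorted(set(support_labels))
    centers = torch.stack([
        support_embs[torch.tensor([lbl == c for lbl in support_labels])].mean(dim=0)
        for c in classes
    ])
    dists = torch.cdist(query_embs, centers)  # Euclidean distances (Q, n_classes)
    return [classes[i] for i in dists.argmin(dim=1)]
```

Other distance metrics, e.g. cosine distance, can be swapped in without changing the rest of the pipeline.
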
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.~@chowdhury2021fewshotimageclassificationjust

#diagram(
  spacing: (5mm, 5mm),

@@ -5,12 +5,13 @@
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
Lots of assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning has helped the field advance a lot in the past.
Most of the time the error rate is below $0.1%$, and therefore plenty of good data and almost no faulty data is available.
So the training data is heavily unbalanced.#cite(<parnami2022learningexamplessummaryapproaches>)

PatchCore and EfficientAD are state-of-the-art algorithms trained only on good data which then detect anomalies within unseen (but similar) data.
One of their problems is the need for lots of training data and a long training time.
Moreover, a slight change of the camera position or the lighting conditions can lead to a complete retraining of the model.
Few-Shot learning might be a suitable alternative with hugely lowered training times and fast adaptation to new conditions.~#cite(<efficientADpaper>)#cite(<patchcorepaper>)#cite(<parnami2022learningexamplessummaryapproaches>)

In this thesis the performance of 3 Few-Shot learning algorithms will be compared in the field of anomaly detection.
Moreover, few-shot learning might be able not only to detect anomalies but also to detect the anomaly class.

main.typ

@@ -52,7 +52,18 @@
place-of-submission: "Linz",
title: "Few shot learning for anomaly detection",
abstract-en: [//max. 250 words
This thesis explores the application of Few-Shot Learning (FSL) in anomaly detection, a critical area in industrial and automotive domains requiring robust and efficient algorithms for identifying defects.
Traditional methods, such as PatchCore and EfficientAD, achieve high accuracy but often demand extensive training data and are sensitive to environmental changes, necessitating frequent retraining.
FSL offers a promising alternative by enabling models to generalize effectively from minimal samples, thus reducing training time and adaptation overhead.

The study evaluates three FSL methods (ResNet50, P>M>F, and CAML) using the MVTec AD dataset.
Experiments focus on tasks such as anomaly detection, class imbalance handling, and comparison of distance metrics.
Results indicate that while FSL methods trail behind state-of-the-art algorithms in detecting anomalies, they excel in classifying anomaly types, showcasing potential in scenarios requiring detailed defect identification.
Among the tested approaches, P>M>F emerged as the most robust, demonstrating superior accuracy across various settings.

This research underscores the limitations and niche applicability of FSL in anomaly detection, advocating its integration with established algorithms for enhanced performance.
Future work should address the scalability and domain-specific adaptability of FSL techniques to broaden their utility in industrial applications.
],
abstract-de: none, // or specify the abstract-de in a container []
acknowledgements: none, //acknowledgements: none // if you are self-made
show-title-in-header: false,

@@ -73,24 +73,23 @@ So many more defect classes are already an indication that a classification task
=== Few-Shot Learning
Few-Shot learning is a subfield of machine learning which aims to train a classification model with just a few samples, or none at all.
This is in contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
So the model is prone to overfitting to the few training samples.#cite(<parnami2022learningexamplessummaryapproaches>)

Typically a few-shot learning task consists of a support and a query set.
The support set contains the training data, while the query set contains the data used for evaluation.
A common way to frame a few-shot learning problem is the n-way k-shot notation.
For example, 3 target classes and 5 samples per class for training form a 3-way 5-shot classification problem.#cite(<snell2017prototypicalnetworksfewshotlearning>)#cite(<patchcorepaper>)

A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.#cite(<snell2017prototypicalnetworksfewshotlearning>)

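For reference, the prototypical-network classification rule from @snell2017prototypicalnetworksfewshotlearning, where $f$ is the embedding network, $S_k$ the support set of class $k$, and $d$ a distance such as the squared Euclidean distance:

```latex
c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\phi(x_i)
\qquad
p_\phi(y = k \mid x) = \frac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))}
```
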
#figure(
  image("rsc/prototype_fewshot_v3.png", width: 60%),
  caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>

The first and easiest method of this bachelor thesis uses a simple ResNet to calculate those embeddings and is basically a simple prototypical network.
See @resnet50impl.~#cite(<chowdhury2021fewshotimageclassificationjust>)

=== Generalisation from few samples

@@ -373,7 +372,7 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting

== Alternative Methods

There are several alternative methods to few-shot learning as well as to anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks compared to the used methods or they were released after my initial literature research.

=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)
@@ -393,18 +392,58 @@ If the pre-trained model lacks relevant information for the task, SgVA-CLIP migh
This might be a no-go for anomaly detection tasks because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)

=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification)
// https://arxiv.org/pdf/2208.10559v1
// https://arxiv.org/abs/2208.10559v1

TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representations into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)

To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)

This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context aligns with the requirements of anomaly detection.

=== SOT (Self-Optimal-Transport Feature Transform)
// https://arxiv.org/pdf/2204.03065v1
// https://arxiv.org/abs/2204.03065v1

The Self-Optimal-Transport (SOT) Feature Transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
The transform processes features as a set instead of using them individually, which creates context-aware representations.
SOT can catch direct as well as indirect similarities between features, which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

SOT uses a transport plan matrix derived from optimal transport theory to redefine feature relations.
This involves calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal match between them.
This results in a doubly stochastic matrix where each row represents the re-embedding of the corresponding feature in the context of all others.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

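Such a doubly stochastic matrix can be obtained with Sinkhorn iterations; the following is a minimal sketch of that idea, assuming a plain entropic Sinkhorn normalization of the pairwise-similarity kernel (the paper's exact formulation, e.g. its handling of the self-similarity diagonal, differs):

```python
import torch

def sot_transform(features, reg=0.1, n_iters=10):
    """Re-embed a feature set (n, d) as the rows of a doubly stochastic (n, n) matrix."""
    f = torch.nn.functional.normalize(features, dim=1)
    cost = 1.0 - f @ f.T            # pairwise cosine distance as transport cost
    K = torch.exp(-cost / reg)      # entropic kernel
    # Sinkhorn iterations: alternately normalize rows and columns until the
    # matrix is (approximately) doubly stochastic.
    for _ in range(n_iters):
        K = K / K.sum(dim=1, keepdim=True)
        K = K / K.sum(dim=0, keepdim=True)
    return K                        # row i: context-aware re-embedding of feature i
```
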
The transform is parameterless, which makes it easy to integrate into existing machine-learning pipelines.
It is differentiable, which allows for end-to-end training, for example to (re-)train the hosting network to adapt to SOT.
SOT is also permutation-equivariant: reordering the input features reorders the re-embeddings in the same way.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

// anomaly detect
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/pdf/2407.09359v1
// https://arxiv.org/abs/2407.09359v1

GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network which uses two different anomaly synthesis strategies, whose results are then combined.
The first one, Global Anomaly Synthesis (GAS), operates on the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of the normal features.
This helps the detection of weak defects.
The second strategy, Local Anomaly Synthesis (LAS), operates on the image level.
This strategy overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
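
A rough sketch of the image-level LAS idea only; mask generation and blending are simplified assumptions here, not the paper's exact procedure, which derives its masks from Perlin-style noise and applies further constraints:

```python
import numpy as np

def local_anomaly_synthesis(image, texture, noise, threshold=0.6, opacity=0.8):
    """Overlay a texture onto a normal image, masked by thresholded noise.

    image, texture: float arrays (H, W, C) in [0, 1]; noise: float array (H, W).
    Returns the synthesized anomalous image and the binary anomaly mask.
    """
    mask = (noise > threshold).astype(np.float32)[..., None]  # anomaly region
    synthesized = image * (1 - mask * opacity) + texture * mask * opacity
    return synthesized, mask
```
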
GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies both near and far from the normal distribution.
Experiments show that GLASS is very effective and even outperforms some state-of-the-art methods such as PatchCore on the MVTec AD dataset in some cases.~#cite(<chen2024unifiedanomalysynthesisstrategy>)

//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/pdf/2303.16191v5
// https://arxiv.org/abs/2303.16191v5

sources.bib

@@ -147,3 +147,53 @@
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2211.16191},
}

@misc{singh2022transductivedecoupledvariationalinference,
  title={Transductive Decoupled Variational Inference for Few-Shot Classification},
  author={Anuj Singh and Hadi Jamali-Rad},
  year={2022},
  eprint={2208.10559},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2208.10559},
}

@misc{chen2024unifiedanomalysynthesisstrategy,
  title={A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization},
  author={Qiyu Chen and Huiyuan Luo and Chengkan Lv and Zhengtao Zhang},
  year={2024},
  eprint={2407.09359},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2407.09359},
}

@misc{shalam2022selfoptimaltransportfeaturetransform,
  title={The Self-Optimal-Transport Feature Transform},
  author={Daniel Shalam and Simon Korman},
  year={2022},
  eprint={2204.03065},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2204.03065},
}

@misc{parnami2022learningexamplessummaryapproaches,
  title={Learning from Few Examples: A Summary of Approaches to Few-Shot Learning},
  author={Archit Parnami and Minwoo Lee},
  year={2022},
  eprint={2203.04291},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2203.04291},
}

@misc{chowdhury2021fewshotimageclassificationjust,
  title={Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier},
  author={Arkabandhu Chowdhury and Mingchao Jiang and Swarat Chaudhuri and Chris Jermaine},
  year={2021},
  eprint={2101.00562},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2101.00562},
}