Compare commits: dd1f28a89f...fixes

8 Commits

| Author | SHA1 | Date |
|---|---|---|
| | 71bdb0a207 | |
| | 8f28a8c387 | |
| | a1b8d7d81a | |
| | c5bd509f24 | |
| | 30d09a67d2 | |
| | 3e440e97f7 | |
| | 49d5e97417 | |
| | 7c54e11238 | |
@ -6,14 +6,17 @@ The only benefit of Few-Shot learning is that it can be used in environments whe
But this should not be the case in most scenarios.
Most of the time plenty of good samples are available, and in this case PatchCore or EfficientAD should perform well.

The only scenario in which Few-Shot learning could be useful is when one wants to identify the anomaly class itself.
PatchCore and EfficientAD can only detect whether an anomaly is present, not which type of anomaly it is.
Chaining a Few-Shot learner after PatchCore or EfficientAD could therefore be a good way to combine the best of both worlds.
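
A minimal sketch of such a chain follows. It assumes a hypothetical `anomaly_score` callable standing in for PatchCore or EfficientAD and a hypothetical `classify_defect` callable standing in for a Few-Shot learner. The threshold of 0.5 is an arbitrary placeholder and is not a value from the experiments.

```python
from typing import Callable, Optional


def two_stage_inspection(
    image: object,
    anomaly_score: Callable[[object], float],  # hypothetical stand-in for PatchCore / EfficientAD
    classify_defect: Callable[[object], str],  # hypothetical Few-Shot defect classifier
    threshold: float = 0.5,  # placeholder decision threshold
) -> Optional[str]:
    """Return None for a good part, otherwise the predicted defect class."""
    # Stage 1: the anomaly detector only decides "anomalous or not".
    if anomaly_score(image) < threshold:
        return None
    # Stage 2: only flagged images are passed on to the Few-Shot learner,
    # which assigns one of the defect classes from its support set.
    return classify_defect(image)


# Example with dummy stand-ins for both stages.
print(two_stage_inspection("part_42", lambda img: 0.9, lambda img: "broken_large"))  # prints broken_large
```

Only images flagged by the first stage reach the Few-Shot learner. Because of this, the learner can be restricted to the defect classes and does not have to separate good parts from faulty ones.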

In most of the tests P>M>F performed best.
However, the simple ResNet50 method also performed better than expected in most cases. It can be considered if computational resources are limited and a simple architecture is sufficient.

== Outlook
When new Few-Shot learning methods emerge in the future, it could be interesting to test again how they perform in anomaly detection tasks.
There might be a lack of research in the area where the classes to detect are very similar to each other,
and a Few-Shot learning algorithm tailored specifically to very similar classes could boost the performance by a large margin.

It might be interesting to test the SOT method (see @SOT) with a ResNet50 feature extractor similar to the one proposed in this thesis, but with SOT for embedding comparison.
Moreover, TRIDENT (see @TRIDENT) could achieve promising results in an anomaly detection scenario.

@ -64,8 +64,8 @@ Which is an result that is unexpected (since one can think more samples perform
Clearly, all four graphs show that the performance decreases as the number of good samples increases.
The conclusion is therefore that the Few-Shot learner should always be trained with classes that are as balanced as possible.

== How do the three methods (ResNet, CAML, P>M>F) perform in only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_

@comparisonnormal shows graphs comparing the performance of the ResNet, CAML and P>M>F methods in detecting the anomaly class, both including and excluding the good class.

@ -7,15 +7,19 @@
The three methods described (ResNet50, CAML, P>M>F) were implemented in a Jupyter notebook and compared to each other.

== Experiments <experiments>
For all three methods we test the following use cases:
- Detection of the anomaly class (1, 3, 5 shots)
  - Every faulty class and the good class are detected.
- 2-way classification (1, 3, 5 shots)
  - Only whether a sample is faulty or not is detected. All samples of the faulty classes are treated as a single class.
- Detection of only the anomaly classes (1, 3, 5 shots)
  - Similar to the first test but without the good class. Only faulty classes are detected.
- Imbalanced 2-way classification (5, 10, 15, 30 good shots, 5 bad shots)
  - Similar to the 2-way classification but with an imbalanced number of good shots.
- Imbalanced target class prediction (5, 10, 15, 30 good shots, 5 bad shots)#todo[Avoid bullet points and write flow text?]
  - Detect only the faulty classes without the good class, with an imbalanced number of shots.
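
The balanced use cases differ only in how the labels are mapped. The following is a minimal sketch that assumes every image carries its MVTec AD defect label and uses the bottle classes as an example. `make_task_labels` and the mode strings are hypothetical names and are not taken from the thesis notebook.

```python
# Labels of the MVTec AD "bottle" category; "good" marks defect-free images.
BOTTLE_LABELS = ["good", "broken_large", "broken_small", "contamination"]


def make_task_labels(labels, mode):
    """Map raw labels to task labels and return (sample index, task label) pairs.

    mode "anomaly_class": every faulty class and the good class are kept.
    mode "two_way":       all faulty classes are merged into one "faulty" class.
    mode "only_anomaly":  good samples are dropped, only faulty classes remain.
    """
    result = []
    for i, label in enumerate(labels):
        if mode == "anomaly_class":
            result.append((i, label))
        elif mode == "two_way":
            result.append((i, "good" if label == "good" else "faulty"))
        elif mode == "only_anomaly":
            if label != "good":
                result.append((i, label))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result


# prints [(0, 'good'), (1, 'faulty'), (2, 'faulty'), (3, 'faulty')]
print(make_task_labels(BOTTLE_LABELS, "two_way"))
```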

All those experiments were conducted on the bottle and cable categories of the MVTec AD dataset.

== Experiment Setup
All the experiments were done on the bottle and cable categories of the MVTec AD dataset.
@ -23,20 +27,21 @@ The correspoinding number of shots were randomly selected from the dataset.
The remaining images were used to test the model and measure its accuracy.
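
The random split can be sketched as follows, assuming the dataset is a list of (image, label) pairs. `split_support_query`, `accuracy` and the fixed seed are hypothetical choices. Passing a shot count per class also covers the imbalanced settings.

```python
import random
from collections import defaultdict


def split_support_query(samples, shots_per_class, seed=0):
    """Randomly draw the support set per class; all remaining samples form the query set.

    samples:          list of (image, label) pairs
    shots_per_class:  dict mapping every label to its number of shots
                      (unequal values give an imbalanced support set)
    """
    rng = random.Random(seed)  # fixed seed for reproducible episodes
    by_class = defaultdict(list)
    for image, label in samples:
        by_class[label].append(image)

    support, query = [], []
    for label, images in by_class.items():
        rng.shuffle(images)
        k = shots_per_class[label]
        support += [(img, label) for img in images[:k]]
        query += [(img, label) for img in images[k:]]
    return support, query


def accuracy(predictions, targets):
    """Fraction of query samples whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)
```

For example, `split_support_query(data, {"good": 30, "faulty": 5})` corresponds to the imbalanced 2-way setting with 30 good and 5 bad shots.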

#todo[Maybe add real number of samples per class]

== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
Features are extracted from both the support and the query set to obtain a lower-dimensional representation of the images.
The support set embeddings are then compared to the query set embeddings.
To predict the class of a query, the class whose support embedding has the smallest distance to the query embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings (class center) is used.
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and to the work of _Just Use a Library of Pre-trained Feature
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust, but uses a simple distance metric instead of a neural net.
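
The classification step can be sketched in a few lines. The sketch assumes the embeddings have already been extracted by the truncated ResNet50 and are given as plain lists of floats. `class_centers` and `predict` are hypothetical names, and Euclidean distance is an assumed choice of metric.

```python
import math
from collections import defaultdict


def class_centers(support):
    """Average the support embeddings of each class into one prototype (class center).

    support: list of (embedding, label) pairs; an embedding is a list of floats.
    """
    sums = {}
    counts = defaultdict(int)
    for emb, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(query_embedding, centers):
    """Return the label of the class center closest to the query (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(query_embedding, centers[label]))


centers = class_centers([([0.0, 0.0], "good"), ([0.0, 2.0], "good"), ([10.0, 10.0], "broken_large")])
print(predict([1.0, 1.0], centers))  # prints good
```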

In this bachelor thesis a pre-trained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 layers.

To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding.
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.~@chowdhury2021fewshotimageclassificationjust

#diagram(
spacing: (5mm, 5mm),
@ -146,7 +151,7 @@ In a real world scenario this should not be the case because the support set is
=== Results
The results of P>M>F look very promising and improve by a large margin over the ResNet50 method.
In @pmfbottleperfa the model reached an accuracy of 79% with 5 shots (4-way classification).
The 2-way classification (faulty or not) performed even better and peaked at 94% accuracy with 5 shots.

Similar to the ResNet50 method in @resnet50perf, the tests with an imbalanced class distribution performed worse than those with balanced classes.
So it is clearly a bad idea to add more good shots to the support set.
@ -178,6 +183,11 @@ So it is clearly a bad idea to add more good shots to the support set.
== CAML
=== Approach
For the CAML implementation, the pretrained model weights from the original paper were used.
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
This brings the limitation that it can only process standard few-shot learning tasks in the n-way k-shot fashion,
since it expects the input sequence to contain the same number of shots per class.
This is why the two imbalanced test cases could not be conducted for this method.
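
The restriction can be made explicit with a small check before an episode is passed to CAML. This is a hypothetical helper and is not part of the CAML code. It accepts a support set only if every class has the same number of shots, which the imbalanced settings (e.g. 30 good shots and 5 bad shots) violate.

```python
from collections import Counter


def is_n_way_k_shot(support_labels):
    """Return (n, k) if every class has the same number of shots, otherwise None."""
    counts = Counter(support_labels)
    if not counts:
        return None
    shot_numbers = set(counts.values())
    if len(shot_numbers) != 1:
        return None  # imbalanced support set: not an n-way k-shot episode
    return len(counts), shot_numbers.pop()


print(is_n_way_k_shot(["good"] * 5 + ["faulty"] * 5))   # prints (2, 5)
print(is_n_way_k_shot(["good"] * 30 + ["faulty"] * 5))  # prints None
```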

A ViT-B/16 model, a Vision Transformer with a patch size of 16, was used as the feature extractor.
This feature extractor was already pretrained when used by the authors of the original paper.
For the non-causal sequence model a transformer model was used

@ -5,32 +5,32 @@
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
Many assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning has helped the field to advance considerably in the past.
In most cases the error rate is below $0.1%$, so plenty of good data but almost no faulty data is available.
The training data is therefore heavily unbalanced.~#cite(<parnami2022learningexamplessummaryapproaches>)

PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies within unseen (but similar) data.
One of their problems is that they need lots of training data and training time.
Moreover, a slight change of the camera position or the lighting conditions can make a complete retraining of the model mandatory.
Few-Shot learning might be a suitable alternative with greatly reduced training times and fast adaptation to new conditions.~#cite(<efficientADpaper>)#cite(<patchcorepaper>)#cite(<parnami2022learningexamplessummaryapproaches>)

In this thesis the performance of three Few-Shot learning algorithms (ResNet50, P>M>F, CAML) will be compared in the field of anomaly detection.
In addition, Few-Shot learning might be able not only to detect anomalies but also to identify the anomaly class.

== Research Questions <sectionresearchquestions>

=== Is Few-Shot learning a suitable fit for anomaly detection?

_Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well-established algorithms such as PatchCore or EfficientAD?_

=== How does an imbalanced number of shots affect performance?
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_

=== How do the three methods (ResNet, CAML, P>M>F) perform in only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD?_

#if inwriting [
=== _Extra: How does Euclidean distance compare to cosine similarity when using ResNet as a feature extractor?_
// I've tried different distance measures $->$ but results are pretty much the same.
]

@ -45,7 +45,7 @@ It outlines the experimental setup, including the use of Jupyter Notebook for pr

The experimental outcomes are presented in @sectionexperimentalresults.
This section addresses the research questions posed in @sectionresearchquestions, examining the suitability of Few-Shot Learning for anomaly detection tasks, the impact of class imbalance on model performance, and the comparative effectiveness of the three selected methods.
//Additional experiments explore the differences between Euclidean distance and Cosine similarity when using ResNet as a feature extractor.#todo[Maybe remove this]

Finally, @sectionconclusionandoutlook summarizes the key findings of this study.
It reflects on the implications of the results for the field of anomaly detection and proposes directions for future research that could address the limitations and enhance the applicability of Few-Shot Learning approaches in this domain.

18
main.typ
@ -44,7 +44,7 @@
thesis-type: "Bachelor",
degree: "Bachelor of Science",
program: "Artificial Intelligence",
supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
department: "Institute of Computational Perception",
author: "Lukas Heiligenbrunner",
@ -52,7 +52,19 @@
place-of-submission: "Linz",
title: "Few shot learning for anomaly detection",
abstract-en: [//max. 250 words
This thesis explores the application of Few-Shot Learning (FSL) in anomaly detection, a critical area in industrial and automotive domains requiring robust and efficient algorithms for identifying defects.
Traditional methods, such as PatchCore and EfficientAD, achieve high accuracy but often demand extensive training data and are sensitive to environmental changes, necessitating frequent retraining.
FSL offers a promising alternative by enabling models to generalize effectively from minimal samples, thus reducing training time and adaptation overhead.

The study evaluates three FSL methods (ResNet50, P>M>F and CAML) using the MVTec AD dataset.
Experiments focus on tasks such as anomaly detection, class imbalance handling, //and comparison of distance metrics.
and anomaly type classification.
Results indicate that while FSL methods trail behind state-of-the-art algorithms in detecting anomalies, they excel in classifying anomaly types, showcasing potential in scenarios requiring detailed defect identification.
Among the tested approaches, P>M>F emerged as the most robust, demonstrating superior accuracy across various settings.

This research underscores the limitations and niche applicability of FSL in anomaly detection, advocating its integration with established algorithms for enhanced performance.
Future work should address the scalability and domain-specific adaptability of FSL techniques to broaden their utility in industrial applications.
],
abstract-de: none,// or specify the abstract_de in a container []
acknowledgements: none,//acknowledgements: none // if you are self-made
show-title-in-header: false,
@ -165,4 +177,4 @@
#include "conclusionandoutlook.typ"

#set par(leading: 0.7em, first-line-indent: 0em, justify: true)
#bibliography("sources.bib", style: "ieee")

@ -18,7 +18,7 @@ Each category comprises a set of defect-free training images and a test set of i

In this bachelor thesis only two categories are used: "Bottle" and "Cable".

The bottle category contains 3 different defect classes: _broken_large_, _broken_small_ and _contamination_.
#subpar.grid(
figure(image("rsc/mvtec/bottle/broken_large_example.png"), caption: [
Broken large defect
@ -34,9 +34,9 @@ The bottle category contains 3 different defect classes: 'broken_large', 'broken
label: <full>,
)

In contrast, the cable category has many more defect classes: _bent_wire_, _cable_swap_, _combined_, _cut_inner_insulation_,
_cut_outer_insulation_, _missing_cable_, _missing_wire_, _poke_insulation_.
This larger number of defect classes already indicates that a classification task might be more difficult for the cable category.

#subpar.grid(
figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
@ -72,32 +72,33 @@ So many more defect classes are already an indication that a classification task

=== Few-Shot Learning
Few-Shot learning is a subfield of machine learning which aims to train a classification model with just a few or no samples at all.
In contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data,
here only 1-10 samples per class (so-called shots) are available.
The model is therefore prone to overfitting to the few training samples, which means that they should represent the whole sample distribution as well as possible.~#cite(<parnami2022learningexamplessummaryapproaches>)

Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data, and the query set contains the data used for evaluation.
A common way to formulate a few-shot learning problem is the n-way k-shot notation.
For example, 3 target classes with 5 training samples per class form a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper

A classical example of how such a model might work is a prototypical network.
These models learn a lower-dimensional representation of each class and classify new examples based on their proximity to these representations in an embedding space.~@snell2017prototypicalnetworksfewshotlearning

#figure(
image("rsc/prototype_fewshot_v3.png", width: 60%),
caption: [Prototypical network for a 3-way 5-shot task. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>

The first and simplest method of this bachelor thesis uses a ResNet50 to calculate those embeddings and clusters the shots together by calculating the class center.
This is basically a simple prototypical network.
See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust

=== Generalisation from few samples

Generalizing from so few samples is especially hard.
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]

=== Softmax
#todo[Maybe remove this section]
@ -126,10 +127,10 @@ $ <crel>
Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.

=== Cosine Similarity
Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
This property makes the cosine similarity particularly effective in scenarios where the
direction of a vector carries more important information than its magnitude.
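
A minimal sketch of the cosine similarity for two embeddings given as plain lists of floats follows. `cosine_similarity` and `squared_euclidean_of_normalized` are hypothetical helper names. The second function illustrates a related fact: for vectors scaled to unit length, the squared Euclidean distance equals 2 minus twice the cosine similarity. Both measures therefore rank neighbours identically on normalized embeddings.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||); raises ZeroDivisionError for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def squared_euclidean_of_normalized(a, b):
    """Squared Euclidean distance after scaling both vectors to unit length."""
    norm_a, norm_b = math.hypot(*a), math.hypot(*b)
    return sum((x / norm_a - y / norm_b) ** 2 for x, y in zip(a, b))


a, b = [3.0, 1.0, 2.0], [1.0, 0.5, -1.0]
# Both values agree up to floating-point rounding: ||u - v||^2 = 2 - 2 cos(theta).
print(squared_euclidean_of_normalized(a, b), 2 - 2 * cosine_similarity(a, b))
```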

$
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
@ -373,24 +374,78 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting

== Alternative Methods

There are several alternative methods for few-shot learning as well as for anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks than the methods used, or they were released after my initial literature research.

=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)
// https://arxiv.org/pdf/2211.16191v2
// https://arxiv.org/abs/2211.16191v2

SgVA-CLIP (Semantic-guided Visual Adapting CLIP) is a framework that improves few-shot learning by adapting pre-trained vision-language models like CLIP.
It focuses on generating better visual features for specific tasks while still using the general knowledge from the pre-trained model.
Instead of only aligning images and text, SgVA-CLIP includes a special visual adapting layer that makes the visual features more discriminative for the given task.
This process is supported by knowledge distillation, where detailed information from the pre-trained model guides the learning of the new visual features.
Additionally, the model uses contrastive losses to further refine both the visual and textual representations.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)

One advantage of SgVA-CLIP is that it can work well with very few labeled samples, making it suitable for applications like anomaly detection.
The use of pre-trained knowledge helps reduce the need for large datasets.
However, a disadvantage is that it depends heavily on the quality and capabilities of the pre-trained model.
If the pre-trained model lacks relevant information for the task, SgVA-CLIP might struggle to adapt.
This might rule it out for anomaly detection tasks, because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)

=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification) <TRIDENT>
// https://arxiv.org/pdf/2208.10559v1
// https://arxiv.org/abs/2208.10559v1

TRIDENT, a variational inference network, is a few-shot learning approach which decouples the image representation into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)

To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)

This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context matches the requirements of anomaly detection.

=== SOT (Self-Optimal-Transport Feature Transform) <SOT>
// https://arxiv.org/pdf/2204.03065v1
// https://arxiv.org/abs/2204.03065v1

The Self-Optimal-Transport (SOT) Feature Transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
This transform processes the features as a set instead of individually,
which creates context-aware representations.
SOT can capture direct as well as indirect similarities between features, which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

SOT uses a transport plan matrix derived from optimal transport theory to redefine feature relations.
This includes calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal match between features.
This results in a doubly stochastic matrix where each row represents the re-embedding of the corresponding feature in the context of the others.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

The transform is parameterless, which makes it easy to integrate into existing machine-learning pipelines.
It is differentiable, which allows for end-to-end training, for example to (re-)train the hosting network so that it adapts to SOT.
SOT is permutation equivariant, which means that permuting the order of the input features permutes the output in the same way.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

The improvements of SOT over traditional feature transforms depend on the backbone network and the task.
In most cases, however, it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

// anomaly detect
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/pdf/2407.09359v1
// https://arxiv.org/abs/2407.09359v1

GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network that combines two different strategies for synthesizing anomalies.
The first one, Global Anomaly Synthesis (GAS), operates on the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of the normal features.
This helps the detection of weak defects.
The second strategy, Local Anomaly Synthesis (LAS), operates on the image level.
This strategy overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
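
The image-level idea can be illustrated with a toy example. A binary mask is obtained by thresholding a noise map, and inside the mask the normal image is replaced by a texture. This is a hypothetical sketch on tiny grayscale images stored as nested lists. The uniform noise, the threshold and the hard (non-blended) overlay are assumptions, and the sketch does not reproduce the exact LAS procedure of GLASS.

```python
import random


def noise_mask(height, width, threshold=0.7, seed=0):
    """Binary mask from thresholded uniform noise (1 = pixel to overwrite)."""
    rng = random.Random(seed)
    return [[1 if rng.random() > threshold else 0 for _ in range(width)]
            for _ in range(height)]


def overlay_texture(image, texture, mask):
    """Copy texture pixels into the normal image wherever the mask is set."""
    return [
        [tex if m else pix for pix, tex, m in zip(img_row, tex_row, mask_row)]
        for img_row, tex_row, mask_row in zip(image, texture, mask)
    ]


normal = [[0] * 4 for _ in range(4)]    # toy 4x4 "defect-free" grayscale image
texture = [[9] * 4 for _ in range(4)]   # toy texture patch
for row in overlay_texture(normal, texture, noise_mask(4, 4)):
    print(row)
```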

GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies both near to and far from the normal distribution.
Experiments show that GLASS is very effective and, in some cases, outperforms state-of-the-art methods such as PatchCore on the MVTec AD dataset.~#cite(<chen2024unifiedanomalysynthesisstrategy>)

//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/pdf/2303.16191v5
// https://arxiv.org/abs/2303.16191v5

60
sources.bib
@ -137,3 +137,63 @@
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2204.07305},
}

@misc{peng2023sgvaclipsemanticguidedvisualadapting,
      title={SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification},
      author={Fang Peng and Xiaoshan Yang and Linhui Xiao and Yaowei Wang and Changsheng Xu},
      year={2023},
      eprint={2211.16191},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2211.16191},
}

@misc{singh2022transductivedecoupledvariationalinference,
      title={Transductive Decoupled Variational Inference for Few-Shot Classification},
      author={Anuj Singh and Hadi Jamali-Rad},
      year={2022},
      eprint={2208.10559},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2208.10559},
}

@misc{chen2024unifiedanomalysynthesisstrategy,
      title={A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization},
      author={Qiyu Chen and Huiyuan Luo and Chengkan Lv and Zhengtao Zhang},
      year={2024},
      eprint={2407.09359},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.09359},
}

@misc{shalam2022selfoptimaltransportfeaturetransform,
      title={The Self-Optimal-Transport Feature Transform},
      author={Daniel Shalam and Simon Korman},
      year={2022},
      eprint={2204.03065},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2204.03065},
}

@misc{parnami2022learningexamplessummaryapproaches,
      title={Learning from Few Examples: A Summary of Approaches to Few-Shot Learning},
      author={Archit Parnami and Minwoo Lee},
      year={2022},
      eprint={2203.04291},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2203.04291},
}

@misc{chowdhury2021fewshotimageclassificationjust,
      title={Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier},
      author={Arkabandhu Chowdhury and Mingchao Jiang and Swarat Chaudhuri and Chris Jermaine},
      year={2021},
      eprint={2101.00562},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2101.00562},
}