Compare commits
1 Commits
fixes
...
8a4b33e67a
Author | SHA1 | Date | |
---|---|---|---|
8a4b33e67a |
@ -64,8 +64,8 @@ Which is a result that is unexpected (since one can think more samples perform
|
||||
Clearly all four graphs show that the performance decreases with an increasing number of good samples.
|
||||
So the conclusion is that the Few-Shot learner should always be trained with as balanced classes as possible.
|
||||
|
||||
== How do the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
|
||||
_How much does the performance improve by only detecting the presence of an anomaly?
|
||||
== How does the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
|
||||
_How much does the performance improve if only detecting an anomaly or not?
|
||||
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_
|
||||
|
||||
@comparisonnormal shows graphs comparing the performance of the ResNet, CAML and P>M>F methods in detecting only the anomaly class, both including and excluding the good class.
|
||||
@ -101,7 +101,7 @@ One could use a well established algorithm like PatchCore or EfficientAD for det
|
||||
label: <comparisonnormal>,
|
||||
)
|
||||
|
||||
#if inwriting [
|
||||
/*#if inwriting [
|
||||
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
|
||||
#todo[Maybe don't do this]
|
||||
]
|
||||
]*/
|
||||
|
@ -31,8 +31,8 @@ The rest of the images was used to test the model and measure the accuracy.
|
||||
=== Approach
|
||||
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
|
||||
From both the support and query set the features are extracted to get a downprojected representation of the images.
|
||||
The support set embeddings are compared to the query set embeddings.
|
||||
To predict the class of a query the class with the smallest distance to the support embedding is chosen.
|
||||
After downprojection the support set embeddings are compared to the query set embeddings.
|
||||
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
|
||||
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
|
||||
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
|
||||
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but just with a simple distance metric instead of a neural net.
|
||||
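The class-center idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the embeddings are tiny hand-written vectors standing in for ResNet50 features, the Euclidean distance is assumed as the metric, and the helper names `class_centers` and `predict` are hypothetical.

```python
import numpy as np


def class_centers(support_embeddings, support_labels):
    """Average the support embeddings of each class into one center."""
    labels = np.asarray(support_labels)
    classes = np.unique(labels)
    centers = np.stack(
        [support_embeddings[labels == c].mean(axis=0) for c in classes]
    )
    return classes, centers


def predict(query_embeddings, classes, centers):
    """Assign each query to the class whose center is closest."""
    # distances[i, j] = Euclidean distance between query i and center j
    diffs = query_embeddings[:, None, :] - centers[None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return classes[distances.argmin(axis=1)]


# Toy 2-way 2-shot task with 2-dimensional stand-in embeddings.
support = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
labels = ["good", "good", "broken", "broken"]
queries = np.array([[1.0, 1.0], [9.0, 11.0]])

classes, centers = class_centers(support, labels)
predictions = predict(queries, classes, centers)
```

With real data, the stand-in arrays would be replaced by the feature vectors of the support and query images.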
@ -94,13 +94,13 @@ The class with the smallest distance is chosen as the predicted class.
|
||||
=== Results <resnet50perf>
|
||||
This method performed better than expected for such a simple method.
|
||||
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
|
||||
When detecting only if there occurred an anomaly or not the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
|
||||
When detecting if there occurred an anomaly or not only the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
|
||||
Interestingly the model performed slightly better with fewer shots in this case.
|
||||
Moreover in @resnet50bottleperfa, the detection of the anomaly class only (3 way) shows a similar pattern as the normal 4 way classification.
|
||||
The more shots the better the performance and it peaks at around 88% accuracy with 5 shots.
|
||||
|
||||
In @resnet50bottleperfb the model was tested with imbalanced class distributions.
|
||||
With [5,10,15,30] good shots and 5 bad shots the model performed worse than with balanced classes.
|
||||
With {5, 10, 15, 30} good shots and 5 bad shots the model performed worse than with balanced classes.
|
||||
The more good shots the worse the performance.
|
||||
The only exception is the faulty or not detection (2 way) where the model peaked at 15 good shots with 83% accuracy.
|
||||
|
||||
@ -136,13 +136,13 @@ but this is expected as the cable class consists of 8 faulty classes.
|
||||
|
||||
== P>M>F
|
||||
=== Approach
|
||||
For P>M>F the pretrained model weights from the original paper were used.
|
||||
For P>M>F I used the pretrained model weights from the original paper.
|
||||
As the backbone feature extractor a DINO model is used, which was pre-trained by Facebook.
|
||||
This is a vision transformer with a patch size of 16 and 12 attention heads learned in a self-supervised fashion.
|
||||
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-Aircraft,
|
||||
CUB-200-2011, Describable Textures, QuickDraw,
|
||||
FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~#cite(<pmfpaper>)]
|
||||
of diverse domains by the authors of the original paper.#cite(<pmfpaper>)
|
||||
FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
|
||||
of diverse domains by the authors of the original paper.~@pmfpaper
|
||||
|
||||
Finally, this model is finetuned with the support set of every test iteration.
|
||||
Every time the support set changes, we need to finetune the model again.
|
||||
@ -182,7 +182,7 @@ So it is clearly a bad idea to add more good shots to the support set.
|
||||
|
||||
== CAML
|
||||
=== Approach
|
||||
For the CAML implementation the pretrained model weights from the original paper were used.
|
||||
For the CAML implementation I used the pretrained model weights from the original paper.
|
||||
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
|
||||
This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shot fashion,
|
||||
since it expects the input sequence to be distributed with the same number of shots per class.
|
||||
@ -190,7 +190,7 @@ This is the reason why for this method the two imbalanced test cases couldn't be
|
||||
|
||||
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
|
||||
This feature extractor was already pretrained when used by the authors of the original paper.
|
||||
For the non-causal sequence model a transformer model was used
|
||||
In this case for the non-causal sequence model a transformer model was used.
|
||||
It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
|
||||
This transformer was trained on a huge number of images as described in @CAML.
|
||||
|
||||
@ -198,7 +198,8 @@ This transformer was trained on a huge number of images as described in @CAML.
|
||||
The results were not as good as expected.
|
||||
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
|
||||
The model was trained on a large number of general purpose images and is not fine-tuned at all.
|
||||
It might not handle very similar images well.
|
||||
Moreover, it was not fine-tuned on the support set similar to the P>M>F method, which could have a huge impact on performance.
|
||||
It might also not handle very similar images well.
|
||||
|
||||
Compared to the other two methods, CAML performed poorly in almost all experiments.
|
||||
The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
|
||||
|
@ -6,7 +6,7 @@ Anomaly detection has especially in the industrial and automotive field essentia
|
||||
Lots of assembly lines need visual inspection to find errors, often with the help of camera systems.
|
||||
Machine learning helped the field to advance a lot in the past.
|
||||
Most of the time the error rate is sub $.1%$ and therefore plenty of good data and almost no faulty data is available.
|
||||
So the train data is heavily unbalanced.~#cite(<parnami2022learningexamplessummaryapproaches>)
|
||||
So the train data is heavily unbalaned.~#cite(<parnami2022learningexamplessummaryapproaches>)
|
||||
|
||||
PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies within unseen (but similar) data.
|
||||
One of their problems is the need for lots of training data and time to train.
|
||||
@ -25,14 +25,14 @@ How does it compare to well established algorithms such as Patchcore or Efficien
|
||||
=== How does unbalancing the shot number affect performance?
|
||||
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_
|
||||
|
||||
=== How do the 3 (ResNet, CAML, \pmf) methods perform in only detecting the anomaly class?
|
||||
_How much does the performance improve by only detecting the presence of an anomaly?
|
||||
=== How does the 3 (ResNet, CAML, \pmf) methods perform in only detecting the anomaly class?
|
||||
_How much does the performance improve if only detecting an anomaly or not?
|
||||
How does it compare to PatchCore and EfficientAD?_
|
||||
|
||||
#if inwriting [
|
||||
/*#if inwriting [
|
||||
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
|
||||
// I've tried different distance measures $->$ but results are pretty much the same.
|
||||
]
|
||||
]*/
|
||||
|
||||
== Outline
|
||||
This thesis is structured to provide a comprehensive exploration of Few-Shot Learning in anomaly detection.
|
||||
|
@ -36,7 +36,7 @@ The bottle category contains 3 different defect classes: _broken_large_, _broken
|
||||
|
||||
Whereas cable has a lot more defect classes: _bent_wire_, _cable_swap_, _combined_, _cut_inner_insulation_,
|
||||
_cut_outer_insulation_, _missing_cable_, _missing_wire_, _poke_insulation_.
|
||||
More defect classes are already an indication that a classification task might be more difficult for the cable category.
|
||||
So many more defect classes are already an indication that a classification task might be more difficult for the cable category.
|
||||
|
||||
#subpar.grid(
|
||||
figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
|
||||
@ -79,7 +79,7 @@ So the model is prone to overfitting to the few training samples and this means
|
||||
Typically a few-shot learning task consists of a support and query set.
|
||||
The support set contains the training data and the query set the evaluation data for real-world evaluation.
|
||||
A common way to format a few-shot learning problem is using n-way k-shot notation.
|
||||
For Example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
|
||||
For Example 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
|
||||
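To make the n-way k-shot notation concrete, the following sketch draws one episode (a support and a query set) from a labelled collection. It is only an illustration under stated assumptions: the function name `sample_episode`, the parameter `n_query` and the toy file names are hypothetical and not taken from the thesis code.

```python
import random


def sample_episode(samples_by_class, n_way, k_shot, n_query, seed=None):
    """Draw one n-way k-shot episode with n_query query items per class."""
    rng = random.Random(seed)
    # Pick which classes take part in this episode.
    classes = rng.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw without replacement so support and query never overlap.
        picked = rng.sample(samples_by_class[label], k_shot + n_query)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query


# Toy collection: 4 classes with 10 placeholder file names each.
data = {c: [f"{c}_{i}.png" for i in range(10)] for c in ["a", "b", "c", "d"]}

# 3-way 5-shot episode: 3 classes, 5 support samples per class.
support, query = sample_episode(data, n_way=3, k_shot=5, n_query=2, seed=0)
```

Here the support set holds 3 x 5 = 15 labelled samples and the query set 3 x 2 = 6.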
|
||||
A classical example of how such a model might work is a prototypical network.
|
||||
These models learn a representation of each class in a reduced dimensionality and classify new examples based on proximity to these representations in an embedding space.~@snell2017prototypicalnetworksfewshotlearning
|
||||
@ -98,7 +98,8 @@ See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
|
||||
An especially hard task is to generalize from such few samples.
|
||||
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
|
||||
This helps the model to learn the underlying patterns and to generalize well to unseen data.
|
||||
In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]
|
||||
In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
|
||||
@Goodfellow-et-al-2016
|
||||
|
||||
=== Softmax
|
||||
#todo[Maybe remove this section]
|
||||
@ -127,26 +128,27 @@ $ <crel>
|
||||
Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.
|
||||
|
||||
=== Cosine Similarity
|
||||
Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
|
||||
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
|
||||
This property makes the cosine similarity particularly effective in scenarios where the
|
||||
direction of the vector holds more important information than the magnitude.
|
||||
To measure the distance between two vectors some common distance measures are used.
|
||||
A popular one is the Cosine Similarity (@cosinesimilarity).
|
||||
It measures the cosine of the angle between two vectors.
|
||||
The Cosine Similarity is especially useful when the magnitude of the vectors is not important.
|
||||
@dataminingbook@analysisrudin
|
||||
|
||||
$
|
||||
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
|
||||
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
|
||||
$ <cosinesimilarity>
|
||||
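As a small numerical illustration of the formula above, the following sketch computes the cosine similarity with NumPy. The function name `cosine_similarity` and the example vectors are assumptions made for illustration only.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product divided by the product of the two vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction give a value of 1 regardless of their length, while orthogonal vectors give 0.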
|
||||
#todo[Source?]
|
||||
=== Euclidean Distance
|
||||
The Euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space.
|
||||
It just calculates the square root of the sum of the squared differences of the coordinates.
|
||||
The Euclidean distance can also be represented as the L2 norm (Euclidean norm) of the difference of the two vectors.
|
||||
@analysisrudin
|
||||
|
||||
$
|
||||
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
|
||||
$ <euclideannorm>
|
||||
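The equivalence between the explicit sum and the L2 norm of the difference can be checked numerically. The sketch below is illustrative only; the function name `euclidean_distance` and the example points are assumptions.

```python
import numpy as np


def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


p = [0.0, 0.0]
q = [3.0, 4.0]

distance = euclidean_distance(p, q)
# The same value expressed as the L2 norm of the difference vector.
norm_of_difference = float(np.linalg.norm(np.asarray(p) - np.asarray(q)))
```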
#todo[Source?]
|
||||
|
||||
|
||||
=== Patchcore
|
||||
// https://arxiv.org/pdf/2106.08265
|
||||
|
30
sources.bib
@ -197,3 +197,33 @@
|
||||
primaryClass={cs.CV},
|
||||
url={https://arxiv.org/abs/2101.00562},
|
||||
}
|
||||
|
||||
|
||||
@book{analysisrudin,
|
||||
title = {Principles of mathematical analysis},
|
||||
author = {Walter Rudin},
|
||||
isbn = {},
|
||||
series = {Mathematics Series},
|
||||
year = {1976},
|
||||
publisher = {McGraw-Hill},
|
||||
keywords = {mathematics}
|
||||
}
|
||||
|
||||
|
||||
@book{dataminingbook,
|
||||
title = {Data Mining: Concepts and Techniques},
|
||||
author = {Jiawei Han and Micheline Kamber and Jian Pei},
|
||||
isbn = {},
|
||||
series = {The Morgan Kaufmann Series in Data Management Systems},
|
||||
year = {2012},
|
||||
publisher = {Morgan Kaufmann},
|
||||
keywords = {mathematics}
|
||||
}
|
||||
|
||||
@book{Goodfellow-et-al-2016,
|
||||
title={Deep Learning},
|
||||
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
|
||||
publisher={MIT Press},
|
||||
note={\url{http://www.deeplearningbook.org}},
|
||||
year={2016}
|
||||
}
|
||||
|