add several sources and fix some errors in text
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 29s
This commit is contained in:
parent
8f28a8c387
commit
8a4b33e67a
@ -101,7 +101,7 @@ One could use a well established algorithm like PatchCore or EfficientAD for det
|
||||
label: <comparisonnormal>,
|
||||
)
|
||||
|
||||
#if inwriting [
|
||||
/*#if inwriting [
|
||||
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
|
||||
#todo[Maybe don't do this]
|
||||
]
|
||||
]*/
|
||||
|
@ -31,8 +31,8 @@ The rest of the images was used to test the model and measure the accuracy.
|
||||
=== Approach
|
||||
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
|
||||
From both the support and query set the features are extracted to get a downprojected representation of the images.
|
||||
The support set embeddings are compared to the query set embeddings.
|
||||
To predict the class of a query the class with the smallest distance to the support embedding is chosen.
|
||||
After downprojection the support set embeddings are compared to the query set embeddings.
|
||||
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
|
||||
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
|
||||
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
|
||||
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but just with a simple distance metric instead of a neural net.
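The class-center comparison described above can be sketched as follows. This is a minimal illustration in plain Python, not the thesis implementation: the function names `class_centers` and `predict`, the label strings and the 2-dimensional toy embeddings are assumptions made for the example, and the Euclidean distance is chosen here only as one possible distance metric. In practice the embeddings would come from the ResNet50 feature extractor.

```python
import math
from collections import defaultdict


def class_centers(support_embeddings, support_labels):
    """Average the support embeddings of each class into one class center."""
    grouped = defaultdict(list)
    for embedding, label in zip(support_embeddings, support_labels):
        grouped[label].append(embedding)
    # zip(*vectors) iterates over the dimensions of all vectors of one class.
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(query_embedding, centers):
    """Return the label whose class center is closest to the query (Euclidean)."""
    return min(centers, key=lambda label: math.dist(query_embedding, centers[label]))


# Toy 2-way / 2-shot episode with made-up 2-dimensional embeddings (assumption).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["good", "good", "faulty", "faulty"]
centers = class_centers(support, labels)
prediction = predict([4.5, 3.0], centers)
```

With these toy values the class centers are `[0.0, 1.0]` for `good` and `[5.0, 4.0]` for `faulty`, so the query is assigned to `faulty`.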
|
||||
@ -94,13 +94,13 @@ The class with the smallest distance is chosen as the predicted class.
|
||||
=== Results <resnet50perf>
|
||||
This method performed better than expected for such a simple approach.
|
||||
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
|
||||
When detecting only if there occurred an anomaly or not the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
|
||||
When only detecting whether an anomaly occurred or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
|
||||
Interestingly, the model performed slightly better with fewer shots in this case.
|
||||
Moreover in @resnet50bottleperfa, the detection of the anomaly class only (3 way) shows a similar pattern as the normal 4 way classification.
|
||||
The more shots the better the performance and it peaks at around 88% accuracy with 5 shots.
|
||||
|
||||
In @resnet50bottleperfb the model was tested with imbalanced class distributions.
|
||||
With [5,10,15,30] good shots and 5 bad shots the model performed worse than with balanced classes.
|
||||
With {5, 10, 15, 30} good shots and 5 bad shots the model performed worse than with balanced classes.
|
||||
The more good shots the worse the performance.
|
||||
The only exception is the faulty or not detection (2 way) where the model peaked at 15 good shots with 83% accuracy.
|
||||
|
||||
@ -136,13 +136,13 @@ but this is expected as the cable class consists of 8 faulty classes.
|
||||
|
||||
== P>M>F
|
||||
=== Approach
|
||||
For P>M>F the pretrained model weights from the original paper were used.
|
||||
For P>M>F I used the pretrained model weights from the original paper.
|
||||
As the backbone feature extractor a DINO model is used, which was pre-trained by Facebook.
|
||||
This is a vision transformer with a patch size of 16 and 12 attention heads learned in a self-supervised fashion.
|
||||
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-
|
||||
Aircraft, CUB-200-2011, Describable Textures, QuickDraw,
|
||||
FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~#cite(<pmfpaper>)]
|
||||
of diverse domains by the authors of the original paper.#cite(<pmfpaper>)
|
||||
FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
|
||||
of diverse domains by the authors of the original paper.~@pmfpaper
|
||||
|
||||
Finally, this model is finetuned with the support set of every test iteration.
|
||||
Every time the support set changes, we need to finetune the model again.
|
||||
@ -182,7 +182,7 @@ So it is clearly a bad idea to add more good shots to the support set.
|
||||
|
||||
== CAML
|
||||
=== Approach
|
||||
For the CAML implementation the pretrained model weights from the original paper were used.
|
||||
For the CAML implementation I used the pretrained model weights from the original paper.
|
||||
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
|
||||
This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shots fashion.
|
||||
This is because it expects the input sequence to contain the same number of shots per class.
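The restriction to balanced n-way k-shot tasks can be made concrete with a small check on the support set labels. The sketch below is only an illustration: the helper `is_balanced_episode` and the label strings are hypothetical and are not part of the CAML code base.

```python
from collections import Counter


def is_balanced_episode(support_labels):
    """True if every class in the support set has the same number of shots."""
    shots_per_class = Counter(support_labels)
    return len(set(shots_per_class.values())) == 1


# A standard 2-way 5-shot episode has the same number of shots per class.
balanced = is_balanced_episode(["good"] * 5 + ["faulty"] * 5)
# 15 good shots and 5 bad shots do not, so such an episode cannot be processed.
imbalanced = is_balanced_episode(["good"] * 15 + ["faulty"] * 5)
```

An episode for which this check fails, such as the imbalanced test cases, does not match the input layout the sequence model was pretrained on.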
|
||||
@ -190,7 +190,7 @@ This is the reason why for this method the two imbalanced test cases couldn't be
|
||||
|
||||
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
|
||||
This feature extractor was already pretrained when used by the authors of the original paper.
|
||||
For the non-causal sequence model a transformer model was used
|
||||
In this case for the non-causal sequence model a transformer model was used.
|
||||
It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
|
||||
This transformer was trained on a huge number of images as described in @CAML.
|
||||
|
||||
@ -198,7 +198,8 @@ This transformer was trained on a huge number of images as described in @CAML.
|
||||
The results were not as good as expected.
|
||||
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
|
||||
The model was trained on a large number of general purpose images and is not fine-tuned at all.
|
||||
It might not handle very similar images well.
|
||||
Moreover, it was not fine-tuned on the support set similar to the P>M>F method, which could have a huge impact on performance.
|
||||
It might also not handle very similar images well.
|
||||
|
||||
Compared to the other two methods, CAML performed poorly in almost all experiments.
|
||||
The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
|
||||
|
@ -29,10 +29,10 @@ _Does giving the Few-Shot learner more good than bad samples improve the model p
|
||||
_How much does the performance improve if only detecting an anomaly or not?
|
||||
How does it compare to PatchCore and EfficientAD?_
|
||||
|
||||
#if inwriting [
|
||||
/*#if inwriting [
|
||||
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
|
||||
// I've tried different distance measures $->$ but results are pretty much the same.
|
||||
]
|
||||
]*/
|
||||
|
||||
== Outline
|
||||
This thesis is structured to provide a comprehensive exploration of Few-Shot Learning in anomaly detection.
|
||||
|
@ -98,7 +98,8 @@ See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
|
||||
An especially hard task is to generalize from such few samples.
|
||||
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
|
||||
This helps the model to learn the underlying patterns and to generalize well to unseen data.
|
||||
In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]
|
||||
In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
|
||||
@Goodfellow-et-al-2016
|
||||
|
||||
=== Softmax
|
||||
#todo[Maybe remove this section]
|
||||
@ -131,22 +132,23 @@ To measure the distance between two vectors some common distance measures are us
|
||||
A popular one is the Cosine Similarity (@cosinesimilarity).
|
||||
It measures the cosine of the angle between two vectors.
|
||||
The Cosine Similarity is especially useful when the magnitude of the vectors is not important.
|
||||
@dataminingbook@analysisrudin
|
||||
|
||||
$
|
||||
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
|
||||
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
|
||||
$ <cosinesimilarity>
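The definition of the cosine similarity translates directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `cosine_similarity` and the example vectors are chosen for illustration only, and the sketch assumes two non-zero vectors of equal length.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction, different magnitude: similarity is (numerically close to) 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Orthogonal vectors: similarity is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

The first example shows why the measure is insensitive to the magnitude of the vectors: scaling one vector does not change the angle between them.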
|
||||
|
||||
#todo[Source?]
|
||||
=== Euclidean Distance
|
||||
The Euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space.
|
||||
It just calculates the square root of the sum of the squared differences of the coordinates.
|
||||
The Euclidean distance can also be represented as the L2 norm (Euclidean norm) of the difference of the two vectors.
|
||||
@analysisrudin
|
||||
|
||||
$
|
||||
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
|
||||
$ <euclideannorm>
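As a small worked example of this formula, the following plain Python sketch computes the distance for two points; the function name `euclidean_distance` and the example points are assumptions made for illustration.

```python
import math


def euclidean_distance(a, b):
    """d(A, B) = sqrt(sum_i (A_i - B_i)^2) for two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Classic 3-4-5 triangle: the squared differences are 9 and 16.
distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

For the points $(0, 0)$ and $(3, 4)$ the sum of squared differences is 25, so the distance is 5. Python's standard library offers the same computation as `math.dist`.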
|
||||
#todo[Source?]
|
||||
|
||||
|
||||
=== PatchCore
|
||||
// https://arxiv.org/pdf/2106.08265
|
||||
|
30
sources.bib
@ -197,3 +197,33 @@
|
||||
primaryClass={cs.CV},
|
||||
url={https://arxiv.org/abs/2101.00562},
|
||||
}
|
||||
|
||||
|
||||
@book{analysisrudin,
|
||||
title = {Principles of mathematical analysis},
|
||||
author = {Walter Rudin},
|
||||
isbn = {},
|
||||
series = {Mathematics Series},
|
||||
year = {1976},
|
||||
publisher = {McGraw-Hill},
|
||||
keywords = {mathematics}
|
||||
}
|
||||
|
||||
|
||||
@book{dataminingbook,
|
||||
title = {Data Mining: Concepts and Techniques},
|
||||
author = {Jiawei Han and Micheline Kamber and Jian Pei},
|
||||
isbn = {},
|
||||
series = {The Morgan Kaufmann Series in Data Management Systems},
|
||||
year = {2012},
|
||||
publisher = {Morgan Kaufmann},
|
||||
keywords = {mathematics}
|
||||
}
|
||||
|
||||
@book{Goodfellow-et-al-2016,
|
||||
title={Deep Learning},
|
||||
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
|
||||
publisher={MIT Press},
|
||||
note={\url{http://www.deeplearningbook.org}},
|
||||
year={2016}
|
||||
}
|
||||
|