add several sources and fix some errors in text
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 29s
parent
8f28a8c387
commit
8a4b33e67a
@ -101,7 +101,7 @@ One could use a well established algorithm like PatchCore or EfficientAD for det
|
|||||||
label: <comparisonnormal>,
|
label: <comparisonnormal>,
|
||||||
)
|
)
|
||||||
|
|
||||||
#if inwriting [
|
/*#if inwriting [
|
||||||
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
|
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
|
||||||
#todo[Maybe don't do this]
|
#todo[Maybe don't do this]
|
||||||
]
|
]*/
|
||||||
|
@ -31,8 +31,8 @@ The rest of the images was used to test the model and measure the accuracy.
|
|||||||
=== Approach
|
=== Approach
|
||||||
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
|
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
|
||||||
From both the support and query set the features are extracted to get a downprojected representation of the images.
|
From both the support and query set the features are extracted to get a downprojected representation of the images.
|
||||||
The support set embeddings are compared to the query set embeddings.
|
After downprojection the support set embeddings are compared to the query set embeddings.
|
||||||
To predict the class of a query the class with the smallest distance to the support embedding is chosen.
|
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
|
||||||
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
|
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
|
||||||
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
|
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
|
||||||
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but just with a simple distance metric instead of a neural net.
|
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but just with a simple distance metric instead of a neural net.
|
||||||
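The class-center approach described in this hunk can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function names `class_centers` and `predict` are hypothetical, the toy 2-D vectors stand in for ResNet50 embeddings, and Euclidean distance is assumed as the distance metric.

```python
import math
from collections import defaultdict


def class_centers(embeddings, labels):
    """Average the support embeddings of each class into one class center."""
    grouped = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(query, centers):
    """Return the class whose center is closest to the query (Euclidean distance)."""
    return min(centers, key=lambda label: math.dist(query, centers[label]))


# Toy 2-D embeddings standing in for extracted features (illustrative values only).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
support_labels = ["good", "good", "faulty", "faulty"]
centers = class_centers(support, support_labels)

queries = [[0.1, 0.1], [0.9, 1.0]]
predictions = [predict(query, centers) for query in queries]
```

With one support embedding per class the center is simply that embedding, so the same code covers the 1-shot case.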
@ -94,13 +94,13 @@ The class with the smallest distance is chosen as the predicted class.
|
|||||||
=== Results <resnet50perf>
|
=== Results <resnet50perf>
|
||||||
This method performed better than expected for such a simple method.
|
This method performed better than expected for such a simple method.
|
||||||
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
|
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
|
||||||
When detecting only if there occurred an anomaly or not the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
|
When only detecting whether an anomaly occurred or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
|
||||||
Interestingly, the model performed slightly better with fewer shots in this case.
|
Interestingly, the model performed slightly better with fewer shots in this case.
|
||||||
Moreover in @resnet50bottleperfa, the detection of the anomaly class only (3 way) shows a similar pattern as the normal 4 way classification.
|
Moreover in @resnet50bottleperfa, the detection of the anomaly class only (3 way) shows a similar pattern as the normal 4 way classification.
|
||||||
The more shots the better the performance and it peaks at around 88% accuracy with 5 shots.
|
The more shots the better the performance and it peaks at around 88% accuracy with 5 shots.
|
||||||
|
|
||||||
In @resnet50bottleperfb the model was tested with imbalanced class distributions.
|
In @resnet50bottleperfb the model was tested with imbalanced class distributions.
|
||||||
With [5,10,15,30] good shots and 5 bad shots the model performed worse than with balanced classes.
|
With {5, 10, 15, 30} good shots and 5 bad shots the model performed worse than with balanced classes.
|
||||||
The more good shots the worse the performance.
|
The more good shots the worse the performance.
|
||||||
The only exception is the faulty or not detection (2 way) where the model peaked at 15 good shots with 83% accuracy.
|
The only exception is the faulty or not detection (2 way) where the model peaked at 15 good shots with 83% accuracy.
|
||||||
|
|
||||||
@ -136,13 +136,13 @@ but this is expected as the cable class consists of 8 faulty classes.
|
|||||||
|
|
||||||
== P>M>F
|
== P>M>F
|
||||||
=== Approach
|
=== Approach
|
||||||
For P>M>F the pretrained model weights from the original paper were used.
|
For P>M>F I used the pretrained model weights from the original paper.
|
||||||
As the backbone feature extractor a DINO model is used, which was pre-trained by Facebook.
|
As the backbone feature extractor a DINO model is used, which was pre-trained by Facebook.
|
||||||
This is a vision transformer with a patch size of 16 and 12 attention heads learned in a self-supervised fashion.
|
This is a vision transformer with a patch size of 16 and 12 attention heads learned in a self-supervised fashion.
|
||||||
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-Aircraft,
|
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-Aircraft,
|
||||||
CUB-200-2011, Describable Textures, QuickDraw,
|
CUB-200-2011, Describable Textures, QuickDraw,
|
||||||
FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~#cite(<pmfpaper>)]
|
FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
|
||||||
of diverse domains by the authors of the original paper.#cite(<pmfpaper>)
|
of diverse domains by the authors of the original paper.~@pmfpaper
|
||||||
|
|
||||||
Finally, this model is finetuned with the support set of every test iteration.
|
Finally, this model is finetuned with the support set of every test iteration.
|
||||||
Every time the support set changes we need to finetune the model again.
|
Every time the support set changes we need to finetune the model again.
|
||||||
@ -182,7 +182,7 @@ So it is clearly a bad idea to add more good shots to the support set.
|
|||||||
|
|
||||||
== CAML
|
== CAML
|
||||||
=== Approach
|
=== Approach
|
||||||
For the CAML implementation the pretrained model weights from the original paper were used.
|
For the CAML implementation I used the pretrained model weights from the original paper.
|
||||||
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
|
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
|
||||||
This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shots fashion.
|
This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shots fashion.
|
||||||
This is because it expects the input sequence to be distributed with the same number of shots per class.
|
This is because it expects the input sequence to be distributed with the same number of shots per class.
|
||||||
@ -190,7 +190,7 @@ This is the reason why for this method the two imbalanced test cases couldn't be
|
|||||||
|
|
||||||
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
|
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
|
||||||
This feature extractor was already pretrained when used by the authors of the original paper.
|
This feature extractor was already pretrained when used by the authors of the original paper.
|
||||||
For the non-causal sequence model a transformer model was used
|
In this case for the non-causal sequence model a transformer model was used.
|
||||||
It consists of 24 Layers with 16 Attention-heads and a hidden dimension of 1024 and output MLP size of 4096.
|
It consists of 24 Layers with 16 Attention-heads and a hidden dimension of 1024 and output MLP size of 4096.
|
||||||
This transformer was trained on a huge number of images as described in @CAML.
|
This transformer was trained on a huge number of images as described in @CAML.
|
||||||
|
|
||||||
@ -198,7 +198,8 @@ This transformer was trained on a huge number of images as described in @CAML.
|
|||||||
The results were not as good as expected.
|
The results were not as good as expected.
|
||||||
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
|
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
|
||||||
The model was trained on a large number of general purpose images and is not fine-tuned at all.
|
The model was trained on a large number of general purpose images and is not fine-tuned at all.
|
||||||
It might not handle very similar images well.
|
Moreover, unlike the P>M>F method, it was not fine-tuned on the support set, which could have a huge impact on performance.
|
||||||
|
It might also not handle very similar images well.
|
||||||
|
|
||||||
Compared to the other two methods, CAML performed poorly in almost all experiments.
|
Compared to the other two methods, CAML performed poorly in almost all experiments.
|
||||||
The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
|
The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
|
||||||
|
@ -29,10 +29,10 @@ _Does giving the Few-Shot learner more good than bad samples improve the model p
|
|||||||
_How much does the performance improve if only detecting an anomaly or not?
|
_How much does the performance improve if only detecting an anomaly or not?
|
||||||
How does it compare to PatchCore and EfficientAD?_
|
How does it compare to PatchCore and EfficientAD?_
|
||||||
|
|
||||||
#if inwriting [
|
/*#if inwriting [
|
||||||
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
|
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
|
||||||
// I've tried different distance measures $->$ but results are pretty much the same.
|
// I've tried different distance measures $->$ but results are pretty much the same.
|
||||||
]
|
]*/
|
||||||
|
|
||||||
== Outline
|
== Outline
|
||||||
This thesis is structured to provide a comprehensive exploration of Few-Shot Learning in anomaly detection.
|
This thesis is structured to provide a comprehensive exploration of Few-Shot Learning in anomaly detection.
|
||||||
|
@ -98,7 +98,8 @@ See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
|
|||||||
An especially hard task is to generalize from such few samples.
|
An especially hard task is to generalize from such few samples.
|
||||||
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
|
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
|
||||||
This helps the model to learn the underlying patterns and to generalize well to unseen data.
|
This helps the model to learn the underlying patterns and to generalize well to unseen data.
|
||||||
In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]
|
In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
|
||||||
|
@Goodfellow-et-al-2016
|
||||||
|
|
||||||
=== Softmax
|
=== Softmax
|
||||||
#todo[Maybe remove this section]
|
#todo[Maybe remove this section]
|
||||||
@ -131,22 +132,23 @@ To measure the distance between two vectors some common distance measures are us
|
|||||||
A popular one is the Cosine Similarity (@cosinesimilarity).
|
A popular one is the Cosine Similarity (@cosinesimilarity).
|
||||||
It measures the cosine of the angle between two vectors.
|
It measures the cosine of the angle between two vectors.
|
||||||
The Cosine Similarity is especially useful when the magnitude of the vectors is not important.
|
The Cosine Similarity is especially useful when the magnitude of the vectors is not important.
|
||||||
|
@dataminingbook@analysisrudin
|
||||||
|
|
||||||
$
|
$
|
||||||
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
|
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
|
||||||
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
|
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
|
||||||
$ <cosinesimilarity>
|
$ <cosinesimilarity>
|
||||||
|
|
||||||
#todo[Source?]
|
|
||||||
=== Euclidean Distance
|
=== Euclidean Distance
|
||||||
The Euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space.
|
The Euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space.
|
||||||
It just calculates the square root of the sum of the squared differences of the coordinates.
|
It just calculates the square root of the sum of the squared differences of the coordinates.
|
||||||
The Euclidean distance can also be represented as the L2 norm (Euclidean norm) of the difference of the two vectors.
|
The Euclidean distance can also be represented as the L2 norm (Euclidean norm) of the difference of the two vectors.
|
||||||
|
@analysisrudin
|
||||||
|
|
||||||
$
|
$
|
||||||
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
|
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
|
||||||
$ <euclideannorm>
|
$ <euclideannorm>
|
||||||
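The two distance measures defined in @cosinesimilarity and @euclideannorm can be written down directly. The following is a small sketch in plain Python; the function names and the example vectors are illustrative assumptions and not taken from the thesis code.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """d(A, B) = ||A - B|| = sqrt(sum_i (A_i - B_i)^2)"""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# b points in the same direction as a but has twice the magnitude.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

similarity = cosine_similarity(a, b)  # close to 1: the angle between a and b is zero
distance = euclidean_distance(a, b)   # sqrt(14): the magnitudes differ
```

The example shows why the Cosine Similarity is useful when the magnitude is not important: the two vectors are maximally similar by angle although their Euclidean distance is non-zero.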
#todo[Source?]
|
|
||||||
|
|
||||||
=== Patchcore
|
=== Patchcore
|
||||||
// https://arxiv.org/pdf/2106.08265
|
// https://arxiv.org/pdf/2106.08265
|
||||||
|
30
sources.bib
@ -197,3 +197,33 @@
|
|||||||
primaryClass={cs.CV},
|
primaryClass={cs.CV},
|
||||||
url={https://arxiv.org/abs/2101.00562},
|
url={https://arxiv.org/abs/2101.00562},
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@book{analysisrudin,
|
||||||
|
title = {Principles of mathematical analysis},
|
||||||
|
author = {Walter Rudin},
|
||||||
|
isbn = {},
|
||||||
|
series = {Mathematics Series},
|
||||||
|
year = {1976},
|
||||||
|
publisher = {McGraw-Hill},
|
||||||
|
keywords = {mathematics}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@book{dataminingbook,
|
||||||
|
title = {Data Mining: Concepts and Techniques},
|
||||||
|
author = {Jiawei Han and Micheline Kamber and Jian Pei},
|
||||||
|
isbn = {},
|
||||||
|
series = {The Morgan Kaufmann Series in Data Management Systems},
|
||||||
|
year = {2012},
|
||||||
|
publisher = {Morgan Kaufmann},
|
||||||
|
keywords = {mathematics}
|
||||||
|
}
|
||||||
|
|
||||||
|
@book{Goodfellow-et-al-2016,
|
||||||
|
title={Deep Learning},
|
||||||
|
author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
|
||||||
|
publisher={MIT Press},
|
||||||
|
note={\url{http://www.deeplearningbook.org}},
|
||||||
|
year={2016}
|
||||||
|
}
|
||||||
|