add several sources and fix some errors in text
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 29s
@@ -31,8 +31,8 @@ The rest of the images were used to test the model and measure the accuracy.
 === Approach
 The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
 From both the support and the query set the features are extracted to get a downprojected representation of the images.
-The support set embeddings are compared to the query set embeddings.
-To predict the class of a query the class with the smallest distance to the support embedding is chosen.
+After downprojection the support set embeddings are compared to the query set embeddings.
+To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
 If there is more than one support embedding within the same class, the mean of those embeddings is used (the class center).
 This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
 Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust, but with a simple distance metric instead of a neural net.
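A minimal sketch of this nearest-class-center baseline, assuming PyTorch and torchvision (my illustration, not code from this repository; the report's exact preprocessing and distance metric may differ):

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Pre-trained ResNet50 with the classifier head removed acts as the
# feature extractor, yielding 2048-d embeddings.
backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def embed(images):
    # images: (N, 3, 224, 224), already resized and normalized
    return backbone(images)  # -> (N, 2048)

@torch.no_grad()
def predict(support_x, support_y, query_x, n_way):
    s, q = embed(support_x), embed(query_x)
    # Class center: mean of all support embeddings sharing a label.
    centers = torch.stack([s[support_y == c].mean(dim=0) for c in range(n_way)])
    # The class whose center has the smallest distance to the query wins.
    return torch.cdist(q, centers).argmin(dim=1)  # (num_query,)
```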
@@ -94,13 +94,13 @@ The class with the smallest distance is chosen as the predicted class.
 === Results <resnet50perf>
 This method performed better than expected for such a simple approach.
 As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
-When detecting only if there occured an anomaly or not the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
+When detecting only whether an anomaly occurred or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
 Interestingly, the model performed slightly better with fewer shots in this case.
 Moreover, in @resnet50bottleperfa the detection of the anomaly class only (3 way) shows a similar pattern to the normal 4 way classification.
 The more shots, the better the performance; it peaks at around 88% accuracy with 5 shots.

 In @resnet50bottleperfb the model was tested with imbalanced class distributions.
-With [5,10,15,30] good shots and 5 bad shots the model performed worse than with balanced classes.
+With {5, 10, 15, 30} good shots and 5 bad shots the model performed worse than with balanced classes.
 The more good shots, the worse the performance.
 The only exception is the faulty-or-not detection (2 way), where the model peaked at 15 good shots with 83% accuracy.

@@ -136,13 +136,13 @@ but this is expected as the cable class consists of 8 faulty classes.

 == P>M>F
 === Approach
-For P>M>F the pretrained model weights from the original paper were used.
+For P>M>F I used the pretrained model weights from the original paper.
 As the backbone feature extractor a DINO model is used, which was pre-trained by Facebook.
 This is a vision transformer with a patch size of 16 and 12 attention heads, trained in a self-supervised fashion.
 This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-
 Aircraft, CUB-200-2011, Describable Textures, QuickDraw,
-FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~#cite(<pmfpaper>)]
-of diverse domains by the authors of the original paper.#cite(<pmfpaper>)
+FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
+of diverse domains by the authors of the original paper.~@pmfpaper

 Finally, this model is finetuned with the support set of every test iteration.
 Every time the support set changes, we need to finetune the model again.
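A hedged sketch of this per-episode fine-tuning step, assuming PyTorch (illustrative only, not the authors' code; P>M>F's actual recipe involves further details such as augmentation and learning-rate selection that are omitted here):

```python
import torch
import torch.nn.functional as F

def finetune_on_support(backbone, support_x, support_y, n_way, steps=50, lr=1e-5):
    # Re-run for every new support set: the meta-trained backbone (e.g. a DINO
    # ViT loaded via torch.hub from "facebookresearch/dino:main") is briefly
    # adapted before the query set is evaluated.
    opt = torch.optim.Adam(backbone.parameters(), lr=lr)
    backbone.train()
    for _ in range(steps):
        emb = backbone(support_x)  # (N, D)
        centers = torch.stack(
            [emb[support_y == c].mean(dim=0) for c in range(n_way)])
        # Prototype-style loss: logits are negative distances to class centers.
        loss = F.cross_entropy(-torch.cdist(emb, centers), support_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    backbone.eval()
    return backbone
```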
@@ -182,7 +182,7 @@ So it is clearly a bad idea to add more good shots to the support set.

 == CAML
 === Approach
-For the CAML implementation the pretrained model weights from the original paper were used.
+For the CAML implementation I used the pretrained model weights from the original paper.
 The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
 This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shot fashion,
 since it expects the input sequence to contain the same number of shots per class.
@@ -190,7 +190,7 @@ This is the reason why for this method the two imbalanced test cases couldn't be

 As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
 This feature extractor was already pretrained when used by the authors of the original paper.
-For the non-causal sequence model a transformer model was used
+In this case, for the non-causal sequence model a transformer model was used.
 It consists of 24 layers with 16 attention heads, a hidden dimension of 1024, and an output MLP size of 4096.
 This transformer was trained on a huge number of images as described in @CAML.

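A rough sketch of how the input sequence for such a non-causal sequence model can be assembled, assuming PyTorch (my reading of the setup, not the authors' code; CAML uses a learned label encoding, for which a one-hot vector stands in here):

```python
import torch
import torch.nn.functional as F

def build_caml_sequence(support_emb, support_y, query_emb, n_way):
    # support_emb: (n_way * k, D), support_y: (n_way * k,), query_emb: (D,)
    # Each support embedding is concatenated with an encoding of its label.
    labels = F.one_hot(support_y, n_way).float()
    support_tokens = torch.cat([support_emb, labels], dim=1)
    # The query token carries a zero "unknown label" vector instead.
    query_token = torch.cat([query_emb, torch.zeros(n_way)]).unsqueeze(0)
    # Non-causal: the transformer sees the whole sequence at once, so the
    # query can attend to every demonstration (and vice versa) in one pass.
    return torch.cat([query_token, support_tokens], dim=0)  # (n_way*k + 1, D + n_way)
```

Because the sequence model was pretrained only on sequences with the same number of support tokens per class, an imbalanced support set falls outside the input distribution it expects, which is why the two imbalanced test cases could not be run with CAML.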
@@ -198,7 +198,8 @@ This transformer was trained on a huge number of images as described in @CAML.
 The results were not as good as expected.
 This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
 The model was trained on a large number of general-purpose images and is not fine-tuned at all.
-It might not handle very similar images well.
+Moreover, it was not fine-tuned on the support set as in the P>M>F method, which could have a huge impact on performance.
+It might also not handle very similar images well.

 Compared to the other two methods, CAML performed poorly in almost all experiments.
 The normal few-shot classification reached only 40% accuracy in @camlperfa at best.