For all three methods we test the following use cases:
- Imbalanced 2 way classification (5, 10, 15, 30 good shots, 5 bad shots)
- Similar to the 2 way classification but with an imbalanced number of good shots.
- Imbalanced target class prediction (5, 10, 15, 30 good shots, 5 bad shots) #todo[Avoid bullet points and write flow text?]
- Detect only the faulty classes without the good ones, but with an imbalanced number of shots.

All those experiments were conducted on the bottle and cable classes of the MVTec AD dataset.

== Experiment Setup
All the experiments were done on the bottle and cable classes of the MVTec AD dataset.
The corresponding number of shots was randomly selected from the dataset.
The remaining images were used to test the model and measure the accuracy.
#todo[Maybe add the real number of samples per class]

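The following sketch shows how such a split could be drawn; the helper name `split_support_query` and the dictionary layout are illustrative assumptions, not taken from the actual experiment code.

```python
import random

def split_support_query(images_by_class, shots_per_class, seed=0):
    """Randomly draw the given number of support shots per class;
    all remaining images become the query (test) set."""
    rng = random.Random(seed)
    support, query = {}, {}
    for cls, images in images_by_class.items():
        images = list(images)
        rng.shuffle(images)
        n = shots_per_class[cls]
        support[cls] = images[:n]
        query[cls] = images[n:]
    return support, query

# e.g. an imbalanced 2 way split: 30 good shots, 5 bad shots
# support, query = split_support_query(data, {"good": 30, "bad": 5})
```
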
== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pretrained ResNet50 model as a feature extractor.
Features are extracted from both the support and the query set to obtain a down-projected representation of the images.
After down-projection, the support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class whose support embedding is closest to the query embedding is chosen.
If there is more than one support embedding for a class, the mean of those embeddings is used (the class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust, but with a simple distance metric instead of a neural network.

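A minimal sketch of this nearest class center prediction, assuming a Euclidean distance (the exact distance metric is an implementation choice) and already extracted embedding tensors:

```python
import torch

def predict_classes(query_emb, support_emb, support_labels):
    """Assign each query embedding to the class whose center
    (mean support embedding) is closest."""
    classes = torch.unique(support_labels)
    # one mean embedding (class center) per class
    centers = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in classes
    ])
    # pairwise Euclidean distances, shape (n_queries, n_classes)
    dists = torch.cdist(query_emb, centers)
    return classes[dists.argmin(dim=1)]
```
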
In this bachelor thesis a pretrained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 residual layers.

To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding.
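In PyTorch this can be sketched roughly as follows; replacing the final fully connected layer with an identity is one common way to expose the pooled 2048-dimensional features and may differ from the exact implementation used here:

```python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights)
model.fc = torch.nn.Identity()  # drop the classification head
model.eval()

preprocess = weights.transforms()  # resize, crop and ImageNet normalization

# path is only illustrative of the MVTec AD folder layout
image = Image.open("bottle/test/broken_large/000.png").convert("RGB")
with torch.no_grad():
    embedding = model(preprocess(image).unsqueeze(0))  # shape (1, 2048)
```
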
The class with the smallest distance is chosen as the predicted class.

=== Results
This approach performed better than expected for such a simple method.
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
When only detecting whether an anomaly occurred or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
Interestingly, the model performed slightly better with fewer shots in this case.
Moreover, in @resnet50bottleperfa the detection of the anomaly classes only (3 way) shows a similar pattern to the normal 4 way classification.
The more shots, the better the performance; it peaks at around 88% accuracy with 5 shots.

but this is expected as the cable class consists of 8 faulty classes.

== P>M>F
=== Approach
For P>M>F, I used the pretrained model weights from the original paper.
As the backbone feature extractor a DINO model is used, which was pretrained by Facebook.
This is a vision transformer with a patch size of 16 and 12 attention heads, trained in a self-supervised fashion.
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-Aircraft, CUB-200-2011, Describable Textures, QuickDraw, FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper] of diverse domains by the authors of the original paper.~@pmfpaper

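For reference, Facebook publishes these DINO backbones via torch.hub; a sketch of loading one is shown below (which exact checkpoint the P>M>F weights correspond to should be checked against the original paper):

```python
import torch

# self-supervised DINO ViT with patch size 16;
# use 'dino_vitb16' for the larger variant
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))  # (1, 384) CLS features for ViT-S/16
```
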
Finally, this model is fine-tuned with the support set of every test iteration.
Every time the support set changes, we need to fine-tune the model again.
|
||||
In a real world scenario this should not be the case because the support set is fixed and only the query set changes.
|
||||
|
||||
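A rough sketch of this per-episode fine-tuning; the optimizer, learning rate, step count and plain cross-entropy loss are illustrative assumptions rather than the exact recipe of the P>M>F authors:

```python
import copy
import torch
import torch.nn.functional as F

def finetune_on_support(model, support_x, support_y, steps=50, lr=1e-5):
    """Fine-tune a fresh copy of the model on the current support set,
    so the original weights stay intact for the next support set."""
    model = copy.deepcopy(model)
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(model(support_x), support_y)
        loss.backward()
        opt.step()
    return model.eval()
```
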
=== Results