fix lots of typos

This commit is contained in:
2025-02-02 12:59:15 +01:00
parent 94fe252741
commit cf6f4f96ac
5 changed files with 53 additions and 53 deletions


For all three methods, we test the following use-cases:
- Imbalanced 2-way classification (5, 10, 15, 30 good shots, 5 bad shots)
  - Similar to the 2-way classification, but with an imbalanced number of good shots.
- Imbalanced target class prediction (5, 10, 15, 30 good shots, 5 bad shots)#todo[Avoid bullet points and write flow text?]
  - Detect only the faulty classes without the good ones, but with an imbalanced number of shots.
All those experiments were conducted on the bottle and cable classes of the MVTec AD dataset.
== Experiment Setup
All the experiments were done on the bottle and cable classes of the MVTec AD dataset.
The corresponding number of shots was randomly selected from the dataset.
The rest of the images were used to test the model and measure the accuracy.
#todo[Maybe add real number of samples per classes]
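This sampling step can be sketched as follows; the function name and the dictionary layout are illustrative assumptions, not code from the thesis:

```python
import random

def split_support_query(images_per_class, n_shots, seed=0):
    """Randomly select `n_shots` support images per class; the
    remaining images of each class form the query (test) set."""
    rng = random.Random(seed)
    support, query = {}, {}
    for cls, paths in images_per_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        support[cls] = shuffled[:n_shots]
        query[cls] = shuffled[n_shots:]
    return support, query
```

With, for example, 5 good and 5 bad shots this produces the 2-way setting described above.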
== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pretrained ResNet50 model as a feature extractor.
Features are extracted from both the support and the query set to obtain a down-projected representation of the images.
After down-projection, the support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just use a Library of Pre-trained Feature
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but just with a simple distance metric instead of a neural net.
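A minimal sketch of this nearest-class-center prediction (NumPy, with made-up embeddings; the function name is an assumption for illustration):

```python
import numpy as np

def predict_nearest_center(support_emb, support_labels, query_emb):
    """Predict each query's class as the class whose mean support
    embedding (class center) has the smallest Euclidean distance."""
    support_emb = np.asarray(support_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    labels = np.array(support_labels)
    classes = sorted(set(support_labels))
    # one center per class: mean of that class's support embeddings
    centers = np.stack([support_emb[labels == c].mean(axis=0) for c in classes])
    # pairwise Euclidean distances, shape (n_queries, n_classes)
    dists = np.linalg.norm(query_emb[:, None, :] - centers[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]
```

With a single support shot per class this reduces to plain nearest-neighbor matching.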
In this bachelor thesis, a pretrained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 residual layers.
To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding.
The class with the smallest distance is chosen as the predicted class.
This simple method performed better than expected.
As shown in @resnet50bottleperfa, with a normal 5-shot / 4-way classification the model achieved an accuracy of 75%.
When only detecting whether an anomaly occurred or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
Interestingly, the model performed slightly better with fewer shots in this case.
Moreover, in @resnet50bottleperfa, the detection of the anomaly classes only (3-way) shows a similar pattern to the normal 4-way classification.
The more shots, the better the performance; it peaks at around 88% accuracy with 5 shots.
… but this is expected, as the cable class consists of 8 faulty classes.
== P>M>F
=== Approach
For P>M>F, I used the pretrained model weights from the original paper.
As backbone feature extractor, a DINO model is used, which is pretrained by Facebook.
This is a vision transformer with a patch size of 16 and 12 attention heads learned in a self-supervised fashion.
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-Aircraft, CUB-200-2011, Describable Textures, QuickDraw, FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper] of diverse domains by the authors of the original paper.~@pmfpaper
Finally, this model is fine-tuned with the support set of every test iteration.
Every time the support set changes, we need to fine-tune the model again.
In a real-world scenario this should not be the case, because the support set is fixed and only the query set changes.
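This per-support-set fine-tuning can be sketched with a prototypical loss as below. Note this is a simplified stand-in: P>M>F fine-tunes on augmented pseudo-query crops of the support set, while here the support embeddings themselves are classified against their class prototypes, and the backbone, step count, and learning rate are placeholders:

```python
import torch
import torch.nn.functional as F

def finetune_on_support(backbone, support_x, support_y, steps=20, lr=1e-4):
    """Run a few gradient steps so each support embedding moves
    closer to its own class prototype (mean class embedding)."""
    optimizer = torch.optim.Adam(backbone.parameters(), lr=lr)
    classes = torch.unique(support_y)  # sorted unique class ids
    for _ in range(steps):
        emb = backbone(support_x)
        # prototype = mean embedding of each class's support samples
        protos = torch.stack([emb[support_y == c].mean(dim=0) for c in classes])
        # logits: negative squared Euclidean distance to each prototype
        logits = -torch.cdist(emb, protos) ** 2
        targets = torch.searchsorted(classes, support_y)
        loss = F.cross_entropy(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return backbone
```

In deployment, this step would run once for the fixed support set rather than once per test iteration.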
=== Results