Compare commits


1 Commits

SHA1: 71bdb0a207
Message: fix some errors
Date: 2025-01-24 19:51:55 +01:00
Checks: All checks were successful (Build Typst document / build_typst_documents (push), successful in 1m34s)
5 changed files with 28 additions and 61 deletions

View File

@@ -64,8 +64,8 @@ Which is a result that is unexpected (since one can think more samples perform
Clearly all four graphs show that the performance decreases with an increasing number of good samples.
So the conclusion is that the Few-Shot learner should always be trained with classes that are as balanced as possible.
-== How does the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
-_How much does the performance improve if only detecting an anomaly or not?
+== How do the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
+_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_
@comparisonnormal shows graphs comparing the performance of the ResNet, CAML and P>M>F methods in detecting only the anomaly class, both including and excluding the good class.
@@ -101,7 +101,7 @@ One could use a well established algorithm like PatchCore or EfficientAD for det
label: <comparisonnormal>,
)
-/*#if inwriting [
+#if inwriting [
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
#todo[Maybe don't do this]
-]*/
+]

View File

@@ -31,8 +31,8 @@ The rest of the images were used to test the model and measure the accuracy.
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
From both the support and the query set, features are extracted to obtain a downprojected representation of the images.
-After downprojection the support set embeddings are compared to the query set embeddings.
-To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
+The support set embeddings are compared to the query set embeddings.
+To predict the class of a query the class with the smallest distance to the support embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but with a simple distance metric instead of a neural net.
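
To make the described procedure concrete, the following is a minimal numpy sketch of the nearest-class-center classification, assuming the support and query embeddings have already been extracted with the pre-trained ResNet50 (all names are illustrative, not the thesis implementation):

import numpy as np

def predict_classes(support_emb, support_lbl, query_emb):
    # Nearest-class-center prediction over precomputed embeddings.
    classes = np.unique(support_lbl)
    # Class center: mean of all support embeddings belonging to one class.
    centers = np.stack([support_emb[support_lbl == c].mean(axis=0) for c in classes])
    # Euclidean distance from every query embedding to every class center.
    dists = np.linalg.norm(query_emb[:, None, :] - centers[None, :, :], axis=-1)
    # Each query is assigned the class of its closest center.
    return classes[np.argmin(dists, axis=1)]

With a single support embedding per class this reduces to plain nearest-neighbour matching, which is exactly the 1-shot case.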
@@ -94,13 +94,13 @@ The class with the smallest distance is chosen as the predicted class.
=== Results <resnet50perf>
Considering how simple this method is, it performed better than expected.
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
-When detecting if there occured an anomaly or not only the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
+When detecting only if there occurred an anomaly or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
Interestingly, the model performed slightly better with fewer shots in this case.
Moreover, in @resnet50bottleperfa the detection of the anomaly class only (3 way) shows a similar pattern to the normal 4 way classification.
The more shots, the better the performance; it peaks at around 88% accuracy with 5 shots.
In @resnet50bottleperfb the model was tested with imbalanced class distributions.
-With {5, 10, 15, 30} good shots and 5 bad shots the model performed worse than with balanced classes.
+With [5,10,15,30] good shots and 5 bad shots the model performed worse than with balanced classes.
The more good shots, the worse the performance.
The only exception is the faulty-or-not detection (2 way), where the model peaked at 15 good shots with 83% accuracy.
@@ -136,13 +136,13 @@ but this is expected as the cable class consists of 8 faulty classes.
== P>M>F
=== Approach
-For P>M>F I used the pretrained model weights from the original paper.
+For P>M>F the pretrained model weights from the original paper were used.
As the backbone feature extractor a DINO model is used, which was pre-trained by Facebook.
This is a vision transformer with a patch size of 16 and 12 attention heads, trained in a self-supervised fashion.
This feature extractor was meta-trained with 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-
Aircraft, CUB-200-2011, Describable Textures, QuickDraw,
-FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
-of diverse domains by the authors of the original paper.~@pmfpaper
+FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~#cite(<pmfpaper>)]
+of diverse domains by the authors of the original paper.#cite(<pmfpaper>)
Finally, this model is finetuned with the support set of every test iteration.
Every time the support set changes, the model needs to be finetuned again.
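
A rough sketch of this per-episode finetuning, under the assumption of a PyTorch backbone and a prototype-style loss (a simplified stand-in, not the authors' exact procedure; backbone, support_x and support_y are hypothetical):

import torch
import torch.nn.functional as F

def finetune_on_support(backbone, support_x, support_y, steps=50, lr=1e-5):
    # Assumes support_y holds integer labels 0..C-1.
    optimizer = torch.optim.Adam(backbone.parameters(), lr=lr)
    for _ in range(steps):
        embeddings = backbone(support_x)  # (N, D) embeddings of the support set
        # One prototype per class: the mean embedding of that class.
        prototypes = torch.stack([embeddings[support_y == c].mean(dim=0)
                                  for c in support_y.unique()])
        # Logits are negative distances to the prototypes.
        logits = -torch.cdist(embeddings, prototypes)
        loss = F.cross_entropy(logits, support_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return backbone

Since this loop has to run for every new support set, its cost adds up quickly, which is the drawback noted above.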
@@ -182,7 +182,7 @@ So it is clearly a bad idea to add more good shots to the support set.
== CAML
=== Approach
-For the CAML implementation I used the pretrained model weights from the original paper.
+For the CAML implementation the pretrained model weights from the original paper were used.
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shot fashion,
since it expects the input sequence to contain the same number of shots per class.
@@ -190,7 +190,7 @@ This is the reason why for this method the two imbalanced test cases couldn't be
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
This feature extractor was already pretrained when used by the authors of the original paper.
-In this case for the non-causal sequence model a transformer model was used.
+For the non-causal sequence model a transformer model was used.
It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
This transformer was trained on a huge number of images as described in @CAML.
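
As a very rough sketch, such a non-causal sequence model could be wired up as below, with the dimensions taken from the description above (24 layers, 16 attention heads, hidden dimension 1024, MLP size 4096); the token layout and module names are assumptions for illustration, not the original CAML implementation:

import torch
import torch.nn as nn

class NonCausalFewShotClassifier(nn.Module):
    def __init__(self, d_feat=768, d_label=256, n_way=5):
        super().__init__()
        # One extra label embedding serves as the "unknown" label of the query.
        self.label_emb = nn.Embedding(n_way + 1, d_label)
        layer = nn.TransformerEncoderLayer(d_model=d_feat + d_label, nhead=16,
                                           dim_feedforward=4096, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=24)
        self.head = nn.Linear(d_feat + d_label, n_way)

    def forward(self, support_feat, support_lbl, query_feat):
        # support_feat: (N, d_feat), support_lbl: (N,), query_feat: (1, d_feat)
        unknown = torch.tensor([self.label_emb.num_embeddings - 1])
        tokens = torch.cat([
            torch.cat([query_feat, self.label_emb(unknown)], dim=-1),
            torch.cat([support_feat, self.label_emb(support_lbl)], dim=-1),
        ])  # (N + 1, d_feat + d_label)
        # Non-causal: no attention mask, every token attends to every other.
        encoded = self.encoder(tokens.unsqueeze(0))
        return self.head(encoded[0, 0])  # class logits for the query token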
@@ -198,8 +198,7 @@ This transformer was trained on a huge number of images as described in @CAML.
The results were not as good as expected.
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
The model was trained on a large number of general-purpose images and is not fine-tuned at all.
-Moreover, it was not fine-tuned on the support set similar to the P>M>F method, which could have a huge impact on performance.
-It might also not handle very similar images well.
+It might not handle very similar images well.
Compared to the other two methods, CAML performed poorly in almost all experiments.
The normal few-shot classification reached only 40% accuracy at best in @camlperfa.

View File

@@ -6,7 +6,7 @@ Anomaly detection has especially in the industrial and automotive field essentia
Lots of assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning has helped the field advance a lot in the past.
Most of the time the error rate is below $0.1%$ and therefore plenty of good data and almost no faulty data is available.
-So the train data is heavily unbalaned.~#cite(<parnami2022learningexamplessummaryapproaches>)
+So the train data is heavily unbalanced.~#cite(<parnami2022learningexamplessummaryapproaches>)
PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies within unseen (but similar) data.
One of their problems is the need for lots of training data and training time.
@@ -25,14 +25,14 @@ How does it compare to well established algorithms such as Patchcore or Efficien
=== How does unbalancing the shot number affect performance?
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_
-=== How does the 3 (ResNet, CAML, \pmf) methods perform in only detecting the anomaly class?
-_How much does the performance improve if only detecting an anomaly or not?
+=== How do the 3 (ResNet, CAML, \pmf) methods perform in only detecting the anomaly class?
+_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD?_
-/*#if inwriting [
+#if inwriting [
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
// I've tried different distance measures $->$ but results are pretty much the same.
-]*/
+]
== Outline
This thesis is structured to provide a comprehensive exploration of Few-Shot Learning in anomaly detection.

View File

@@ -36,7 +36,7 @@ The bottle category contains 3 different defect classes: _broken_large_, _broken
Whereas cable has a lot more defect classes: _bent_wire_, _cable_swap_, _combined_, _cut_inner_insulation_,
_cut_outer_insulation_, _missing_cable_, _missing_wire_, _poke_insulation_.
-So many more defect classes are already an indication that a classification task might be more difficult for the cable category.
+More defect classes are already an indication that a classification task might be more difficult for the cable category.
#subpar.grid(
figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
@@ -79,7 +79,7 @@ So the model is prone to overfitting to the few training samples and this means
Typically a few-shot learning task consists of a support and a query set.
The support set contains the training data, while the query set holds the data used to evaluate real-world performance.
A common way to format a few-shot learning problem is the n-way k-shot notation.
-For Example 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
+For example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class in a reduced dimensionality and classify new examples based on proximity to these representations in an embedding space.~@snell2017prototypicalnetworksfewshotlearning
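
The n-way k-shot setup can be made concrete with a small sampling sketch; images_by_class is a hypothetical mapping from class name to a list of images:

import random

def sample_episode(images_by_class, n_way=3, k_shot=5):
    # Draw an n-way k-shot episode: k_shot support images per class,
    # the remaining images of the chosen classes form the query set.
    support, query = [], []
    for cls in random.sample(sorted(images_by_class), n_way):
        shuffled = random.sample(images_by_class[cls], len(images_by_class[cls]))
        support += [(img, cls) for img in shuffled[:k_shot]]
        query += [(img, cls) for img in shuffled[k_shot:]]
    return support, query

With n_way=3 and k_shot=5 this yields exactly the 3-way 5-shot problem from the example above.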
@@ -98,8 +98,7 @@ See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
An especially hard task is to generalize from so few samples.
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during training.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
-In few-shot learning the model has to generalize from just a few samples.#todo[Write more about. eg. class distributions]
-@Goodfellow-et-al-2016
+In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]
=== Softmax
#todo[Maybe remove this section]
@@ -128,27 +127,26 @@ $ <crel>
Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this Practical Work.
=== Cosine Similarity
-To measure the distance between two vectors some common distance measures are used.
-One popular of them is the Cosine Similarity (@cosinesimilarity).
-It measures the cosine of the angle between two vectors.
-The Cosine Similarity is especially useful when the magnitude of the vectors is not important.
-@dataminingbook@analysisrudin
+Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
+It computes the cosine of the angle between the vectors, offering a measure of their alignment.
+This property makes the cosine similarity particularly effective in scenarios where the
+direction of the vector holds more important information than the magnitude.
$
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
$ <cosinesimilarity>
+#todo[Source?]
=== Euclidean Distance
The Euclidean distance (@euclideannorm) is a simpler method to measure the distance between two points in a vector space.
It just calculates the square root of the sum of the squared differences of the coordinates.
The Euclidean distance can also be represented as the L2 norm (Euclidean norm) of the difference of the two vectors.
-@analysisrudin
$
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
$ <euclideannorm>
+#todo[Source?]
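
Both measures translate directly into code; the following is a short numpy sanity check of @cosinesimilarity and @euclideannorm (illustrative only):

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (A . B) / (||A|| * ||B||), independent of the magnitudes.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # d(A, B) = ||A - B||, the L2 norm of the difference vector.
    return float(np.linalg.norm(a - b))

a, b = np.array([1.0, 0.0]), np.array([2.0, 2.0])
print(cosine_similarity(a, b))   # ~0.707, a 45 degree angle
print(euclidean_distance(a, b))  # ~2.236, sqrt(5)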
=== Patchcore
// https://arxiv.org/pdf/2106.08265

View File

@@ -197,33 +197,3 @@
primaryClass={cs.CV},
url={https://arxiv.org/abs/2101.00562},
}
-@book{analysisrudin,
-title = {Principles of mathematical analysis},
-author = {Walter Rudin},
-isbn = {},
-series = {Mathermatics Series},
-year = {1976},
-publisher = {Mc Graw Hill},
-keywords = {mathematics}
-}
-@book{dataminingbook,
-title = {Data Mining: Concepts and Techniques},
-author = {Jiawei Han, Micheline Kamber, Jian Pei},
-isbn = {},
-series = {The Morgan Kaufmann Series in Data Management Systems},
-year = {2012},
-publisher = {Morgran Kaufmann},
-keywords = {mathematics}
-}
-@book{Goodfellow-et-al-2016,
-title={Deep Learning},
-author={Ian Goodfellow and Yoshua Bengio and Aaron Courville},
-publisher={MIT Press},
-note={\url{http://www.deeplearningbook.org}},
-year={2016}
-}