Compare commits

40 Commits

Author SHA1 Message Date
71bdb0a207 fix some errors
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 1m34s
2025-01-24 19:51:55 +01:00
8f28a8c387 use ieee citation style
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 18s
2025-01-15 16:02:26 +01:00
a1b8d7d81a update supervisor title
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 10s
2025-01-15 06:35:17 +00:00
c5bd509f24 fix some todos and spelling errors
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 21s
2025-01-15 07:03:10 +01:00
30d09a67d2 fix caml stuff and add things to last sec
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 11s
2025-01-14 20:05:11 +01:00
3e440e97f7 add stuff why inbalanced doesn't work for caml
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 13s
2025-01-14 19:39:41 +01:00
49d5e97417 add abstract, finish the alternatvie methods and fix some todos and improve sources
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 21s
2025-01-14 19:22:15 +01:00
7c54e11238 add sgva clip to not used materials
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 13s
2025-01-13 22:36:44 +01:00
dd1f28a89f add some things to matmethods
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 22s
2025-01-13 15:09:53 +01:00
7b5be51446 add some things to matmethods 2025-01-13 15:09:43 +01:00
ebda5246b5 change position of ref
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 19s
2025-01-10 13:14:09 +01:00
96e0dcc87c add papers to use for alternative methods
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 18s
2025-01-08 07:45:16 +00:00
1a5dc337f7 remove extra from production build
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 12s
2025-01-07 18:12:08 +01:00
829f7a5c5b add outline
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 10s
2025-01-07 18:04:04 +01:00
9f133fead5 add notebooks with plots
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 12s
2025-01-07 17:40:35 +01:00
a7b9fdb998 finish results section and add most of conclusion and outlook stuff
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 12s
2025-01-07 17:40:11 +01:00
34215fb720 update notebooks
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 29s
2025-01-07 15:45:02 +01:00
9c70fdf932 add info line why pc and eid not in plot
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 18s
2025-01-05 18:02:45 +01:00
2eeed2c31e add images for final results
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 12s
2025-01-03 21:48:48 +01:00
2690a3d0f2 add pmf material section
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 18s
2025-01-03 15:25:32 +01:00
882c6f54bb ploting notebooks
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 22s
2025-01-01 20:53:05 +01:00
fe9f4433b3 add result images for each method
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 1m1s
2025-01-01 20:50:52 +01:00
24118dce93 add stuff for CAML
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 15s
2024-12-31 12:23:53 +01:00
155faa6e80 correct eq numbering, add impl of resent50
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 11s
2024-12-30 18:34:43 +01:00
0b5a0647e2 add similarities and finish parts of matandmeth
Some checks failed
Build Typst document / build_typst_documents (push) Failing after 7s
2024-12-30 14:06:47 +01:00
ac4f4d78cb kind of finish caml general infos
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 8s
2024-12-30 11:24:29 +01:00
9cd678aa70 add more stuff to caml
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 16s
2024-12-30 10:32:03 +01:00
1805bc2d78 add stuff for CAML
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 27s
2024-12-21 18:42:59 +01:00
a358401ffb add new sections and some todos
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 11s
2024-12-20 12:33:54 +01:00
58427cd595 correctly seperate build args
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 10s
2024-12-20 12:05:46 +01:00
e7ba59d0c9 correctly seperate build args
Some checks failed
Build Typst document / build_typst_documents (push) Failing after 5s
2024-12-20 11:59:05 +01:00
53d11ae459 on git build release mode add some todos
Some checks failed
Build Typst document / build_typst_documents (push) Failing after 10s
2024-12-20 11:52:51 +01:00
1b41fff04b add link to caml
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 10s
2024-12-19 16:53:50 +01:00
9386971006 add efficientad section
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 50s
2024-12-19 15:24:36 +01:00
18025d10c5 add patchcore overview
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 31s
2024-12-09 16:20:48 +01:00
a3ba4cc30b improve intro
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 3m26s
2024-11-29 16:18:04 +01:00
22e1edf077 add mvtec example imgs
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 41s
2024-11-11 14:30:21 +01:00
bb8436339a fix workflow and rm pdf
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 10s
2024-11-04 15:12:53 +01:00
bcbb9bb9de move typst to root and delte latex
Some checks failed
Build Typst document / build_typst_documents (push) Failing after 8s
2024-11-04 15:11:44 +01:00
7ef0bb21b2 change instutute to correct one 2024-11-04 15:09:47 +01:00
88 changed files with 2695 additions and 1854 deletions

@ -1,32 +0,0 @@
name: Build LaTeX Document
on:
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Checkout the repository containing the LaTeX files
      - name: Checkout repository
        uses: actions/checkout@v3
      # Install LaTeX dependencies manually (TexLive and BibTeX)
      - name: Install LaTeX
        run: |
          sudo apt-get update
          sudo apt-get install -y texlive-full biber latexmk
      # Compile the LaTeX document (first pass)
      - name: Compile LaTeX (first pass)
        run: |
          cd src
          latexmk -pdf -bibtex -interaction=nonstopmode main.tex
      # Upload the compiled PDF as an artifact
      - name: Upload PDF
        uses: actions/upload-artifact@v3
        with:
          name: compiled-latex
          path: src/main.pdf

@ -8,11 +8,16 @@ jobs:
- name: Checkout
uses: actions/checkout@v3
- name: Typst
uses: lvignoli/typst-action@main
uses: leana8959/typst-action@main
with:
source_file: typstalt/main.typ
source_file: main.typ
options: |
--input
inwriting=false
--input
draft=false
- name: Upload PDF file
uses: actions/upload-artifact@v3
with:
name: PDF
path: typstalt/main.pdf
path: main.pdf

22
conclusionandoutlook.typ Normal file

@ -0,0 +1,22 @@
= Conclusion and Outlook <sectionconclusionandoutlook>
== Conclusion
In conclusion, Few-Shot learning is not the best choice for anomaly detection tasks.
It is hugely outperformed by state-of-the-art algorithms like Patchcore or EfficientAD.
The only benefit of Few-Shot learning is that it can be used in environments where only a limited number of good samples are available.
But this should not be the case in most scenarios.
Most of the time plenty of good samples are available and in this case Patchcore or EfficientAD should perform great.
The only case where Few-Shot learning could be used is in scenarios where one wants to detect the anomaly class itself.
Patchcore and EfficientAD can only detect if an anomaly is present or not but not what type of anomaly it actually is.
So chaining a Few-Shot learner after Patchcore or EfficientAD could be a good idea to use the best of both worlds.
In most of the tests P>M>F performed the best.
But also the simple ResNet50 method performed better than expected in most cases and can be considered if the computational resources are limited and if a simple architecture is enough.
== Outlook
In the future, when new Few-Shot learning methods evolve, it could be interesting to test again how they perform in anomaly detection tasks.
There might be a lack of research on settings where the classes to detect are very similar to each other,
and a few-shot learning algorithm tailored specifically to very similar classes could boost the performance by a large margin.
It might be interesting to test the SOT method (see @SOT) with a ResNet50 feature extractor similar to the one proposed in this thesis, but with SOT for embedding comparison.
Moreover, TRIDENT (see @TRIDENT) could achieve promising results in an anomaly detection scenario.

107
experimentalresults.typ Normal file
View File

@ -0,0 +1,107 @@
#import "utils.typ": todo, inwriting
#import "@preview/subpar:0.1.1"
= Experimental Results <sectionexperimentalresults>
== Is Few-Shot learning a suitable fit for anomaly detection? <expresults2way>
_Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well established algorithms such as Patchcore or EfficientAD?_
@comparison2waybottle shows the performance of the 2-way classification (anomaly or not) on the bottle class and @comparison2waycable the same on the cable class.
The performance values are the same as in @experiments but just merged together into one graph.
As a reference Patchcore reaches an AUROC score of 99.6% and EfficientAD reaches 99.8% averaged over all classes provided by the MVTec AD dataset.
Both are trained with samples from the 'good' class only.
So there is a clear performance gap between Few-Shot learning and the state-of-the-art anomaly detection algorithms.
Patchcore and EfficientAD are not included in @comparison2way as they aren't directly comparable in the same fashion.
That means if the goal is just to detect anomalies, Few-Shot learning is not the best choice and Patchcore or EfficientAD should be used.
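One reason the reference values are not directly comparable is that AUROC and accuracy measure different things. The following is a minimal sketch, with made-up scores and labels that are not results from this thesis: AUROC is computed as the probability that a randomly chosen anomalous sample gets a higher anomaly score than a randomly chosen good sample, while accuracy depends on one fixed decision threshold.

```python
def auroc(scores, labels):
    """AUROC as P(score of an anomaly > score of a good sample); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold):
    """Accuracy after turning scores into hard decisions at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical anomaly scores (higher = more anomalous); 1 = anomaly, 0 = good.
example_scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
example_labels = [0, 0, 1, 0, 1, 1]

example_auroc = auroc(example_scores, example_labels)
example_accuracy = accuracy(example_scores, example_labels, threshold=0.5)
```

AUROC is threshold-free, whereas the accuracy value changes with the chosen threshold, so the two numbers should not be read on the same scale.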
#subpar.grid(
figure(image("rsc/comparison-2way-bottle.png"), caption: [
Bottle class
]), <comparison2waybottle>,
figure(image("rsc/comparison-2way-cable.png"), caption: [
Cable class
]), <comparison2waycable>,
columns: (1fr, 1fr),
caption: [2-Way classification performance],
label: <comparison2way>,
)
== How does imbalancing the Shot number affect performance?
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_
As the results in @experiments show, the performance of the Few-Shot learner decreases with an increasing number of good samples.
This result is unexpected (since one might assume that more samples always improve performance), but it aligns with the idea that all classes should be as balanced as possible.
@comparisoninbalanced shows the performance of the imbalanced classification on the bottle and cable class for all anomaly classes.
#subpar.grid(
figure(image("rsc/inbalanced-bottle.png"), caption: [
Bottle class
]), <comparisoninbalancedbottle>,
figure(image("rsc/inbalanced-cable.png"), caption: [
Cable class
]), <comparisoninbalancedcable>,
columns: (1fr, 1fr),
caption: [Imbalanced classification performance],
label: <comparisoninbalanced>,
)
@comparisoninbalanced2way shows the performance of the imbalanced 2-way classification (anomaly or not) on the bottle and cable class.
#subpar.grid(
figure(image("rsc/inbalanced-2way-bottle.png"), caption: [
Bottle class
]), <comparisoninbalanced2waybottle>,
figure(image("rsc/inbalanced-2way-cable.png"), caption: [
Cable class
]), <comparisoninbalanced2waycable>,
columns: (1fr, 1fr),
caption: [Imbalanced 2-Way classification performance],
label: <comparisoninbalanced2way>,
)
Clearly all four graphs show that the performance decreases with an increasing number of good samples.
So the conclusion is that the Few-Shot learner should always be trained with as balanced classes as possible.
== How do the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_
@comparisonnormal compares the performance of the ResNet, CAML and P>M>F methods in detecting the anomaly class, both including and excluding the good class.
P>M>F performs better than ResNet and CAML in almost all cases.
P>M>F reaches up to 78% accuracy in the bottle class (@comparisonnormalbottle) and 46% in the cable class (@comparisonnormalcable) when detecting all classes including good ones
and 84% in the bottle class (@comparisonfaultyonlybottle) and 51% in the cable class (@comparisonfaultyonlycable) when excluding the good class.
Those results are pretty good considering the small number of samples and how similar the anomaly classes actually are.
CAML performs the worst in all cases except for the cable class when detecting all classes except the good one.
This might be the case because it is not fine-tuned on the shots and not really built for such similar classes.
The detection is not really comparable with PatchCore and EfficientAD, as they are trained on the good class only
and are built for detecting anomalies in general, not the anomaly classes.
Have a look at @expresults2way for a comparison of the 2-way classification performance.
So in conclusion it's a good idea to use P>M>F for detecting the anomaly classes only.
Especially when there are not many samples of the anomaly classes available such as in most anomaly detection scenarios.
One could use a well established algorithm like PatchCore or EfficientAD for detecting anomalies in general and P>M>F for detecting the anomaly class afterwards.
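The chaining idea can be sketched as a two-stage pipeline. This is only an illustration: `anomaly_score` stands in for a detector such as PatchCore or EfficientAD and `classify_anomaly` for a Few-Shot classifier such as P>M>F; both callables, the threshold and the toy stand-ins below are hypothetical and not part of the thesis code.

```python
from typing import Callable, Optional, Sequence


def two_stage_inspection(
    image: Sequence[float],
    anomaly_score: Callable[[Sequence[float]], float],
    classify_anomaly: Callable[[Sequence[float]], str],
    threshold: float,
) -> Optional[str]:
    """Return None for a good sample, otherwise the predicted anomaly class."""
    # Stage 1: generic anomaly detector decides whether anything is wrong.
    if anomaly_score(image) < threshold:
        return None
    # Stage 2: few-shot classifier is only asked for the anomaly type.
    return classify_anomaly(image)


# Toy stand-ins (hypothetical) so that the sketch runs without any model.
def toy_anomaly_score(image):
    return max(image)


def toy_classify_anomaly(image):
    return "contamination" if sum(image) / len(image) > 0.5 else "broken_small"


result_good = two_stage_inspection([0.1, 0.2], toy_anomaly_score, toy_classify_anomaly, 0.5)
result_faulty = two_stage_inspection([0.9, 0.8], toy_anomaly_score, toy_classify_anomaly, 0.5)
```

With this split the Few-Shot classifier never has to model the good class, which corresponds to the setting without the good class evaluated above.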
#subpar.grid(
figure(image("rsc/normal-bottle.png"), caption: [
4-Way - Bottle class
]), <comparisonnormalbottle>,
figure(image("rsc/normal-cable.png"), caption: [
9-Way - Cable class
]), <comparisonnormalcable>,
figure(image("rsc/faultclasses-bottle.png"), caption: [
3-Way - Bottle class
]), <comparisonfaultyonlybottle>,
figure(image("rsc/faultclasses-cable.png"), caption: [
8-Way - Cable class
]), <comparisonfaultyonlycable>,
columns: (1fr, 1fr),
caption: [Anomaly class only classification performance],
label: <comparisonnormal>,
)
#if inwriting [
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
#todo[Maybe don't do this]
]

217
implementation.typ Normal file

@ -0,0 +1,217 @@
#import "@preview/fletcher:0.5.3" as fletcher: diagram, node, edge
#import fletcher.shapes: rect, diamond
#import "utils.typ": todo
#import "@preview/subpar:0.1.1"
= Implementation <sectionimplementation>
The three methods described (ResNet50, CAML, P>M>F) were implemented in a Jupyter notebook and compared to each other.
== Experiments <experiments>
For all of the three methods we test the following use-cases:
- Detection of anomaly class (1,3,5 shots)
  - Every faulty class and the good class is detected.
- 2 Way classification (1,3,5 shots)
  - Only faulty or not faulty is detected. All the samples of the faulty classes are treated as a single class.
- Detect only anomaly classes (1,3,5 shots)
  - Similar to the first test but without the good class. Only faulty classes are detected.
- Imbalanced 2 Way classification (5,10,15,30 good shots, 5 bad shots)
  - Similar to the 2 way classification but with an imbalanced number of good shots.
- Imbalanced target class prediction (5,10,15,30 good shots, 5 bad shots)#todo[Avoid bullet points and write flow text?]
  - Detect only the faulty classes without the good class, with an imbalanced number of shots.
All those experiments were conducted on the bottle and cable classes of the MVTec AD dataset.
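The difference between the task variants listed above comes down to how the labels are prepared. A minimal sketch, assuming samples are stored as (item, label) pairs; the helper names and the toy sample list are illustrative and not taken from the notebooks:

```python
GOOD = "good"


def to_two_way(label):
    """2-way task: every faulty class collapses into a single 'faulty' class."""
    return GOOD if label == GOOD else "faulty"


def drop_good(samples):
    """Anomaly-class-only task: remove all samples of the good class."""
    return [(item, label) for item, label in samples if label != GOOD]


# Hypothetical sample list using the bottle class names of MVTec AD.
toy_samples = [
    ("img0", "good"),
    ("img1", "broken_large"),
    ("img2", "broken_small"),
    ("img3", "contamination"),
]

two_way_samples = [(item, to_two_way(label)) for item, label in toy_samples]
faulty_only_samples = drop_good(toy_samples)
```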
== Experiment Setup
All the experiments were done on the bottle and cable classes of the MVTec AD dataset.
The corresponding number of shots was randomly selected from the dataset.
The remaining images were used to test the model and measure the accuracy.
#todo[Maybe add real number of samples per classes]
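The random selection of shots can be sketched as follows. This is a simplified illustration and not the notebook code: the function name, the fixed seed and the toy sample names are assumptions. Passing a different shot count per class also covers the imbalanced experiments.

```python
import random
from collections import defaultdict


def split_support_query(samples, shots_per_class, seed=0):
    """Split (item, label) pairs into a support set and a query (test) set.

    shots_per_class maps each label to the number of shots drawn for it.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)

    support, query = [], []
    for label, items in by_class.items():
        items = items[:]  # copy so the caller's data is not shuffled
        rng.shuffle(items)
        k = shots_per_class[label]
        support += [(item, label) for item in items[:k]]
        query += [(item, label) for item in items[k:]]  # the rest is used for testing
    return support, query


# Hypothetical imbalanced 2-way episode: 10 good shots and 5 faulty shots.
toy_samples = [(f"good_{i}", "good") for i in range(20)]
toy_samples += [(f"bad_{i}", "faulty") for i in range(8)]
support, query = split_support_query(toy_samples, {"good": 10, "faulty": 5})
```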
== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
From both the support and query set the features are extracted to get a lower-dimensional representation of the images.
The support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class with the smallest distance to the support embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust but just with a simple distance metric instead of a neural net.
In this bachelor thesis a pre-trained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 layers.
To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding output.
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.~@chowdhury2021fewshotimageclassificationjust
#diagram(
spacing: (5mm, 5mm),
node-stroke: 1pt,
node-fill: eastern,
edge-stroke: 1pt,
// Input
node((1, 1), "Input", shape: rect, width: 30mm, height: 10mm, name: <input>),
// Conv1
node((1, 0), "Conv1\n7x7, 64", shape: rect, width: 30mm, height: 15mm, name: <conv1>),
edge(<input>, <conv1>, "->"),
// MaxPool
node((1, -1), "MaxPool\n3x3", shape: rect, width: 30mm, height: 15mm, name: <maxpool>),
edge(<conv1>, <maxpool>, "->"),
// Residual Blocks
node((3, -1), "Residual Block 1\n3x [64, 64, 256]", shape: rect, width: 40mm, height: 15mm, name: <res1>),
edge(<maxpool>, <res1>, "->"),
node((3, 0), "Residual Block 2\n4x [128, 128, 512]", shape: rect, width: 40mm, height: 15mm, name: <res2>),
edge(<res1>, <res2>, "->"),
node((3, 1), "Residual Block 3\n6x [256, 256, 1024]", shape: rect, width: 40mm, height: 15mm, name: <res3>),
edge(<res2>, <res3>, "->"),
node((3, 2), "Residual Block 4\n3x [512, 512, 2048]", shape: rect, width: 40mm, height: 15mm, name: <res4>),
edge(<res3>, <res4>, "->"),
// Cutting Line
edge(<res4>, <avgpool>, marks: "..|..>", stroke: 1pt, label: "Cut here", label-pos: 0.5, label-side: left),
// AvgPool + FC
node((7, 2), "AvgPool\n1x1", shape: rect, width: 30mm, height: 10mm, name: <avgpool>),
//edge(<res4>, <avgpool>, "->"),
node((7, 1), "Fully Connected\n1000 classes", shape: rect, width: 40mm, height: 10mm, name: <fc>),
edge(<avgpool>, <fc>, "->"),
// Output
node((7, 0), "Output", shape: rect, width: 30mm, height: 10mm, name: <output>),
edge(<fc>, <output>, "->")
)
After creating the embeddings for the support and query set the Euclidean distance is calculated.
The class with the smallest distance is chosen as the predicted class.
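The classification step can be sketched in a few lines. This is a minimal illustration using plain Python lists in place of the ResNet50 embeddings; the function names and the toy two-dimensional vectors are assumptions made for this example.

```python
import math


def class_centers(support):
    """Average all support embeddings of a class into one class center."""
    sums, counts = {}, {}
    for embedding, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(embedding)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], embedding)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(query_embedding, centers):
    """Pick the class whose center has the smallest Euclidean distance to the query."""
    return min(centers, key=lambda label: math.dist(query_embedding, centers[label]))


# Hypothetical 2-dimensional embeddings; real ResNet50 embeddings are much larger.
toy_support = [
    ([0.0, 0.0], "good"),
    ([0.2, 0.0], "good"),
    ([1.0, 1.0], "contamination"),
]
centers = class_centers(toy_support)
prediction_a = predict([0.3, 0.1], centers)
prediction_b = predict([0.8, 0.9], centers)
```

Averaging the support embeddings first means each query only needs one distance computation per class, regardless of the number of shots.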
=== Results <resnet50perf>
This method performed better than expected for such a simple approach.
As shown in @resnet50bottleperfa, with a normal 5 shot / 4 way classification the model achieved an accuracy of 75%.
When only detecting whether an anomaly occurred or not, the performance is significantly better and peaks at 81% with 5 shots / 2 ways.
Interestingly, the model performed slightly better with fewer shots in this case.
Moreover in @resnet50bottleperfa, the detection of the anomaly class only (3 way) shows a similar pattern as the normal 4 way classification.
The more shots the better the performance and it peaks at around 88% accuracy with 5 shots.
In @resnet50bottleperfb the model was tested with imbalanced class distributions.
With [5,10,15,30] good shots and 5 bad shots the model performed worse than with balanced classes.
The more good shots the worse the performance.
The only exception is the faulty or not detection (2 way) where the model peaked at 15 good shots with 83% accuracy.
#subpar.grid(
figure(image("rsc/resnet/ResNet50-bottle.png"), caption: [
Normal [1,3,5] shots
]), <resnet50bottleperfa>,
figure(image("rsc/resnet/ResNet50-bottle-inbalanced.png"), caption: [
Imbalanced [5,10,15,30] shots
]), <resnet50bottleperfb>,
columns: (1fr, 1fr),
caption: [ResNet50 performance on bottle class],
label: <resnet50bottleperf>,
)
The same experiments were conducted on the cable class and the results are shown in @resnet50cableperfa and @resnet50cableperfb.
The results are very similar to the bottle class.
Generally, the more shots the better the accuracy.
The maximum accuracy reached is lower than on the bottle class,
which is expected as the cable class consists of 8 faulty classes.
#subpar.grid(
figure(image("rsc/resnet/ResNet50-cable.png"), caption: [
Normal [1,3,5] shots
]), <resnet50cableperfa>,
figure(image("rsc/resnet/ResNet50-cable-inbalanced.png"), caption: [
Imbalanced [5,10,15,30] shots
]), <resnet50cableperfb>,
columns: (1fr, 1fr),
caption: [ResNet50 performance on cable class],
label: <resnet50cableperf>,
)
== P>M>F
=== Approach
For P>M>F the pretrained model weights from the original paper were used.
A DINO model pre-trained by Facebook is used as the backbone feature extractor.
This is a vision transformer with a patch size of 16 and 12 attention heads learned in a self-supervised fashion.
This feature extractor was meta-trained by the authors of the original paper on 10 public image datasets #footnote[ImageNet-1k, Omniglot, FGVC-Aircraft, CUB-200-2011, Describable Textures, QuickDraw, FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~#cite(<pmfpaper>)]
from diverse domains.~#cite(<pmfpaper>)
Finally, this model is finetuned with the support set of every test iteration.
Every time the support set changes, the model needs to be finetuned again.
In a real world scenario this should not be the case because the support set is fixed and only the query set changes.
=== Results
The results of P>M>F look very promising and improve by a large margin over the ResNet50 method.
In @pmfbottleperfa the model reached an accuracy of 79% with 5 shots / 4 way classification.
The 2 way classification (faulty or not) performed even better and peaked at 94% accuracy with 5 shots.
Similar to the ResNet50 method in @resnet50perf, the tests with an imbalanced class distribution performed worse than with balanced classes.
So it is clearly a bad idea to add more good shots to the support set.
#subpar.grid(
figure(image("rsc/pmf/P>M>F-bottle.png"), caption: [
Normal [1,3,5] shots
]), <pmfbottleperfa>,
figure(image("rsc/pmf/P>M>F-bottle-inbalanced.png"), caption: [
Imbalanced [5,10,15,30] shots
]), <pmfbottleperfb>,
columns: (1fr, 1fr),
caption: [P>M>F performance on bottle class],
label: <pmfbottleperf>,
)
#subpar.grid(
figure(image("rsc/pmf/P>M>F-cable.png"), caption: [
Normal [1,3,5] shots
]), <pmfcableperfa>,
figure(image("rsc/pmf/P>M>F-cable-inbalanced.png"), caption: [
Imbalanced [5,10,15,30] shots
]), <pmfcableperfb>,
columns: (1fr, 1fr),
caption: [P>M>F performance on cable class],
label: <pmfcableperf>,
)
== CAML
=== Approach
For the CAML implementation the pretrained model weights from the original paper were used.
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
This brings the limitation that it can only process default few-shot learning tasks in the n-way k-shot fashion,
since it expects the input sequence to contain the same number of shots per class.
This is the reason why for this method the two imbalanced test cases couldn't be conducted.
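The restriction can be expressed as a simple check on the support set. The helper below is a hypothetical illustration and not part of the CAML code base:

```python
from collections import Counter


def is_balanced_episode(support_labels):
    """True if every class in the support set has the same number of shots."""
    counts = Counter(support_labels)
    return len(set(counts.values())) == 1


# A standard 2-way 5-shot episode is balanced ...
balanced = is_balanced_episode(["good"] * 5 + ["faulty"] * 5)
# ... while 30 good shots against 5 faulty shots is not.
imbalanced = is_balanced_episode(["good"] * 30 + ["faulty"] * 5)
```

Only episodes for which this check holds match the n-way k-shot layout described above.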
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
This feature extractor was already pretrained when used by the authors of the original paper.
For the non-causal sequence model a transformer model was used.
It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
This transformer was trained on a huge number of images as described in @CAML.
=== Results
The results were not as good as expected.
This might be caused by the fact that the model was not fine-tuned for any industrial dataset domain.
The model was trained on a large number of general purpose images and is not fine-tuned at all.
It might not handle very similar images well.
Compared to the other two methods, CAML performed poorly in almost all experiments.
The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
The only test in which it did surprisingly well was the detection of the anomaly class for the cable class in @camlperfb, where it reached almost 60% accuracy.
#subpar.grid(
figure(image("rsc/caml/CAML-bottle.png"), caption: [
Normal [1,3,5] shots - Bottle
]), <camlperfa>,
figure(image("rsc/caml/CAML-cable.png"), caption: [
Normal [1,3,5] shots - Cable
]), <camlperfb>,
columns: (1fr, 1fr),
caption: [CAML performance],
label: <camlperf>,
)

51
introduction.typ Normal file

@ -0,0 +1,51 @@
#import "utils.typ": todo, inwriting
= Introduction
== Motivation
Anomaly detection is essential, especially in the industrial and automotive fields.
Many assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning helped the field to advance a lot in the past.
Most of the time the error rate is below $0.1%$ and therefore plenty of good data and almost no faulty data is available.
So the training data is heavily unbalanced.~#cite(<parnami2022learningexamplessummaryapproaches>)
PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies within unseen (but similar) data.
One of their problems is the need for lots of training data and training time.
Moreover, a slight change of the camera position or the lighting conditions can make a complete retraining of the model necessary.
Few-Shot learning might be a suitable alternative with much lower training times and fast adaptation to new conditions.~#cite(<efficientADpaper>)#cite(<patchcorepaper>)#cite(<parnami2022learningexamplessummaryapproaches>)
In this thesis the performance of 3 Few-Shot learning algorithms (ResNet50, P>M>F, CAML) will be compared in the field of anomaly detection.
Moreover, few-shot learning might be able not only to detect anomalies but also to detect the anomaly class.
== Research Questions <sectionresearchquestions>
=== Is Few-Shot learning a suitable fit for anomaly detection?
_Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well established algorithms such as Patchcore or EfficientAD?_
=== How does imbalancing the Shot number affect performance?
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_
=== How do the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD?_
#if inwriting [
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
// I've tried different distance measures $->$ but results are pretty much the same.
]
== Outline
This thesis is structured to provide a comprehensive exploration of Few-Shot Learning in anomaly detection.
@sectionmaterialandmethods introduces the datasets and methodologies used in this research.
The MVTec AD dataset is discussed in detail as the primary source for benchmarking, along with an overview of the Few-Shot Learning paradigm.
The section elaborates on the three selected methods (ResNet50, P>M>F, and CAML) while also touching upon well-established anomaly detection algorithms such as PatchCore and EfficientAD.
@sectionimplementation focuses on the practical realization of the methods described in the previous chapter.
It outlines the experimental setup, including the use of Jupyter Notebook for prototyping and testing, and provides a detailed account of how each method was implemented and evaluated.
The experimental outcomes are presented in @sectionexperimentalresults.
This section addresses the research questions posed in @sectionresearchquestions, examining the suitability of Few-Shot Learning for anomaly detection tasks, the impact of class imbalance on model performance, and the comparative effectiveness of the three selected methods.
//Additional experiments explore the differences between Euclidean distance and Cosine similarity when using ResNet as a feature extractor.#todo[Maybe remove this]
Finally, @sectionconclusionandoutlook summarizes the key findings of this study.
It reflects on the implications of the results for the field of anomaly detection and proposes directions for future research that could address the limitations and enhance the applicability of Few-Shot Learning approaches in this domain.

@ -3,6 +3,7 @@
#import "utils.typ": inwriting, draft, todo, flex-caption, flex-caption-styles
#import "glossary.typ": glossary
#import "@preview/glossarium:0.2.6": make-glossary, print-glossary, gls, glspl
#import "@preview/equate:0.2.1": equate
#show: make-glossary
#show: flex-caption-styles
@ -42,39 +43,41 @@
#show: jku-thesis.with(
thesis-type: "Bachelor",
degree: "Bachelor of Science",
program: "Artifical Intelligence Studies",
supervisor: "Professor Scharinger Josef",
program: "Artificial Intelligence",
supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
department: "Department of Image processing",
department: "Institute of Computational Perception",
author: "Lukas Heiligenbrunner",
date: date,
place-of-submission: "Linz",
title: "Few shot learning for anomaly detection",
abstract-en: [//max. 250 words
#lorem(200) ],
This thesis explores the application of Few-Shot Learning (FSL) in anomaly detection, a critical area in industrial and automotive domains requiring robust and efficient algorithms for identifying defects.
Traditional methods, such as PatchCore and EfficientAD, achieve high accuracy but often demand extensive training data and are sensitive to environmental changes, necessitating frequent retraining.
FSL offers a promising alternative by enabling models to generalize effectively from minimal samples, thus reducing training time and adaptation overhead.
The study evaluates three FSL methods—ResNet50, P>M>F, and CAML—using the MVTec AD dataset.
Experiments focus on tasks such as anomaly detection, class imbalance handling, //and comparison of distance metrics.
and anomaly type classification.
Results indicate that while FSL methods trail behind state-of-the-art algorithms in detecting anomalies, they excel in classifying anomaly types, showcasing potential in scenarios requiring detailed defect identification.
Among the tested approaches, P>M>F emerged as the most robust, demonstrating superior accuracy across various settings.
This research underscores the limitations and niche applicability of FSL in anomaly detection, advocating its integration with established algorithms for enhanced performance.
Future work should address the scalability and domain-specific adaptability of FSL techniques to broaden their utility in industrial applications.
],
abstract-de: none,// or specify abstract-de in a container []
acknowledgements: [
// TODO
I would like to extend a huge thank you to Dr. Felina Whiskers, my primary advisor, for her pawsitive support and expert guidance. Without her wisdom and occasional catnip breaks, this thesis might have turned into a hairball of confusion.
A special shoutout to Dr. Felix Pawsworth, my co-advisor, for his keen insights and for keeping me from chasing my own tail during this research. Your input was invaluable and much appreciated.
To the cat owners, survey respondents, and interviewees—thank you for sharing your feline escapades. Your stories made this research more entertaining than a laser pointer.
Lastly, to my family and friends, thank you for tolerating the endless cat puns and my obsession with feline behavior. Your patience and encouragement kept me from becoming a full-time cat herder.
To everyone who contributed to this thesis, directly or indirectly, I offer my heartfelt gratitude. You've all made this journey a little less ruff!
],//acknowledgements: none // if you are self-made
acknowledgements: none,//acknowledgements: none // if you are self-made
show-title-in-header: false,
draft: draft,
)
// set equation and heading numbering
#set math.equation(numbering: "(1)")
#show: equate.with(breakable: true, sub-numbering: true)
#set math.equation(numbering: "(1.1)")
#set heading(numbering: "1.1")
// Allow references to a line of the equation.
//#show ref: equate
// Set font size
#show heading.where(level: 3): set text(size: 1.05em)
@ -100,9 +103,9 @@ To everyone who contributed to this thesis, directly or indirectly, I offer my h
// Set citation style
#set cite(style: "iso-690-author-date") // page info visible
// #set cite(style: "iso-690-author-date") // page info visible
//#set cite(style: "iso-690-numeric") // page info visible
//#set cite(style: "springer-basic")// no additional info visible (page number in square brackets)
#set cite(style: "springer-basic")// no additional info visible (page number in square brackets)
//#set cite(style: "alphanumeric")// page info not visible
@ -174,4 +177,4 @@ To everyone who contributed to this thesis, directly or indirectly, I offer my h
#include "conclusionandoutlook.typ"
#set par(leading: 0.7em, first-line-indent: 0em, justify: true)
#bibliography("sources.bib", style: "apa")
#bibliography("sources.bib", style: "ieee")

451
materialandmethods.typ Normal file

@ -0,0 +1,451 @@
#import "@preview/subpar:0.1.1"
#import "utils.typ": todo
#import "@preview/equate:0.2.1": equate
= Material and Methods <sectionmaterialandmethods>
== Material
=== MVTec AD
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains 5354 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.
#figure(
image("rsc/mvtec/dataset_overview_large.png", width: 80%),
caption: [Example images from the MVTec AD dataset. #cite(<datasetsampleimg>)],
) <datasetoverview>
This bachelor thesis uses only two of these categories: "Bottle" and "Cable".
The bottle category contains 3 different defect classes: _broken_large_, _broken_small_ and _contamination_.
#subpar.grid(
figure(image("rsc/mvtec/bottle/broken_large_example.png"), caption: [
Broken large defect
]), <a>,
figure(image("rsc/mvtec/bottle/broken_small_example.png"), caption: [
Broken small defect
]), <b>,
figure(image("rsc/mvtec/bottle/contamination_example.png"), caption: [
Contamination defect
]), <c>,
columns: (1fr, 1fr, 1fr),
caption: [Defect classes of the bottle category],
label: <full>,
)
The cable category has considerably more defect classes: _bent_wire_, _cable_swap_, _combined_, _cut_inner_insulation_,
_cut_outer_insulation_, _missing_cable_, _missing_wire_, _poke_insulation_.
The larger number of defect classes already indicates that a classification task might be more difficult for the cable category.
#subpar.grid(
figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
Bent wire defect
]), <a>,
figure(image("rsc/mvtec/cable/cable_swap_example.png"), caption: [
Cable swap defect
]), <b>,
figure(image("rsc/mvtec/cable/combined_example.png"), caption: [
Combined defect
]), <c>,
figure(image("rsc/mvtec/cable/cut_inner_insulation_example.png"), caption: [
Cut inner insulation
]), <d>,
figure(image("rsc/mvtec/cable/cut_outer_insulation_example.png"), caption: [
Cut outer insulation
]), <e>,
figure(image("rsc/mvtec/cable/missing_cable_example.png"), caption: [
Missing cable defect
]), <e>,
figure(image("rsc/mvtec/cable/poke_insulation_example.png"), caption: [
Poke insulation defect
]), <f>,
figure(image("rsc/mvtec/cable/missing_wire_example.png"), caption: [
Missing wire defect
]), <g>,
columns: (1fr, 1fr, 1fr, 1fr),
caption: [Defect classes of the cable category],
label: <full>,
)
== Methods
=== Few-Shot Learning
Few-shot learning is a subfield of machine learning which aims to train a classification model with just a few or no samples at all.
In contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data,
only 1-10 samples per class (so-called shots) are available here.
The model is therefore prone to overfitting to the few training samples, which means that these samples should represent the whole sample distribution as well as possible.~#cite(<parnami2022learningexamplessummaryapproaches>)
A few-shot learning task typically consists of a support set and a query set.
The support set contains the training data, and the query set contains the evaluation data for real-world evaluation.
A common way to formulate a few-shot learning problem is the n-way k-shot notation.
For example, a task with 3 target classes and 5 training samples per class is a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
A classical example of how such a model might work is a prototypical network.
These models learn a lower-dimensional representation of each class and classify new examples based on their proximity to these representations in an embedding space.~@snell2017prototypicalnetworksfewshotlearning
#figure(
image("rsc/prototype_fewshot_v3.png", width: 60%),
caption: [Prototypical network for a 3-way 5-shot task. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>
The first and simplest method of this bachelor thesis uses a plain ResNet50 to calculate those embeddings and groups the shots of each class by calculating the class center.
This is essentially a simple prototypical network.
See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
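As a minimal sketch of this idea, the following Python snippet averages the support embeddings of each class into a prototype and assigns a query to the nearest one. The two-dimensional embeddings and the helper names `class_prototypes` and `nearest_prototype` are made up for illustration; real embeddings would come from the backbone network.

```python
import math

def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def nearest_prototype(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy 2-way 2-shot task with hand-made 2-D embeddings (illustrative values only).
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["good", "good", "broken", "broken"]
prototypes = class_prototypes(support, labels)
prediction = nearest_prototype([0.9, 0.7], prototypes)
```

In this toy task the query lies next to the `broken` prototype and is classified accordingly.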
=== Generalisation from few samples
Generalizing from so few samples is an especially hard task.
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]
=== Softmax
#todo[Maybe remove this section]
The Softmax function @softmax #cite(<liang2017soft>) converts a vector of $n$ numbers into a probability distribution.
It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.
$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^n e^(z_k)) quad "for" j in {1,...,n}
$ <softmax>
The softmax function is closely related to the Boltzmann distribution, which was first introduced in the 19th century #cite(<Boltzmann>).
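The following sketch evaluates the softmax formula for a small vector. Subtracting the maximum before exponentiating is a common numerical safeguard that is not part of the definition above; it leaves the result unchanged.

```python
import math

def softmax(z):
    """Map a list of real numbers to a probability distribution."""
    shift = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [value / total for value in exps]

probabilities = softmax([1.0, 2.0, 3.0])
```

The returned values are positive, sum to one and preserve the ordering of the inputs.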
=== Cross Entropy Loss
#todo[Maybe remove this section]
Cross Entropy Loss is a well-established loss function in machine learning.
@crelformal #cite(<crossentropy>) shows the general formal definition of the Cross Entropy Loss.
@crelbinary is the special case of the general Cross Entropy Loss for binary classification tasks.
$
H(p,q) &= -sum_(x in cal(X)) p(x) log q(x) #<crelformal>\
H(p,q) &= -(p log(q) + (1-p) log(1-q)) #<crelbinary>\
cal(L)(p,q) &= -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i)) #<crelbatched>
$ <crel>
Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this practical work.
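A small sketch of the batched binary cross entropy, assuming targets and predictions are given as plain lists of floats. The clamping constant `eps` is an assumption added here to keep the logarithm finite; it is not part of the formula.

```python
import math

def binary_cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross entropy over a batch of targets p_i and predictions q_i."""
    total = 0.0
    for p, q in zip(targets, predictions):
        q = min(max(q, eps), 1.0 - eps)  # clamp to keep log() finite
        total += p * math.log(q) + (1.0 - p) * math.log(1.0 - q)
    return -total / len(targets)

loss = binary_cross_entropy([1.0, 0.0], [0.9, 0.2])
```

Confident and correct predictions give a loss close to zero, while confident but wrong predictions are penalised heavily.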
=== Cosine Similarity
Cosine similarity (@cosinesimilarity) is a widely used metric for measuring the similarity between two vectors.
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
This property makes the cosine similarity particularly effective in scenarios where the
direction of a vector carries more information than its magnitude.
$
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
&= (sum_(i=1)^n A_i B_i)/ (sqrt(sum_(i=1)^n A_i^2) dot sqrt(sum_(i=1)^n B_i^2))
$ <cosinesimilarity>
#todo[Source?]
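The formula translates directly into a few lines of Python. This is a plain sketch for non-zero vectors of equal length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # only the length differs
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

Vectors pointing in the same direction yield a value of about one regardless of their length, and orthogonal vectors yield zero.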
=== Euclidean Distance
The Euclidean distance (@euclideannorm) is a simpler way to measure the distance between two points in a vector space.
It is the square root of the sum of the squared differences of the coordinates.
The Euclidean distance can also be expressed as the L2 norm (Euclidean norm) of the difference of the two vectors.
$
cal(d)(A,B) = ||A-B|| := sqrt(sum_(i=1)^n (A_i - B_i)^2)
$ <euclideannorm>
#todo[Source?]
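For comparison, a direct sketch of the Euclidean distance for two vectors of equal length:

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Unlike the cosine similarity, this value changes when one of the vectors is scaled.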
=== PatchCore
// https://arxiv.org/pdf/2106.08265
PatchCore is an advanced method designed for cold-start anomaly detection and localization, primarily focused on industrial image data.
It operates on the principle that an image is anomalous if any of its patches is anomalous.
The method achieves state-of-the-art performance on benchmarks like MVTec AD with high accuracy, low computational cost, and competitive inference times. #cite(<patchcorepaper>)
#todo[Rephrase and simplify paragraph]
The PatchCore framework leverages a pre-trained convolutional neural network (e.g., WideResNet50) to extract mid-level features from image patches.
By focusing on intermediate layers, PatchCore balances the retention of localized information with a reduction in bias associated with high-level features pre-trained on ImageNet.
To enhance robustness to spatial variations, the method aggregates features from local neighborhoods using adaptive pooling, which increases the receptive field without sacrificing spatial resolution. #cite(<patchcorepaper>)
A crucial component of PatchCore is its memory bank, which stores patch-level features derived from the training dataset.
This memory bank represents the nominal distribution of features against which test patches are compared.
To ensure computational efficiency and scalability, PatchCore employs a coreset reduction technique to condense the memory bank by selecting the most representative patch features.
This optimization reduces both storage requirements and inference times while maintaining the integrity of the feature space. #cite(<patchcorepaper>)
#todo[reference to image below]
During inference, PatchCore computes anomaly scores by measuring the distance between patch features from test images and their nearest neighbors in the memory bank.
If any patch exhibits a significant deviation, the corresponding image is flagged as anomalous.
For localization, the anomaly scores of individual patches are spatially aligned and upsampled to generate segmentation maps, providing pixel-level insights into the anomalous regions.~#cite(<patchcorepaper>)
PatchCore reaches an AUROC of 99.6% on the MVTec AD dataset for anomaly detection.
A major advantage of this method is the coreset subsampling, which reduces the memory bank size significantly.
This lowers computational costs while maintaining detection accuracy.~#cite(<patchcorepaper>)
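The scoring step can be illustrated with a strongly simplified sketch. It assumes that patch features are already extracted as plain vectors, uses greedy farthest-point selection as a stand-in for the coreset reduction and omits further refinements of the original method. All function names are hypothetical.

```python
import math

def greedy_coreset(features, size):
    """Greedy farthest-point selection of `size` representative features."""
    selected = [0]  # start from the first feature (arbitrary choice)
    min_dist = [math.dist(f, features[0]) for f in features]
    while len(selected) < size:
        idx = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(idx)
        min_dist = [min(d, math.dist(f, features[idx])) for d, f in zip(min_dist, features)]
    return [features[i] for i in selected]

def patch_scores(patches, memory_bank):
    """Distance of every test patch feature to its nearest memory-bank entry."""
    return [min(math.dist(p, m) for m in memory_bank) for p in patches]

def image_score(patches, memory_bank):
    """An image is treated as being as anomalous as its most anomalous patch."""
    return max(patch_scores(patches, memory_bank))

# Toy nominal features and two toy test images (illustrative values only).
nominal = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 0.9]]
memory_bank = greedy_coreset(nominal, size=2)
normal_image = image_score([[0.05, 0.0], [1.0, 0.95]], memory_bank)
anomalous_image = image_score([[0.05, 0.0], [3.0, 3.0]], memory_bank)
```

The reduced memory bank keeps one feature from each of the two nominal clusters, and the image containing the outlying patch receives a much higher score than the normal image.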
#figure(
image("rsc/patchcore_overview.png", width: 80%),
caption: [Architecture of PatchCore. #cite(<patchcorepaper>)],
) <patchcoreoverview>
=== EfficientAD
// https://arxiv.org/pdf/2303.14535
EfficientAD is another state-of-the-art method for anomaly detection.
It focuses on maintaining high detection performance while being computationally efficient.
At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
In comparison to PatchCore, which relies on a deeper, more computationally heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
This reduces latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
#todo[reference to image below]
Anomalies are detected with a student-teacher framework.
The teacher network is a pre-trained PDN, and the student network is trained on normal (good) images to predict the teacher's output.
An anomaly is identified when the student fails to replicate the teacher's output.
This works because the training data contains no anomalies, so the student network has never seen an anomaly during training.
A special loss function prevents the student network from generalizing too broadly and thereby learning to predict the teacher's output on anomalous features as well.~#cite(<efficientADpaper>)
In addition to this structural anomaly detection, EfficientAD can also address logical anomalies, such as violations of spatial or contextual constraints (e.g. wrongly arranged objects).
This is done by integrating an autoencoder trained to replicate the teacher's features.~#cite(<efficientADpaper>)
By comparing the outputs of the autoencoder and the student, logical anomalies are effectively detected.
This is a challenge that PatchCore does not directly address.~#cite(<efficientADpaper>)
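The student-teacher comparison can be sketched as follows, assuming both networks produce feature maps of identical shape that are stored as nested lists. Normalisation steps and the autoencoder branch of the actual method are left out, and the function names are illustrative only.

```python
def anomaly_map(teacher, student):
    """Per-position squared difference, averaged over the feature channels."""
    return [
        [sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) / len(t_vec)
         for t_vec, s_vec in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]

def image_score(score_map):
    """Use the largest local discrepancy as the image-level score."""
    return max(max(row) for row in score_map)

# Feature maps with one row, two positions and two channels (toy values).
teacher = [[[1.0, 0.0], [0.0, 1.0]]]
student = [[[1.0, 0.0], [1.0, 0.0]]]
score_map = anomaly_map(teacher, student)
score = image_score(score_map)
```

The student matches the teacher at the first position and fails at the second one, so only the second position contributes to the score.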
#todo[maybe add key advantages such as low computational cost and high performance]
#figure(
image("rsc/efficientad_overview.png", width: 80%),
caption: [Architecture of EfficientAD. #cite(<efficientADpaper>)],
) <efficientadoverview>
=== Jupyter Notebook
A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook along with the editor provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning community.~#cite(<jupyter>)
In the context of this bachelor thesis it was used to test and evaluate the three few-shot learning methods and to compare them.
Furthermore, Matplotlib was used to create the comparison plots.
=== CNN
Convolutional neural networks are especially good model architectures for processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
A convolutional layer is a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over the image.
At each position, the dot product between the filter and the underlying image region yields one value of the feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps to mitigate overfitting.
Common pooling layers include average- and max pooling.
Finally, after some convolution layers the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
@cnnarchitecture shows a typical binary classification task.~#cite(<cnnintro>)
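The two building blocks can be sketched in plain Python. The example assumes a single-channel image stored as a nested list, no padding and a stride of one; the image and the kernel values are chosen only for illustration.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation deep learning calls convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a square window."""
    return [
        [max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(feature_map[0]) - size + 1, size)]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
vertical_edge = [[-1, 1]]  # responds where the intensity rises from left to right
feature_map = conv2d(image, vertical_edge)
pooled = max_pool(feature_map)
```

The feature map is non-zero only at the dark-to-bright edge, and pooling keeps this response while halving the resolution.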
#figure(
image("rsc/cnn_architecture.png", width: 80%),
caption: [Architecture of a convolutional neural network. #cite(<cnnarchitectureimg>)],
) <cnnarchitecture>
=== ResNet
Residual neural networks are a special type of neural network architecture.
They are especially good for deep learning and have been used in many state-of-the-art computer vision tasks.
The main idea behind ResNet is the skip connection.
The skip connection is a direct connection from one layer to another layer which is not the next layer.
This helps to avoid the vanishing gradient problem and helps with the training of very deep networks.
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures, the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
For this bachelor thesis the ResNet-50 architecture was used to compute the embeddings for the few-shot learning methods.
=== P$>$M$>$F
// https://arxiv.org/pdf/2204.07305
P>M>F (Pre-training > Meta-training > Fine-tuning) is a three-stage pipeline designed for few-shot learning.
It focuses on simplicity but still achieves competitive performance.
The three stages convert a general feature extractor into a task-specific model through fine-tuned optimization.
#cite(<pmfpaper>)
*Pre-training:*
The first stage in @pmfarchitecture initializes the backbone feature extractor.
This can be, for instance, a ResNet or ViT and is learned with self-supervised techniques.
This backbone is trained on large-scale datasets from a general domain such as ImageNet.
This step optimizes for robust feature extraction and builds a foundation model.
There are well-established methods for pre-training which can be used, such as DINO (self-supervised consistency), CLIP (image-text alignment) or BERT (for text data).
#cite(<pmfpaper>)
*Meta-training:*
The second stage in the pipeline, as shown in @pmfarchitecture, is the meta-training.
Here a prototypical network (ProtoNet) is used to refine the pre-trained backbone.
ProtoNet constructs class centroids for each episode and then performs nearest class centroid classification.
See @prototypefewshot for a visualisation of its architecture.
The ProtoNet only requires a backbone $f$ to map images to an m-dimensional vector space: $f: cal(X) -> RR^m$.
The probability of a query image $x$ belonging to a class $k$ is given by a softmax over the negative distances between the embedded sample and the class centers:
$
p(y=k|x) = exp(-d(f(x), c_k)) / (sum_(k') exp(-d(f(x), c_(k'))))#cite(<pmfpaper>)
$
The distance metric $d$ is based on the cosine similarity. See @cosinesimilarity for the formula.
$c_k$, the prototype of a class, is defined as $c_k = 1/N_k sum_(i:y_i=k) f(x_i)$, where $N_k$ is the number of samples of class $k$.
The meta-training process is dataset-agnostic, allowing for flexible adaptation to various few-shot classification scenarios.#cite(<pmfpaper>)
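The class probability defined above can be sketched in a few lines. The example assumes that the prototypes are already computed, uses one minus the cosine similarity as the distance and omits any temperature or scaling factor that an implementation may apply. The prototype values and function names are hypothetical.

```python
import math

def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def class_probabilities(query, prototypes):
    """Softmax over the negative distances between the query and all prototypes."""
    weights = {k: math.exp(-cosine_distance(query, c)) for k, c in prototypes.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Toy prototypes for two classes and one query embedding (illustrative values).
prototypes = {"good": [1.0, 0.0], "contamination": [0.0, 1.0]}
probs = class_probabilities([0.9, 0.1], prototypes)
```

The query points almost in the direction of the `good` prototype and therefore receives the higher probability for that class.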
*Fine-tuning:*
If a novel task is drawn from an unseen domain, the model may fail to generalize because of a significant shift in the data distribution.
To overcome this, the model is optionally fine-tuned on the support set for a few gradient steps.
Data augmentation is used to generate a pseudo query set.
The class prototypes are calculated from the support set and compared against the model's predictions for the pseudo query set.
The resulting loss is used to fine-tune the whole model to the new domain.~#cite(<pmfpaper>)
#figure(
image("rsc/pmfarchitecture.png", width: 100%),
caption: [Architecture of P>M>F. #cite(<pmfpaper>)],
) <pmfarchitecture>
*Inference:*
During inference the support set is used to calculate the class prototypes.
For a query image the feature extractor computes its embedding in a lower-dimensional space and compares it to the pre-computed prototypes.
The query image is then assigned to the class with the closest prototype.#cite(<pmfpaper>)
*Performance:*
P>M>F performs well across several few-shot learning benchmarks.
The combination of pre-training on large datasets and meta-training with episodic tasks helps the model to generalize well.
The inclusion of fine-tuning enhances adaptability to unseen domains, ensuring robust and efficient learning.#cite(<pmfpaper>)
*Limitations and Scalability:*
This method has some limitations.
It relies on domains with large external datasets, and creating pre-trained models requires substantial computational resources.
Fine-tuning is effective but might be slow and may not work well on devices with limited computational resources.
Future research could focus on exploring faster and more efficient methods for fine-tuning models.
#cite(<pmfpaper>)
=== CAML <CAML>
// https://arxiv.org/pdf/2310.10971v2
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
It consists of three different components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder and a non-causal sequence model.
This is a universal meta-learning approach.
That means no fine-tuning or meta-training is applied for specific domains.~#cite(<caml_paper>)
*Architecture:*
CAML first encodes the query and support set images using the frozen pre-trained feature extractor as shown in @camlarchitecture.
This step brings the images into a low dimensional space where similar images are encoded into similar embeddings.
The class labels are encoded with the ELMES class encoder.
Since the class of the query image is unknown in this stage a special learnable "unknown token" is added to the encoder.
This embedding is learned during pre-training.
Afterwards each image embedding is concatenated with the corresponding class embedding.
~#cite(<caml_paper>)
#todo[Add more references to the architecture image below]
*ELMES Encoder:*
The ELMES (Equal Length and Maximally Equiangular Set) encoder encodes the class labels as vectors of equal length.
The encoder is a bijective mapping between the labels and a set of vectors that are of equal length and maximally equiangular.
#todo[Describe what equiangular and bijective means]
This is similar to one-hot encoding but has some advantages.
This encoder maximizes the algorithm's ability to distinguish between different classes.
~#cite(<caml_paper>)
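One simple way to obtain vectors of equal length with identical pairwise angles is to centre the one-hot vectors and rescale them to unit length, which yields the corners of a regular simplex. The sketch below illustrates this construction; it is an assumption made for illustration and not necessarily the encoder used in the CAML implementation.

```python
import math

def simplex_label_vectors(num_classes):
    """Centre the one-hot vectors and scale them to unit length."""
    vectors = []
    for k in range(num_classes):
        centred = [(1.0 if i == k else 0.0) - 1.0 / num_classes for i in range(num_classes)]
        norm = math.sqrt(sum(v * v for v in centred))
        vectors.append([v / norm for v in centred])
    return vectors

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

vectors = simplex_label_vectors(4)
# Cosine of the angle between every pair of distinct label vectors.
pairwise = [dot(vectors[i], vectors[j]) for i in range(4) for j in range(i + 1, 4)]
```

For four classes every pair of vectors has the same cosine of minus one third, so no two classes are closer to each other than any other pair.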
*Non-causal sequence model:*
The sequence of concatenated image and class embeddings is then fed into a non-causal sequence model.
This might be, for instance, a transformer encoder.
This step conditions the input sequence consisting of the query and support set embeddings.
Visual features from the query and support set can be compared to each other to determine specific information such as content or textures.
This can then be used to predict the class of the query image.
From the output of the sequence model the element at the same position as the query is selected.
Afterwards it is passed through a simple MLP network to predict the class of the query image.
~#cite(<caml_paper>)
*Large-Scale Pre-Training:*
CAML is pre-trained on a huge number of images from ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
Those datasets span over different domains and help to detect any new visual concept during inference.
Only the non-causal sequence model is trained and the weights of the image encoder and ELMES encoder are kept frozen.
~#cite(<caml_paper>)
*Inference:*
During inference, CAML performs the following steps:
- Encodes the support set images and labels with the pre-trained feature and class encoders.
- Concatenates these encodings into a sequence alongside the query image embedding.
- Passes the sequence through the non-causal sequence model, enabling dynamic interaction between query and support set representations.
- Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite(<caml_paper>)
*Performance:*
CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks,
including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft),
and cross-domain tasks (e.g., Pascal+Paintings).
It outperformed or matched existing models in 14 of 22 evaluation settings.
It performs competitively against P>M>F in 8 benchmarks even though P>M>F was meta-trained on the same domain.
~#cite(<caml_paper>)
CAML excels at generalization and inference efficiency but faces limitations in specialized domains (e.g., ChestX)
and low-resolution tasks (e.g., CIFAR-fs).
Its use of frozen pre-trained feature extractors is key to avoiding overfitting and enabling robust performance.
~#cite(<caml_paper>)
#todo[We should add stuff here why we have a max amount of shots bc. of pretrained model]
#figure(
image("rsc/caml_architecture.png", width: 100%),
caption: [Architecture of CAML. #cite(<caml_paper>)],
) <camlarchitecture>
== Alternative Methods
There are several alternative methods to few-shot learning as well as to anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks compared to the used methods or they were released after my initial literature research.
=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)
// https://arxiv.org/pdf/2211.16191v2
// https://arxiv.org/abs/2211.16191v2
SgVA-CLIP (Semantic-guided Visual Adapting CLIP) is a framework that improves few-shot learning by adapting pre-trained vision-language models like CLIP.
It focuses on generating better visual features for specific tasks while still using the general knowledge from the pre-trained model.
Instead of only aligning images and text, SgVA-CLIP includes a special visual adapting layer that makes the visual features more discriminative for the given task.
This process is supported by knowledge distillation, where detailed information from the pre-trained model guides the learning of the new visual features.
Additionally, the model uses contrastive losses to further refine both the visual and textual representations.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
One advantage of SgVA-CLIP is that it can work well with very few labeled samples, making it suitable for applications like anomaly detection.
The use of pre-trained knowledge helps reduce the need for large datasets.
However, a disadvantage is that it depends heavily on the quality and capabilities of the pre-trained model.
If the pre-trained model lacks relevant information for the task, SgVA-CLIP might struggle to adapt.
This might be a no-go for anomaly detection tasks because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification) <TRIDENT>
// https://arxiv.org/pdf/2208.10559v1
// https://arxiv.org/abs/2208.10559v1
TRIDENT, a variational inference network, is a few-shot learning approach which decouples the image representation into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)
To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)
This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context aligns with the requirements of anomaly detection.
=== SOT (Self-Optimal-Transport Feature Transform) <SOT>
// https://arxiv.org/pdf/2204.03065v1
// https://arxiv.org/abs/2204.03065v1
The Self-Optimal-Transport (SOT) Feature Transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
This transform processes features as a set instead of using them individually.
This creates context-aware representations.
SOT can catch direct as well as indirect similarities between features which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
SOT uses a transport plan matrix derived from optimal transport theory to redefine feature relations.
This includes calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal match between features.
This results in a doubly stochastic matrix where each row represents the re-embedding of the corresponding feature in the context of the others.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
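One common way to obtain such a doubly stochastic matrix is the Sinkhorn algorithm, which alternately normalises rows and columns. The sketch below is an illustration under that assumption and not the reference implementation of SOT; special handling of the diagonal is left out, and the parameters `temperature` and `iterations` as well as their default values are chosen freely.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def sinkhorn_transform(features, temperature=1.0, iterations=50):
    """Re-embed n features as rows of an approximately doubly stochastic matrix."""
    n = len(features)
    # Positive affinity matrix built from the pairwise cosine similarities.
    plan = [[math.exp(cosine(features[i], features[j]) / temperature) for j in range(n)]
            for i in range(n)]
    for _ in range(iterations):
        # Normalise the rows so that each of them sums to one.
        plan = [[v / sum(row) for v in row] for row in plan]
        # Normalise the columns so that each of them sums to one.
        col_sums = [sum(plan[i][j] for i in range(n)) for j in range(n)]
        plan = [[plan[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return plan

# Three toy features; the first two point in almost the same direction.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
plan = sinkhorn_transform(features)
```

Each row of `plan` describes one feature through its relation to all features of the set, and the first feature is weighted more strongly towards the similar second feature than towards the third one.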
The transform is parameter-free, which makes it easy to integrate into existing machine-learning pipelines.
It is differentiable, which allows for end-to-end training, for example to (re-)train the hosting network so that it adapts to SOT.
SOT is permutation-equivariant, which means that reordering the input features only reorders the output features accordingly.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
The improvements of SOT over traditional feature transforms depend on the backbone network used and on the task.
But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
// anomaly detect
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/pdf/2407.09359v1
// https://arxiv.org/abs/2407.09359v1
GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network which uses two different strategies to detect anomalies which are then combined.
The first one, Global Anomaly Synthesis (GAS), operates on the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of the normal features.
This helps the detection of weak defects.
The second strategy, Local Anomaly Synthesis (LAS), operates on the image level.
This strategy overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies near and far from the normal distribution.
Experiments show that GLASS is very effective and in some cases outperforms state-of-the-art methods such as PatchCore on the MVTec AD dataset.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/pdf/2303.16191v5
// https://arxiv.org/abs/2303.16191v5

File diff suppressed because one or more lines are too long


@ -0,0 +1,895 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import numpy as np\n",
"import time\n",
"import random\n",
"import torch\n",
"import torchvision.transforms as transforms\n",
"#import gradio as gr\n",
"import matplotlib.pyplot as plt\n",
"\n",
"from models import get_model\n",
"from dotmap import DotMap\n",
"from PIL import Image"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Pretrained weights found at dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth\n"
]
}
],
"source": [
"args = DotMap()\n",
"args.deploy = 'finetune'\n",
"args.arch = 'dino_base_patch16'\n",
"args.no_pretrain = True\n",
"small = \"https://huggingface.co/hushell/pmf_metadataset_dino/resolve/main/md_full_128x128_dinosmall_fp16_lr5e-5/best.pth?download=true\"\n",
"full = 'https://huggingface.co/hushell/pmf_metadataset_dino/resolve/main/md_full_128x128_dinobase_fp16_lr5e-5/best.pth?download=true'\n",
"args.resume = full\n",
"args.api_key = 'AIzaSyAFkOGnXhy-2ZB0imDvNNqf2rHb98vR_qY'\n",
"args.cx = '06d75168141bc47f1'\n",
"\n",
"args.ada_steps = 100\n",
"#args.ada_lr= 0.0001\n",
"#args.aug_prob = .95\n",
"args.ada_lr= 0.0001\n",
"args.aug_prob = .9\n",
"args.aug_types = [\"color\", \"translation\"]\n",
"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"model = get_model(args)\n",
"model.to(device)\n",
"checkpoint = torch.hub.load_state_dict_from_url(args.resume, map_location='cpu')\n",
"model.load_state_dict(checkpoint['model'], strict=True)\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# image transforms\n",
"def test_transform():\n",
"    def _convert_image_to_rgb(im):\n",
"        return im.convert('RGB')\n",
"\n",
"    return transforms.Compose([\n",
"        transforms.Resize(224),\n",
"        #transforms.CenterCrop(224),\n",
"        _convert_image_to_rgb,\n",
"        transforms.ToTensor(),\n",
"        transforms.Normalize(mean=[0.485, 0.456, 0.406],\n",
"                             std=[0.229, 0.224, 0.225]),\n",
"    ])\n",
"\n",
"preprocess = test_transform()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"def build_sup_set(shotnr, type_name, binary, good_sample_nr):\n",
"    classes = next(os.walk(f'data_custom/{type_name}/test'))[1]\n",
"    classes.remove(\"good\")\n",
"\n",
"    supp_x = []\n",
"    supp_y = []\n",
"    mapping = {\n",
"        \"good\" : 0\n",
"    }\n",
"\n",
"    # add good manually\n",
"    x_good = [Image.open(f\"data_custom/{type_name}/train/good/{x:03d}.png\") for x in range(0, good_sample_nr)]\n",
"    supp_x.extend([preprocess(x) for x in x_good]) # (3, H, W))\n",
"    supp_y.extend([0] * good_sample_nr)\n",
"    \n",
"    for i,c in enumerate(classes):\n",
"        #i-=1\n",
"        x_im = [Image.open(f\"data_custom/{type_name}/test/{c}/{x:03d}.png\") for x in range(0, shotnr)]\n",
"        supp_x.extend([preprocess(x) for x in x_im]) # (3, H, W))\n",
"        if binary:\n",
"            supp_y.extend([1] * shotnr)\n",
"            mapping[\"anomaly\"] = 1\n",
"        else:\n",
"            supp_y.extend([i+1] * shotnr)\n",
"            mapping[c] = i+1\n",
"    \n",
"    supp_x = torch.stack(supp_x, dim=0).unsqueeze(0).to(device) # (1, n_supp*n_labels, 3, H, W)\n",
"    supp_y = torch.tensor(supp_y).long().unsqueeze(0).to(device) # (1, n_supp*n_labels)\n",
"    return supp_x, supp_y, mapping\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"def build_test_set(shotnr, keyy, type):\n",
"    _, _, files = next(os.walk(f\"data_custom/cable/test/{type}/\"))\n",
"    file_count = len(files)\n",
"    print(file_count)\n",
"\n",
"    queries = [preprocess(Image.open(f\"data_custom/cable/test/{type}/{i:03d}.png\")).unsqueeze(0).unsqueeze(0).to(device) for i in range(shotnr,file_count)]\n",
"    labels = [keyy for x in range(shotnr,file_count)]\n",
"    return queries, labels\n",
"\n",
"def test(type, keyy, shotnr, folder):\n",
"    predictions = []\n",
"    _, _, files = next(os.walk(f\"data_custom/{folder}/test/{type}/\"))\n",
"    file_count = len(files)\n",
"    print(file_count)\n",
"\n",
"    queries = [preprocess(Image.open(f\"data_custom/{folder}/test/{type}/{i:03d}.png\")).unsqueeze(0).unsqueeze(0).to(device) for i in range(shotnr,file_count)]\n",
"    queries = torch.cat(queries)\n",
"    with torch.cuda.amp.autocast(True):\n",
"        output = model(supp_x, supp_y, queries) # (1, 1, n_labels)\n",
"\n",
"    probs = output.softmax(dim=-1).detach().cpu().numpy()\n",
"    predictions = np.argmax(probs, axis=2)\n",
"    print()\n",
"    return np.mean([x == keyy for x in predictions])\n",
" \n",
"#def test2(folder):\n",
"# accs = []\n",
"# queries = []\n",
"# labels = []\n",
"# for t in next(os.walk(f'data_custom/cable/test'))[1]:\n",
"# q, l = build_test_set(shots, types.get(t, 1), t)\n",
"# queries+=q\n",
"# labels+=l\n",
"#\n",
"# queries = torch.cat(queries)\n",
"# labels = np.array(labels)\n",
"#\n",
"# with torch.cuda.amp.autocast(True):\n",
"# output = model(supp_x, supp_y, queries) # (1, 1, n_labels)\n",
"#\n",
"# probs = output.softmax(dim=-1).detach().cpu().numpy()\n",
"# predictions = np.argmax(probs, axis=2)\n",
"# print()\n",
"# return np.mean([predictions == labels])\n",
"# pass\n",
"\n",
"#print(f\"overall accuracy: {test(\"cable\")}\")\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"14\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry9: loss = 0.1475423127412796: 100%|██| 100/100 [00:29<00:00, 3.40it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_inner_insulation = 1.0\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry5: loss = 0.20609889924526215: 100%|█| 100/100 [00:29<00:00, 3.37it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for poke_insulation = 1.0\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry7: loss = 0.12025140225887299: 100%|█| 100/100 [00:29<00:00, 3.34it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cable_swap = 0.8571428571428571\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry5: loss = 0.2130972295999527: 100%|██| 100/100 [00:30<00:00, 3.30it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_outer_insulation = 1.0\n",
"58\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry53: loss = 0.13926956057548523: 100%|█| 100/100 [00:30<00:00, 3.30it/s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for good = 0.16981132075471697\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry7: loss = 0.16337624192237854: 100%|█| 100/100 [00:30<00:00, 3.28it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_cable = 1.0\n",
"11\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry6: loss = 0.16593313217163086: 100%|█| 100/100 [00:30<00:00, 3.27it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for combined = 1.0\n",
"13\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry8: loss = 0.16560573875904083: 100%|█| 100/100 [00:30<00:00, 3.27it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for bent_wire = 1.0\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp45, nQry5: loss = 0.18611018359661102: 100%|█| 100/100 [00:30<00:00, 3.28it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_wire = 0.8\n",
"overall accuracy: 0.8696615753219527\n",
"14\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry9: loss = 0.3357824385166168: 100%|██| 100/100 [00:33<00:00, 2.96it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_inner_insulation = 0.7777777777777778\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry5: loss = 0.3290153741836548: 100%|██| 100/100 [00:33<00:00, 2.96it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for poke_insulation = 0.6\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry7: loss = 0.22177687287330627: 100%|█| 100/100 [00:33<00:00, 2.96it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cable_swap = 0.8571428571428571\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry5: loss = 0.299775630235672: 100%|███| 100/100 [00:33<00:00, 2.96it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_outer_insulation = 1.0\n",
"58\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry53: loss = 0.31954386830329895: 100%|█| 100/100 [00:33<00:00, 2.98it/s\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for good = 0.32075471698113206\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry7: loss = 0.336273193359375: 100%|███| 100/100 [00:33<00:00, 2.98it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_cable = 0.8571428571428571\n",
"11\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry6: loss = 0.3643767237663269: 100%|██| 100/100 [00:33<00:00, 2.98it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for combined = 1.0\n",
"13\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry8: loss = 0.3085792660713196: 100%|██| 100/100 [00:33<00:00, 2.98it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for bent_wire = 1.0\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp50, nQry5: loss = 0.34715649485588074: 100%|█| 100/100 [00:33<00:00, 2.98it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_wire = 0.8\n",
"overall accuracy: 0.8014242454494026\n",
"14\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry9: loss = 0.375447154045105: 100%|███| 100/100 [00:36<00:00, 2.76it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_inner_insulation = 0.6666666666666666\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry5: loss = 0.42370423674583435: 100%|█| 100/100 [00:36<00:00, 2.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for poke_insulation = 1.0\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry7: loss = 0.3982161581516266: 100%|██| 100/100 [00:36<00:00, 2.74it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cable_swap = 0.8571428571428571\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry5: loss = 0.3903641104698181: 100%|██| 100/100 [00:36<00:00, 2.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_outer_insulation = 1.0\n",
"58\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry53: loss = 0.4019339382648468: 100%|█| 100/100 [00:36<00:00, 2.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for good = 0.41509433962264153\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry7: loss = 0.4283098876476288: 100%|██| 100/100 [00:36<00:00, 2.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_cable = 0.7142857142857143\n",
"11\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry6: loss = 0.3741377890110016: 100%|██| 100/100 [00:36<00:00, 2.74it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for combined = 0.8333333333333334\n",
"13\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry8: loss = 0.3858358860015869: 100%|██| 100/100 [00:36<00:00, 2.75it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for bent_wire = 1.0\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp55, nQry5: loss = 0.3570959270000458: 100%|██| 100/100 [00:36<00:00, 2.74it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_wire = 0.8\n",
"overall accuracy: 0.8096136567834681\n",
"14\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry9: loss = 0.5021733045578003: 100%|██| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_inner_insulation = 0.5555555555555556\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry5: loss = 0.5203520059585571: 100%|██| 100/100 [00:45<00:00, 2.20it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for poke_insulation = 0.4\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry7: loss = 0.524366021156311: 100%|███| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cable_swap = 0.42857142857142855\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry5: loss = 0.5256413221359253: 100%|██| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for cut_outer_insulation = 1.0\n",
"58\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry53: loss = 0.5186663866043091: 100%|█| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for good = 0.7358490566037735\n",
"12\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry7: loss = 0.5123675465583801: 100%|██| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_cable = 0.7142857142857143\n",
"11\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry6: loss = 0.5076506733894348: 100%|██| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for combined = 0.8333333333333334\n",
"13\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry8: loss = 0.490247517824173: 100%|███| 100/100 [00:45<00:00, 2.21it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for bent_wire = 0.875\n",
"10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"lr0.0001, nSupp70, nQry5: loss = 0.3723257780075073: 100%|██| 100/100 [00:45<00:00, 2.21it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"accuracy for missing_wire = 0.4\n",
"overall accuracy: 0.6602883431499785\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"#bottle_accs = []\n",
"cable_accs = []\n",
"\n",
"for nr in [5, 10, 15, 30]:\n",
"    folder = \"cable\"\n",
"    shot = 5\n",
"    supp_x, supp_y, types = build_sup_set(shot, folder, True, nr)\n",
"    accs = []\n",
"    for t in next(os.walk(f'data_custom/{folder}/test'))[1]:\n",
"        #if t == \"good\":\n",
"        #    continue\n",
"        accuracy = test(t, types.get(t, 1), shot, folder)\n",
"        print(f\"accuracy for {t} = {accuracy}\")\n",
"        accs.append(accuracy)\n",
"    print(f\"overall accuracy: {np.mean(accs)}\")\n",
"    cable_accs.append(np.mean(accs))\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.57380952 0.76705653 0.84191176]\n"
]
}
],
"source": [
"print(np.array(bottle_accs))"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.86966158 0.80142425 0.80961366 0.66028834]\n"
]
}
],
"source": [
"print(np.array(cable_accs))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"P>M>F:\n",
"Results:\n",
"\n",
"bottle:\n",
"1, 3, 5 shots per class (normal)\n",
"[0.67910401 0.71710526 0.78860294]\n",
"\n",
"imbalanced - more good shots (5, 10, 15, 30) -> all other classes only 5\n",
"[0.78768382 0.78860294 0.75827206 0.74356618]\n",
"\n",
"2-way - only detect whether faulty or not, 1, 3, 5 shots\n",
"[0.86422306 0.93201754 0.93933824]\n",
"\n",
"imbalanced 2-way - 5, 10, 15, 30 good shots -> rest 5\n",
"[0.92371324 0.87867647 0.86397059 0.87775735]\n",
"\n",
"only detect the fault class, 1, 3, 5 shots\n",
"[0.57380952 0.76705653 0.84191176]\n",
"\n",
"\n",
"cable:\n",
"1, 3, 5 shots per class (normal)\n",
"[0.25199021 0.44388328 0.46975059]\n",
"\n",
"imbalanced - more good shots (5, 10, 15, 30) -> all other classes only 5\n",
"[0.50425859 0.48023277 0.43118282 0.41842534]\n",
"\n",
"2-way - only detect whether faulty or not, 1, 3, 5 shots\n",
"[0.79263485 0.8707712 0.86756514]\n",
"\n",
"imbalanced 2-way - 5, 10, 15, 30 good shots -> rest 5\n",
"[0.86966158 0.80142425 0.80961366 0.66028834]\n",
"\n",
"only detect the fault class, 1, 3, 5 shots\n",
"[0.24383256 0.43800505 0.51304563]\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

BIN
notebooks/CAML-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

BIN
notebooks/CAML-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

BIN
notebooks/P>M>F-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 37 KiB

BIN
notebooks/P>M>F-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB

View File

@ -282,7 +282,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.13.1"
}
},
"nbformat": 4,

BIN
notebooks/normal-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

BIN
notebooks/normal-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

File diff suppressed because one or more lines are too long

View File

@ -887,7 +887,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.13.1"
}
},
"nbformat": 4,

View File

@ -2,14 +2,18 @@
"cells": [
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"imports imported\n"
"ename": "ModuleNotFoundError",
"evalue": "No module named 'torchvision'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[1], line 8\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mpyplot\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mplt\u001b[39;00m\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtorch\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m optim, nn\n\u001b[0;32m----> 8\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mtorchvision\u001b[39;00m\n\u001b[1;32m 9\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mtorchvision\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m datasets, models, transforms\n\u001b[1;32m 10\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01malbumentations\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mA\u001b[39;00m\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'torchvision'"
]
}
],
@ -238,9 +242,9 @@
"\n",
"print(resnetshotnr0)\n",
"# Step 2: Modify the model to output features from the layer before the fully connected layer\n",
"class ResNetshotnr0Embeddings(nn.Module):\n",
"class ResNet50Embeddings(nn.Module):\n",
" def __init__(self, original_model, layernr):\n",
" super(ResNetshotnr0Embeddings, self).__init__()\n",
" super(ResNet50Embeddings, self).__init__()\n",
" #print(list(original_model.children())[4 + layernr])\n",
" #print(nn.Sequential(*list(original_model.children())[:4 + shotnr]))\n",
" self.features = nn.Sequential(*list(original_model.children())[:4+layernr])\n",
@ -252,7 +256,7 @@
" return x\n",
"\n",
"# Instantiate the modified model\n",
"model = ResNetshotnr0Embeddings(resnetshotnr0, shotnr) # 3 = layer before fully connected one\n",
"model = ResNet50Embeddings(resnetshotnr0, shotnr) # 3 = layer before fully connected one\n",
"model.eval() # Set the model to evaluation mode\n",
"print()\n"
]
@ -487,9 +491,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
"version": "3.13.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

BIN
rsc/caml/CAML-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

BIN
rsc/caml/CAML-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

BIN
rsc/caml_architecture.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 218 KiB

BIN
rsc/faultclasses-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 32 KiB

BIN
rsc/faultclasses-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

BIN
rsc/inbalanced-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

BIN
rsc/inbalanced-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

View File

Before

Width:  |  Height:  |  Size: 1.4 MiB

After

Width:  |  Height:  |  Size: 1.4 MiB

BIN
rsc/normal-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

BIN
rsc/normal-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 36 KiB

BIN
rsc/patchcore_overview.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 418 KiB

BIN
rsc/pmf/P>M>F-bottle.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 38 KiB

BIN
rsc/pmf/P>M>F-cable.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 35 KiB

BIN
rsc/pmfarchitecture.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 117 KiB

View File

Before

Width:  |  Height:  |  Size: 66 KiB

After

Width:  |  Height:  |  Size: 66 KiB

View File

@ -108,3 +108,92 @@
primaryClass={cs.LG},
url={https://arxiv.org/abs/1703.05175},
}
@misc{caml_paper,
title={Context-Aware Meta-Learning},
author={Christopher Fifty and Dennis Duan and Ronald G. Junkins and Ehsan Amid and Jure Leskovec and Christopher Re and Sebastian Thrun},
year={2024},
eprint={2310.10971},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2310.10971},
}
@misc{handsonaiI,
author = {Andreas Schörgenhumer and Bernhard Schäfl and Michael Widrich},
title = {Lecture notes in Hands On AI I, Unit 4 \& 5},
month = {October},
year = {2021},
publisher={Johannes Kepler Universität Linz}
}
@misc{pmfpaper,
title={Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference},
author={Shell Xu Hu and Da Li and Jan Stühmer and Minyoung Kim and Timothy M. Hospedales},
year={2022},
eprint={2204.07305},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2204.07305},
}
@misc{peng2023sgvaclipsemanticguidedvisualadapting,
title={SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification},
author={Fang Peng and Xiaoshan Yang and Linhui Xiao and Yaowei Wang and Changsheng Xu},
year={2023},
eprint={2211.16191},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2211.16191},
}
@misc{singh2022transductivedecoupledvariationalinference,
title={Transductive Decoupled Variational Inference for Few-Shot Classification},
author={Anuj Singh and Hadi Jamali-Rad},
year={2022},
eprint={2208.10559},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2208.10559},
}
@misc{chen2024unifiedanomalysynthesisstrategy,
title={A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization},
author={Qiyu Chen and Huiyuan Luo and Chengkan Lv and Zhengtao Zhang},
year={2024},
eprint={2407.09359},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.09359},
}
@misc{shalam2022selfoptimaltransportfeaturetransform,
title={The Self-Optimal-Transport Feature Transform},
author={Daniel Shalam and Simon Korman},
year={2022},
eprint={2204.03065},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2204.03065},
}
@misc{parnami2022learningexamplessummaryapproaches,
title={Learning from Few Examples: A Summary of Approaches to Few-Shot Learning},
author={Archit Parnami and Minwoo Lee},
year={2022},
eprint={2203.04291},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2203.04291},
}
@misc{chowdhury2021fewshotimageclassificationjust,
title={Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier},
author={Arkabandhu Chowdhury and Mingchao Jiang and Swarat Chaudhuri and Chris Jermaine},
year={2021},
eprint={2101.00562},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2101.00562},
}

View File

@ -1,5 +0,0 @@
\section{Conclusion and Outlook}\label{sec:conclusion-and-outlook}
\subsection{Conclusion}\label{subsec:conclusion}
\subsection{Outlook}\label{subsec:outlook}

View File

@ -1,16 +0,0 @@
\section{Experimental Results}\label{sec:experimental-results}
\subsubsection{Is Few-Shot learning a suitable fit for anomaly detection?}
Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well-established algorithms such as PatchCore or EfficientAD?
\subsubsection{How does an imbalanced shot number affect performance?}
Does giving the Few-Shot learner more good than bad samples improve the model performance?
\subsubsection{How do the three methods (ResNet, CAML, \pmf) perform when only detecting the anomaly class?}
How much does the performance improve when the task is only to decide whether an anomaly is present?
How does it compare to PatchCore and EfficientAD?
\subsubsection{Extra: How does Euclidean distance compare to cosine similarity when using ResNet as a feature extractor?}
Different distance measures were tried, but the results were nearly identical.
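
For reference, the two measures named in this question can be written as follows. The symbols $\mathbf{q}$ (embedding of a query image) and $\mathbf{p}_c$ (embedding representing class $c$, for example a support sample or a class mean) are notation assumed here for illustration and are not taken from the experiments.
\begin{align}
d_{\mathrm{euc}}(\mathbf{q}, \mathbf{p}_c) &= \lVert \mathbf{q} - \mathbf{p}_c \rVert_2, \\
s_{\cos}(\mathbf{q}, \mathbf{p}_c) &= \frac{\mathbf{q}^\top \mathbf{p}_c}{\lVert \mathbf{q} \rVert_2 \, \lVert \mathbf{p}_c \rVert_2}.
\end{align}
If both embeddings are normalized to unit length, then $\lVert \mathbf{q} - \mathbf{p}_c \rVert_2^2 = 2 - 2\, s_{\cos}(\mathbf{q}, \mathbf{p}_c)$, so the two measures rank the classes identically.
Whether this explains the similar results depends on how close the embedding norms are in practice, which was not verified here.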

View File

@ -1,17 +0,0 @@
\section{Implementation}\label{sec:implementation}
\subsection{Experiment Setup}\label{subsec:experiment-setup}
% todo
todo setup of experiments, which classes used, nr of samples
kinds of experiments which lead to graphs
\subsection{Jupyter}\label{subsec:jupyter}
To obtain accurate performance measures, the active-learning process was first implemented in a Jupyter notebook.
This helps to determine which of the methods performs best and which one to use in the final Dagster pipeline.
A straightforward machine-learning pipeline was implemented with PyTorch and ResNet-18.
Moreover, the dataset was imported manually with a custom torch dataloader and preprocessed with random augmentations.
After each loop iteration, the Area Under the Curve (AUC) was calculated on the validation set as a performance measure.
The AUC values were visualized in a line plot; see section~\ref{sec:experimental-results} for the results.

View File

@ -1,31 +0,0 @@
\section{Introduction}\label{sec:introduction}
\subsection{Motivation}\label{subsec:motivation}
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
Many assembly lines require visual inspection to find defects, often with the help of camera systems.
Machine learning has advanced the field considerably in recent years.
PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies in unseen (but similar) data.
One of their drawbacks is that they need large amounts of training data and long training times.
Few-Shot learning might be a suitable alternative with substantially shorter training time.
In this thesis, the performance of three Few-Shot learning algorithms is compared in the field of anomaly detection.
Moreover, Few-Shot learning might be able not only to detect anomalies but also to identify the anomaly class.
\subsection{Research Questions}\label{subsec:research-questions}
\subsubsection{Is Few-Shot learning a suitable fit for anomaly detection?}
Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well-established algorithms such as PatchCore or EfficientAD?
\subsubsection{How does an imbalanced shot number affect performance?}
Does giving the Few-Shot learner more good than bad samples improve the model performance?
\subsubsection{How do the three methods (ResNet, CAML, \pmf) perform when only detecting the anomaly class?}
How much does the performance improve when the task is only to decide whether an anomaly is present?
How does it compare to PatchCore and EfficientAD?
\subsubsection{Extra: How does Euclidean distance compare to cosine similarity when using ResNet as a feature extractor?}
Different distance measures were tried, but the results were nearly identical.
\subsection{Outline}\label{subsec:outline}
todo

File diff suppressed because it is too large Load Diff

View File

@ -1,160 +0,0 @@
\def\ieee{0}
\if\ieee1
\documentclass[sigconf]{acmart}
\else
\documentclass{llncs}
\fi
\usepackage{amsmath}
\usepackage{mathtools}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{subfig}
\usepackage[inline]{enumitem}
\usepackage{color}
\usepackage{tikz}
\usetikzlibrary{shapes.geometric, arrows}
\tikzstyle{startstop} = [rectangle, rounded corners, minimum width=3cm, minimum height=1cm,text centered, draw=black, fill=red!30]
\tikzstyle{io} = [rectangle, rounded corners,minimum width=3cm, minimum height=1cm, text centered, draw=black, fill=blue!30]
\tikzstyle{process} = [rectangle, minimum width=3cm, minimum height=1cm, text centered, draw=black, fill=orange!30]
\tikzstyle{decision} = [diamond, minimum width=3cm, minimum height=1cm, text centered, draw=black, fill=green!30]
\tikzstyle{arrow} = [thick,->,>=stealth]
\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.92}
\lstdefinestyle{mystyle}{
backgroundcolor=\color{backcolour},
commentstyle=\color{codegreen},
keywordstyle=\color{magenta},
numberstyle=\tiny\color{codegray},
stringstyle=\color{codepurple},
basicstyle=\ttfamily\scriptsize,
breakatwhitespace=false,
breaklines=true,
captionpos=b,
keepspaces=true,
numbers=left,
numbersep=5pt,
showspaces=false,
showstringspaces=false,
showtabs=false,
tabsize=2
}
\lstset{style=mystyle}
\newcommand{\pmf}{$P{>}M{>}F$}
%\lstset{basicstyle=\ttfamily, keywordstyle=\bfseries}
\if\ieee1
\settopmatter{printacmref=false} % Removes citation information below abstract
\renewcommand\footnotetextcopyrightpermission[1]{} % removes footnote with conference information in first column
\pagestyle{plain} % removes running headers
\fi
%%
%% \BibTeX command to typeset BibTeX logo in the docs
\if\ieee1
\AtBeginDocument{%
\providecommand\BibTeX{{%
\normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}
\acmConference{Minimize labeling effort of Binary classification Tasks with Active learning}{2023}{Linz}
\fi
% Document
\begin{document}
%%
%% The "title" command has an optional parameter,
%% allowing the author to define a "short title" to be used in page headers.
\title{Few shot learning for anomaly detection\\ Bachelor Thesis for AI}
%%
%% The "author" command and its associated commands are used to define
%% the authors and their affiliations.
%% Of note is the shared affiliation of the first two authors, and the
%% "authornote" and "authornotemark" commands
%% used to denote shared contribution to the research.
\author{Lukas Heiligenbrunner}
\if\ieee1
\email{k12104785@students.jku.at}
\affiliation{%
\institution{Johannes Kepler University Linz}
\city{Linz}
\state{Upper Austria}
\country{Austria}
\postcode{4020}
}
\else
\institute{Johannes Kepler University Linz}
\fi
%%
%% By default, the full list of authors will be used in the page
%% headers. Often, this list is too long, and will overlap
%% other information printed in the page headers. This command allows
%% the author to define a more concise list
%% of authors' names for this purpose.
% \renewcommand{\shortauthors}{Lukas Heiligenbrunner}
%%
%% The abstract is a short summary of the work to be presented in the
%% article.
\if\ieee0
\maketitle
\fi
\begin{abstract}
Todo abstract!!
\end{abstract}
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
\if\ieee1
\keywords{neural networks, ResNET, pseudo-labeling, active-learning}
\fi
%\received{20 February 2007}
%\received[revised]{12 March 2009}
%\received[accepted]{5 June 2009}
%%
%% This command processes the author and affiliation and title
%% information and builds the first part of the formatted document.
\if\ieee1
\maketitle
\fi
\input{introduction}
\input{materialandmethods}
\input{implementation}
\input{experimentalresults}
\input{conclusionandoutlook}
%% The next two lines define the bibliography style to be used, and
%% the bibliography file.
\bibliographystyle{ACM-Reference-Format}
\bibliography{../src/sources}
%%
%% If your work has an appendix, this is the place to put it.
\appendix
% appendix
\end{document}
\endinput

View File

@ -1,122 +0,0 @@
\section{Material and Methods}\label{sec:material-and-methods}
\subsection{Material}\label{subsec:material}
\subsubsection{MVTec AD}\label{subsubsec:mvtecad}
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains over 5000 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.
% todo source for https://www.mvtec.com/company/research/datasets/mvtec-ad
% todo example image
%\begin{figure}
% \centering
% \includegraphics[width=\linewidth/2]{../rsc/muffin_chiauaua_poster}
% \caption{Sample images from dataset. \cite{muffinsvschiuahuakaggle_poster}}
% \label{fig:roc-example}
%\end{figure}
\subsection{Methods}\label{subsec:methods}
\subsubsection{Few-Shot Learning}
Few-Shot learning is a subfield of machine learning which aims to train a classification model with just a few samples or no samples at all.
This is in contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
With so little data, the model is prone to overfitting to the few training samples.
Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data and the query set contains the evaluation data for real-world evaluation.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, 3 target classes with 5 training samples per class form a 3-way 5-shot few-shot classification problem.
A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.
The first and simplest method of this bachelor thesis uses a ResNet to calculate those embeddings and is essentially a simple prototypical network.
See %todo link to this section
% todo proper source
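The prototype-based classification described above can be written compactly.
The notation below follows the usual formulation of prototypical networks and is an assumption, since this section does not fix any symbols: $f_\phi$ is the embedding function, $S_k$ is the set of support samples of class $k$, and $d$ is a distance in the embedding space.
\begin{align}
\mathbf{c}_k &= \frac{1}{|S_k|} \sum_{(\mathbf{x}_i, y_i) \in S_k} f_\phi(\mathbf{x}_i)\\
p(y = k \mid \mathbf{x}) &= \frac{\exp\bigl(-d(f_\phi(\mathbf{x}), \mathbf{c}_k)\bigr)}{\sum_{k'} \exp\bigl(-d(f_\phi(\mathbf{x}), \mathbf{c}_{k'})\bigr)}
\end{align}
A query sample is therefore assigned to the class whose prototype $\mathbf{c}_k$ is closest to its embedding.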
\subsubsection{Generalisation from few samples}
\subsubsection{Patchcore}
%todo also show values how they perform on MVTec AD
\subsubsection{EfficientAD}
todo stuff~\cite{patchcorepaper}
% https://arxiv.org/pdf/2106.08265
todo stuff\cite{efficientADpaper}
% https://arxiv.org/pdf/2303.14535
\subsubsection{Jupyter Notebook}\label{subsubsec:jupyternb}
A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook, along with the editor, provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning community.
In the context of this practical work it can be used to test and evaluate the active learning loop before implementing it in a Dagster pipeline. \cite{jupyter}
\subsubsection{CNN}
Convolutional neural networks are especially good model architectures for processing images, speech and audio signals.
A CNN typically consists of Convolutional layers, pooling layers and fully connected layers.
Convolutional layers consist of a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over every pixel of the image.
At each window position, a dot product between the filter and the underlying pixels produces one value of the feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps against overfitting.
Common pooling layers include average- and max pooling.
Finally, after some convolution layers the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
Figure~\ref{fig:cnn-architecture} shows a typical binary classification task.
\cite{cnnintro}
\begin{figure}
\centering
\includegraphics[width=\linewidth]{../rsc/cnn_architecture}
\caption{Architecture of a convolutional neural network. \cite{cnnarchitectureimg}}
\label{fig:cnn-architecture}
\end{figure}
\subsubsection{ResNet}
Residual neural networks are a special type of neural network architecture.
They are especially good for deep learning and have been used in many state-of-the-art computer vision tasks.
The main idea behind ResNet is the skip connection.
The skip connection is a direct connection from one layer to another layer which is not the next layer.
This helps to avoid the vanishing gradient problem and helps with the training of very deep networks.
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures, the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. \cite{resnet}
Since the dataset is relatively small and the two class classification task is relatively easy (for such a large model) the ResNet-18 architecture is used in this practical work.
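The skip connection can be stated as a short formula.
The notation is an assumption taken from the usual presentation of residual learning: $\mathbf{x}$ is the input of a block, $\mathcal{F}$ denotes the bypassed layers with weights $\{W_i\}$, and $\mathbf{y}$ is the block output.
\begin{equation}
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x}
\end{equation}
Because the identity term $\mathbf{x}$ is added unchanged, the gradient can flow directly through the addition even if the gradient through $\mathcal{F}$ becomes very small.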
\subsubsection{CAML}
Todo
\subsubsection{P$>$M$>$F}
Todo
\subsubsection{Softmax}
The Softmax function~\eqref{eq:softmax}\cite{liang2017soft} converts a vector of $K$ numbers into a probability distribution.
It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.
\begin{equation}\label{eq:softmax}
\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}} \quad \text{for } j \in \{1,\dots,K\}
\end{equation}
The softmax function has strong similarities with the Boltzmann distribution and was first introduced in the 19$^{\textrm{th}}$ century~\cite{Boltzmann}.
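As a small worked example (the input vector is chosen purely for illustration), consider $\mathbf{z} = (1, 2, 3)$ with $K = 3$:
\begin{align*}
e^{1} \approx 2.718, \quad e^{2} \approx 7.389, \quad e^{3} &\approx 20.086, \quad \sum_{k=1}^{3} e^{z_k} \approx 30.193\\
\sigma(\mathbf{z}) &\approx (0.090,\; 0.245,\; 0.665)
\end{align*}
The outputs are positive, sum to one and preserve the ordering of the inputs.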
\subsubsection{Cross Entropy Loss}
Cross Entropy Loss is a well-established loss function in machine learning.
Equation~\eqref{eq:crelformal}\cite{crossentropy} shows the formal general definition of the Cross Entropy Loss.
Equation~\eqref{eq:crelbinary} is the special case of the general Cross Entropy Loss for binary classification tasks.
\begin{align}
H(p,q) &= -\sum_{x\in\mathcal{X}} p(x)\, \log q(x)\label{eq:crelformal}\\
H(p,q) &= - (p \log q + (1-p) \log(1-q))\label{eq:crelbinary}\\
\mathcal{L}(p,q) &= - \frac{1}{\mathcal{B}} \sum_{i=1}^{\mathcal{B}} (p_i \log q_i + (1-p_i) \log(1-q_i))\label{eq:crelbinarybatch}
\end{align}
Equation~\eqref{eq:crelbinarybatch}\cite{handsonaiI} is the Binary Cross Entropy Loss $\mathcal{L}(p,q)$ for a batch of size $\mathcal{B}$ and is used for model training in this practical work.
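As a worked example with illustrative values, assume a batch of two samples with labels $p_1 = 1$, $p_2 = 0$, predictions $q_1 = 0.9$, $q_2 = 0.2$, and the natural logarithm:
\begin{align*}
\mathcal{L}(p,q) &= -\frac{1}{2} \bigl( \log 0.9 + \log(1 - 0.2) \bigr)\\
&\approx -\frac{1}{2} \bigl( -0.105 - 0.223 \bigr) \approx 0.164
\end{align*}
A confident correct prediction contributes little to the loss, while the less confident second prediction contributes about twice as much.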
\subsubsection{Mathematical modeling of problem}\label{subsubsec:mathematicalmodeling}

View File

@ -1,37 +0,0 @@
%! Author = lukas
%! Date = 4/9/24
@Article{crossentropy,
ISSN = {00359246},
URL = {http://www.jstor.org/stable/2984087},
abstract = {This paper deals first with the relationship between the theory of probability and the theory of rational behaviour. A method is then suggested for encouraging people to make accurate probability estimates, a connection with the theory of information being mentioned. Finally Wald's theory of statistical decision functions is summarised and generalised and its relation to the theory of rational behaviour is discussed.},
author = {I. J. Good},
journal = {Journal of the Royal Statistical Society. Series B (Methodological)},
number = {1},
pages = {107--114},
publisher = {[Royal Statistical Society, Wiley]},
title = {Rational Decisions},
urldate = {2024-05-23},
volume = {14},
year = {1952}
}
@misc{efficientADpaper,
title={EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies},
author={Kilian Batzner and Lars Heckler and Rebecca König},
year={2024},
eprint={2303.14535},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2303.14535},
}
@misc{patchcorepaper,
title={Towards Total Recall in Industrial Anomaly Detection},
author={Karsten Roth and Latha Pemula and Joaquin Zepeda and Bernhard Schölkopf and Thomas Brox and Peter Gehler},
year={2022},
eprint={2106.08265},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2106.08265},
}

View File

@ -1,5 +0,0 @@
= Conclusion and Outlook
== Conclusion
== Outlook

View File

@ -1,15 +0,0 @@
= Experimental Results
== Is Few-Shot learning a suitable fit for anomaly detection?
Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well-established algorithms such as PatchCore or EfficientAD?
== How does an imbalanced number of shots affect performance?
Does giving the Few-Shot learner more good than bad samples improve the model performance?
== How do the three methods (ResNet, CAML, P$>$M$>$F) perform in only detecting the anomaly class?
How much does the performance improve when only detecting whether an anomaly is present or not?
How does it compare to PatchCore and EfficientAD?
== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?

View File

@ -1,16 +0,0 @@
= Implementation
== Experiment Setup
// todo
todo setup of experiments, which classes used, nr of samples
kinds of experiments which lead to graphs
== Jupyter
To get accurate performance measures, the active-learning process was first implemented in a Jupyter notebook.
This helps to determine which of the methods performs best and which one to use in the final Dagster pipeline.
A straightforward machine-learning pipeline was implemented with the help of PyTorch and ResNet-18.
Moreover, the dataset was imported manually with the help of a custom PyTorch data loader and preprocessed with random augmentations.
After each loop iteration, the Area Under the Curve (AUC) was calculated over the validation set to get a performance measure.
All AUC values were visualized in a line plot; see the Experimental Results section for the results.
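The AUC computation mentioned above can be illustrated with a minimal sketch.
It is not the code used in the notebook; it assumes binary labels with 1 for the positive class and uses the pairwise-ranking interpretation of the ROC AUC.
The function name `roc_auc` and the toy scores are hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes are required to compute the AUC")
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Toy validation set: three of the four (positive, negative) pairs are ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The pairwise form needs memory proportional to the number of positive-negative pairs, which is fine for small validation sets; for larger ones a library routine is the better choice.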

View File

@ -1,31 +0,0 @@
= Introduction
== Motivation
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
Many assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning has helped the field advance considerably in recent years.
PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies within unseen (but similar) data.
One of their problems is the need for large amounts of training data and long training times.
Few-Shot learning might be a suitable alternative with substantially lower training time.
In this thesis, the performance of three Few-Shot learning algorithms is compared in the field of anomaly detection.
Moreover, Few-Shot learning might be able not only to detect anomalies but also to identify the anomaly class.
== Research Questions
=== Is Few-Shot learning a suitable fit for anomaly detection?
Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well-established algorithms such as PatchCore or EfficientAD?
=== How does an imbalanced number of shots affect performance?
Does giving the Few-Shot learner more good than bad samples improve the model performance?
=== How do the three methods (ResNet, CAML, P$>$M$>$F) perform in only detecting the anomaly class?
How much does the performance improve when only detecting whether an anomaly is present or not?
How does it compare to PatchCore and EfficientAD?
=== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
// I've tried different distance measures $->$ but results are pretty much the same.
== Outline
todo

View File

@ -1,133 +0,0 @@
= Material and Methods
== Material
=== MVTec AD
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection.
It contains over 5000 high-resolution images divided into fifteen different object and texture categories.
Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.
#figure(
image("rsc/dataset_overview_large.png", width: 80%),
caption: [Sample images from the MVTec AD dataset. #cite(<datasetsampleimg>)],
) <datasetoverview>
// todo
Todo: describe which categories are used in this bachelor thesis and how many samples there are.
== Methods
=== Few-Shot Learning
Few-Shot learning is a subfield of machine learning which aims to train a classification model with just a few samples or no samples at all.
This is in contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data.
With so little data, the model is prone to overfitting to the few training samples.
Typically, a few-shot learning task consists of a support set and a query set.
The support set contains the training data and the query set contains the evaluation data for real-world evaluation.
A common way to describe a few-shot learning problem is the n-way k-shot notation.
For example, 3 target classes with 5 training samples per class form a 3-way 5-shot few-shot classification problem.
A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.
#figure(
image("rsc/prototype_fewshot_v3.png", width: 60%),
caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>
The first and simplest method of this bachelor thesis uses a ResNet to calculate those embeddings and is essentially a simple prototypical network.
See //%todo link to this section
// todo proper source
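The prototypical classification described above can be sketched in a few lines.
This is a minimal illustration under stated assumptions and not the implementation of this thesis: the embeddings are hand-made 2-D vectors instead of ResNet features, and the helper names `class_prototypes` and `classify` are hypothetical.

```python
import numpy as np

def class_prototypes(support_embeddings, support_labels):
    """Return the sorted class ids and the mean embedding (prototype) of each class."""
    classes = np.unique(support_labels)
    prototypes = np.stack(
        [support_embeddings[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, prototypes

def classify(query_embeddings, classes, prototypes, metric="euclidean"):
    """Assign each query embedding to the class of its closest prototype."""
    if metric == "euclidean":
        # Pairwise differences, shape (n_query, n_classes, dim).
        diff = query_embeddings[:, None, :] - prototypes[None, :, :]
        scores = -np.linalg.norm(diff, axis=-1)
    elif metric == "cosine":
        q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
        p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
        scores = q @ p.T
    else:
        raise ValueError(f"unknown metric: {metric}")
    # The highest score corresponds to the closest prototype.
    return classes[np.argmax(scores, axis=1)]

# Toy 2-way 3-shot episode: class 0 ("good") and class 1 ("anomaly").
support = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, -0.1],
                    [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
classes, prototypes = class_prototypes(support, labels)
queries = np.array([[0.8, 0.2], [0.2, 0.7]])
predictions = classify(queries, classes, prototypes)
print(predictions)
```

The `metric` argument allows switching between Euclidean distance and cosine similarity without changing the rest of the episode.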
=== Generalisation from few samples
An especially hard task is to generalize from such few samples.
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.
=== Patchcore
// todo also show values how they perform on MVTec AD
=== EfficientAD
todo stuff #cite(<patchcorepaper>)
// https://arxiv.org/pdf/2106.08265
todo stuff #cite(<efficientADpaper>)
// https://arxiv.org/pdf/2303.14535
=== Jupyter Notebook
A Jupyter notebook is a shareable document which combines code and its output, text and visualizations.
The notebook, along with the editor, provides an environment for fast prototyping and data analysis.
It is widely used in the data science, mathematics and machine learning community.
In the context of this bachelor thesis it was used to test and evaluate the three few-shot learning methods and to compare them. #cite(<jupyter>)
=== CNN
Convolutional neural networks are especially good model architectures for processing images, speech and audio signals.
A CNN typically consists of Convolutional layers, pooling layers and fully connected layers.
Convolutional layers consist of a set of learnable kernels (filters).
Each filter performs a convolution operation by sliding a window over every pixel of the image.
At each window position, a dot product between the filter and the underlying pixels produces one value of the feature map.
Convolutional layers capture features like edges, textures or shapes.
Pooling layers downsample the feature maps created by the convolutional layers.
This reduces the computational complexity of the overall network and helps against overfitting.
Common pooling layers include average- and max pooling.
Finally, after some convolution layers the feature map is flattened and passed to a network of fully connected layers to perform a classification or regression task.
@cnnarchitecture shows a typical binary classification task.
#cite(<cnnintro>)
#figure(
image("rsc/cnn_architecture.png", width: 80%),
caption: [Architecture of a convolutional neural network. #cite(<cnnarchitectureimg>)],
) <cnnarchitecture>
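The convolution and pooling steps described above can be made concrete with a small sketch.
It is a naive NumPy illustration with no padding and a stride of one, not an efficient or framework-accurate implementation; the function names `conv2d` and `max_pool2d` and the toy image are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation deep learning layers call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the kernel and the image patch under the window.
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fit are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# A 5x5 image with a vertical edge and a kernel that responds to such edges.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1.0, 1.0]])
edges = conv2d(image, kernel)        # feature map of shape (5, 4)
pooled = max_pool2d(edges, size=2)   # downsampled map of shape (2, 2)
print(edges[0], pooled)
```

The feature map is nonzero only where the image changes from dark to bright, and pooling keeps that response while halving the resolution.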
=== ResNet
Residual neural networks are a special type of neural network architecture.
They are especially good for deep learning and have been used in many state-of-the-art computer vision tasks.
The main idea behind ResNet is the skip connection.
The skip connection is a direct connection from one layer to another layer which is not the next layer.
This helps to avoid the vanishing gradient problem and helps with the training of very deep networks.
ResNet has proven to be very successful in many computer vision tasks and is used in this practical work for the classification task.
There are several different ResNet architectures, the most common are ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. #cite(<resnet>)
For this bachelor thesis, the ResNet-50 architecture was used to compute the embeddings for the few-shot learning methods.
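The effect of a skip connection can be shown with a toy residual block.
This sketch assumes two small dense layers in place of the convolutions of a real ResNet block; the names `residual_block`, `w1` and `w2` are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with the residual branch F(x) = w2 @ relu(w1 @ x)."""
    fx = w2 @ relu(w1 @ x)
    # The input is added back unchanged: this is the skip connection.
    return relu(fx + x)

x = np.array([1.0, 2.0, 3.0])
zero = np.zeros((3, 3))
# With all-zero weights the residual branch contributes nothing,
# so the block passes its (non-negative) input through unchanged.
y = residual_block(x, zero, zero)
print(y)
```

Even with an uninformative residual branch the input still reaches the output, which is why stacking many such blocks does not block the signal.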
=== CAML
Todo
=== P$>$M$>$F
Todo
=== Softmax
The Softmax function @softmax #cite(<liang2017soft>) converts a vector of $K$ numbers into a probability distribution.
It is a generalization of the Sigmoid function and is often used as an activation layer in neural networks.
$
sigma(bold(z))_j = (e^(z_j)) / (sum_(k=1)^K e^(z_k)) quad "for" j in {1,...,K}
$ <softmax>
The softmax function has strong similarities with the Boltzmann distribution and was first introduced in the 19th century #cite(<Boltzmann>).
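A short sketch of the softmax computation follows.
Subtracting the maximum before exponentiating is a common numerical-stability measure that is assumed here and not taken from the text; it does not change the result.

```python
import numpy as np

def softmax(z):
    """Convert a vector of real numbers into a probability distribution."""
    z = np.asarray(z, dtype=float)
    # Shifting by the maximum avoids overflow in exp and leaves the result unchanged.
    exp = np.exp(z - z.max())
    return exp / exp.sum()

probs = softmax([1.0, 2.0, 3.0])
print(probs.round(3))
```

The largest input receives the largest probability, and all outputs sum to one.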
=== Cross Entropy Loss
Cross Entropy Loss is a well established loss function in machine learning.
@crelformal #cite(<crossentropy>) shows the formal general definition of the Cross Entropy Loss.
@crelbinary is the special case of the general Cross Entropy Loss for binary classification tasks.
$
H(p,q) = -sum_(x in cal(X)) p(x) log q(x)
$ <crelformal>
$
H(p,q) = -(p log(q) + (1-p) log(1-q))
$ <crelbinary>
$
cal(L)(p,q) = -1/cal(B) sum_(i=1)^(cal(B)) (p_i log(q_i) + (1-p_i) log(1-q_i))
$ <crelbinarybatch>
@crelbinarybatch #cite(<handsonaiI>) is the Binary Cross Entropy Loss $cal(L)(p,q)$ for a batch of size $cal(B)$ and is used for model training in this practical work.
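The batch form of the binary cross entropy can be checked numerically with a small sketch.
The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero, and the labels and predictions are illustrative values.

```python
import numpy as np

def binary_cross_entropy(p, q, eps=1e-12):
    """Mean binary cross entropy between labels p and predicted probabilities q."""
    p = np.asarray(p, dtype=float)
    # Clip predictions so that log(0) cannot occur.
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(p * np.log(q) + (1.0 - p) * np.log(1.0 - q)))

# Batch of two samples: a positive predicted with 0.9 and a negative predicted with 0.2.
loss = binary_cross_entropy([1, 0], [0.9, 0.2])
print(round(loss, 4))
```

The first sample contributes about 0.105 and the second about 0.223 to the sum, so the less confident prediction dominates the batch loss.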
=== Mathematical modeling of problem
== Alternative Methods
There are several alternative methods to few-shot learning which are not used in this bachelor thesis.

Binary file not shown.


View File

@ -1,11 +1,15 @@
#let inwriting = false
#let draft = false
#import "@preview/drafting:0.2.1": margin-note
#let inp = sys.inputs
#let inwriting = inp.at("inwriting", default: "true") == "true"
#let draft = inp.at("draft", default: "true") == "true"
#assert(not(inwriting and not(draft)), message: "If inwriting is true, draft should be true as well.")
#let todo(it) = [
#if inwriting [
#text(size: 0.8em)[#emoji.pencil] #text(it, fill: red, weight: 600)
#margin-note(it)
]
]