bachelor-thesis/experimentalresults.typ
lukas-heiligenbrunner 71bdb0a207
2025-01-24 19:51:55 +01:00


#import "utils.typ": todo, inwriting
#import "@preview/subpar:0.1.1"
= Experimental Results <sectionexperimentalresults>
== Is Few-Shot learning a suitable fit for anomaly detection? <expresults2way>
_Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well-established algorithms such as PatchCore or EfficientAD?_
@comparison2waybottle shows the performance of the 2-way classification (anomaly or not) on the bottle class and @comparison2waycable the same on the cable class.
The performance values are the same as in @experiments, merged into a single graph.
As a reference, PatchCore reaches an AUROC score of 99.6% and EfficientAD reaches 99.8%, averaged over all classes of the MVTec AD dataset.
Both are trained with samples from the 'good' class only.
So there is a clear performance gap between Few-Shot learning and the state-of-the-art anomaly detection algorithms.
PatchCore and EfficientAD are not included in @comparison2way because they are not directly comparable in the same fashion.
This means that if the goal is only to detect anomalies, Few-Shot learning is not the best choice and PatchCore or EfficientAD should be used instead.
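
To make the AUROC reference values above more concrete, the following is a minimal sketch of how such a score can be computed from per-image anomaly scores.
It is not the evaluation code used for PatchCore or EfficientAD; the function name `auroc` and the example scores are assumptions made purely for illustration.
The sketch uses the pairwise interpretation of AUROC: the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen good sample.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (anomalous, good) pairs ranked correctly.

    scores: anomaly scores, higher means "more anomalous" (assumption).
    labels: 1 for anomalous samples, 0 for good samples.
    Ties between an anomalous and a good sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative scores only, not measured values.
example_scores = [0.9, 0.2, 0.3, 0.1]
example_labels = [1, 1, 0, 0]
example_auroc = auroc(example_scores, example_labels)
```

The quadratic loop keeps the sketch short; for larger test sets a rank-based implementation from an existing library would be the more practical choice.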
#subpar.grid(
figure(image("rsc/comparison-2way-bottle.png"), caption: [
Bottle class
]), <comparison2waybottle>,
figure(image("rsc/comparison-2way-cable.png"), caption: [
Cable class
]), <comparison2waycable>,
columns: (1fr, 1fr),
caption: [2-Way classification performance],
label: <comparison2way>,
)
== How does an imbalanced number of shots affect performance?
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_
As the results of all three methods in @experiments show, the performance of the Few-Shot learner decreases with an increasing number of good samples.
This result is unexpected, since one might assume that more samples always lead to better performance, but it aligns with the idea that all classes should be as balanced as possible.
@comparisoninbalanced shows the performance of the imbalanced classification on the bottle and cable class for all anomaly classes.
#subpar.grid(
figure(image("rsc/inbalanced-bottle.png"), caption: [
Bottle class
]), <comparisoninbalancedbottle>,
figure(image("rsc/inbalanced-cable.png"), caption: [
Cable class
]), <comparisoninbalancedcable>,
columns: (1fr, 1fr),
caption: [Imbalanced classification performance],
label: <comparisoninbalanced>,
)
@comparisoninbalanced2way shows the performance of the imbalanced classification on the bottle and cable class when only deciding whether a sample is an anomaly or not.
#subpar.grid(
figure(image("rsc/inbalanced-2way-bottle.png"), caption: [
Bottle class
]), <comparisoninbalanced2waybottle>,
figure(image("rsc/inbalanced-2way-cable.png"), caption: [
Cable class
]), <comparisoninbalanced2waycable>,
columns: (1fr, 1fr),
caption: [Imbalanced 2-Way classification performance],
label: <comparisoninbalanced2way>,
)
All four graphs clearly show that the performance decreases with an increasing number of good samples.
In conclusion, the Few-Shot learner should always be trained with classes that are as balanced as possible.
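
As an illustration of what a balanced and an imbalanced shot configuration look like, the following minimal sketch draws a support set with a configurable number of shots per class.
It is not the sampling code used for the experiments; the helper `sample_support_set`, the class names and the file names are hypothetical.

```python
import random


def sample_support_set(samples_by_class, shots_per_class, seed=0):
    """Draw a fixed number of shots per class without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    support = {}
    for cls, n_shots in shots_per_class.items():
        pool = samples_by_class[cls]
        if n_shots > len(pool):
            raise ValueError(f"class {cls!r} has only {len(pool)} samples")
        support[cls] = rng.sample(pool, n_shots)
    return support


# Hypothetical file names; the real dataset layout may differ.
pool = {
    "good": [f"good_{i}.png" for i in range(20)],
    "anomaly": [f"anomaly_{i}.png" for i in range(20)],
}

# Balanced: the same number of shots for every class.
balanced = sample_support_set(pool, {"good": 5, "anomaly": 5})
# Imbalanced: more good shots than anomaly shots.
imbalanced = sample_support_set(pool, {"good": 15, "anomaly": 5})
```

Only the number of good shots differs between the two configurations, which mirrors the variable that is changed in the graphs above.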
== How do the three methods (ResNet, CAML, P>M>F) perform when only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_
@comparisonnormal compares the performance of the ResNet, CAML and P>M>F methods in detecting the anomaly class, both including and excluding the good class.
P>M>F performs better than ResNet and CAML in almost all cases.
P>M>F reaches up to 78% accuracy in the bottle class (@comparisonnormalbottle) and 46% in the cable class (@comparisonnormalcable) when detecting all classes including the good class,
and 84% in the bottle class (@comparisonfaultyonlybottle) and 51% in the cable class (@comparisonfaultyonlycable) when excluding the good class.
These results are good considering the small number of samples and how similar the anomaly classes are.
CAML performs the worst in all cases except for the cable class when detecting all classes except the good one.
This might be the case because it is not fine-tuned on the shots and is not designed for such similar classes.
The detection is not directly comparable with PatchCore and EfficientAD, as they are trained on the good class only.
They are also built for detecting anomalies in general rather than for distinguishing between anomaly classes.
See @expresults2way for a comparison of the 2-way classification performance.
In conclusion, P>M>F is a good choice for detecting the anomaly classes only,
especially when few samples of the anomaly classes are available, as is the case in most anomaly detection scenarios.
One could use a well-established algorithm like PatchCore or EfficientAD for detecting anomalies in general and P>M>F for detecting the anomaly class afterwards.
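
The two-stage setup suggested above could be sketched as follows.
This is only a minimal illustration under stated assumptions: `anomaly_score` stands in for a detector such as PatchCore or EfficientAD, `classify_fault` stands in for a Few-Shot classifier such as P>M>F, and the threshold of 0.5 is an arbitrary placeholder; none of these names or values come from the experiments.

```python
def two_stage_classify(image, anomaly_score, classify_fault, threshold=0.5):
    """Stage 1 decides good vs. anomalous, stage 2 names the anomaly class.

    anomaly_score: callable returning a score, higher = more anomalous
                   (placeholder for a general anomaly detector).
    classify_fault: callable returning an anomaly class label
                    (placeholder for a Few-Shot classifier).
    threshold: hypothetical decision threshold for stage 1.
    """
    if anomaly_score(image) < threshold:
        return "good"
    # Only samples flagged as anomalous reach the Few-Shot classifier,
    # so it never has to separate the good class from the fault classes.
    return classify_fault(image)


# Dummy stand-ins to show the control flow; real models would replace these.
label_low = two_stage_classify("img_a", lambda img: 0.1, lambda img: "fault_a")
label_high = two_stage_classify("img_b", lambda img: 0.9, lambda img: "fault_a")
```

In this arrangement the Few-Shot classifier operates in the setting without the good class, which is the setting where the reported accuracies are higher.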
#subpar.grid(
figure(image("rsc/normal-bottle.png"), caption: [
5-Way - Bottle class
]), <comparisonnormalbottle>,
figure(image("rsc/normal-cable.png"), caption: [
9-Way - Cable class
]), <comparisonnormalcable>,
figure(image("rsc/faultclasses-bottle.png"), caption: [
4-Way - Bottle class
]), <comparisonfaultyonlybottle>,
figure(image("rsc/faultclasses-cable.png"), caption: [
8-Way - Cable class
]), <comparisonfaultyonlycable>,
columns: (1fr, 1fr),
caption: [Anomaly class only classification performance],
label: <comparisonnormal>,
)
#if inwriting [
== Extra: How does Euclidean distance compare to cosine similarity when using ResNet as a feature extractor?
#todo[Maybe don't do this]
]