make suggested typo changes
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 33s
parent 0da616107f
commit af58cda976
@@ -145,7 +145,7 @@ FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
 of diverse domains by the authors of the original paper.~@pmfpaper
 
 Finally, this model is finetuned with the support set of every test iteration.
-Everytime the support set changes we need to finetune the model again.
+Every time the support set changes, we need to finetune the model again.
 In a real world scenario this should not be the case because the support set is fixed and only the query set changes.
 
 === Results
@@ -188,7 +188,7 @@ This brings the limitation that it can only process default few-shot learning ta
 This is because it expects the input sequence to contain the same number of shots per class.
 This is the reason why the two imbalanced test cases couldn't be conducted for this method.
 
-As a feture extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
+As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
 This feature extractor was already pretrained when used by the authors of the original paper.
 In this case, a transformer model was used as the non-causal sequence model.
 It consists of 24 layers with 16 attention heads, a hidden dimension of 1024 and an output MLP size of 4096.
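For illustration, a minimal sketch of the general idea of feeding support and query embeddings as one sequence to a non-causal transformer. The token construction, label-embedding size and classification head are assumptions, not the actual CAML implementation; only the layer count, head count, hidden dimension and MLP size follow the text above.

```python
import torch
import torch.nn as nn

# Assumed token construction: ViT feature concatenated with a learned label
# embedding; queries use a dedicated "unknown label" embedding instead.
feat_dim, label_dim = 768, 256          # ViT-B/16 feature size; label size chosen so the token is 1024-d
d_model = feat_dim + label_dim

encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=16, dim_feedforward=4096, batch_first=True
)
sequence_model = nn.TransformerEncoder(encoder_layer, num_layers=24)  # 24 layers, 16 heads

n_way, k_shot, n_query = 3, 5, 10
support_feats = torch.randn(n_way * k_shot, feat_dim)    # stand-ins for frozen ViT-B/16 features
query_feats = torch.randn(n_query, feat_dim)
label_embed = nn.Embedding(n_way + 1, label_dim)          # last index = "unknown" for queries

support_labels = torch.arange(n_way).repeat_interleave(k_shot)
support_tokens = torch.cat([support_feats, label_embed(support_labels)], dim=-1)
query_tokens = torch.cat(
    [query_feats, label_embed(torch.full((n_query,), n_way, dtype=torch.long))], dim=-1
)

# One non-causal sequence: every token attends to every other token.
sequence = torch.cat([support_tokens, query_tokens], dim=0).unsqueeze(0)   # (1, 25, 1024)
out = sequence_model(sequence)
logits = nn.Linear(d_model, n_way)(out[0, n_way * k_shot:])                # class scores for the queries
```

Because every support token carries its label and all tokens attend to each other, the class of a query is predicted in-context, without gradient updates at test time.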
@@ -201,7 +201,7 @@ The model was trained on a large number of general purpose images and is not fin
 Moreover, unlike the P>M>F method, it was not fine-tuned on the support set, which could have a huge impact on performance.
 It might also not handle very similar images well.
 
-Compared the the other two methods CAML performed poorly in almost all experiments.
+Compared to the other two methods, CAML performed poorly in almost all experiments.
 The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
 The only test in which it did surprisingly well was the detection of the anomaly class for the cable class in @camlperfb, where it reached almost 60% accuracy.
 
@@ -2,7 +2,7 @@
 
 = Introduction
 == Motivation
-Anomaly detection has especially in the industrial and automotive field essential importance.
+Anomaly detection is of essential importance, especially in the industrial and automotive field.
 Lots of assembly lines need visual inspection to find errors, often with the help of camera systems.
 Machine learning has helped the field advance a lot in the past.
 Most of the time the error rate is below $.1%$ and therefore plenty of good data and almost no faulty data is available.
main.typ
@@ -43,7 +43,7 @@
 #show: jku-thesis.with(
 thesis-type: "Bachelor",
 degree: "Bachelor of Science",
-program: "Artifical Intelligence",
+program: "Artificial Intelligence",
 supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
 advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
 department: "Institute of Computational Perception",
@@ -76,7 +76,7 @@ In contrast to traditional supervised learning, where a huge amount of labeled d
 here we only have 1-10 samples per class (so-called shots).
 So the model is prone to overfitting to the few training samples, and this means they should represent the whole sample distribution as well as possible.~#cite(<parnami2022learningexamplessummaryapproaches>)
 
-Typically a few-shot leaning task consists of a support and query set.
+Typically, a few-shot learning task consists of a support and query set.
 The support set contains the training data and the query set the evaluation data for real-world evaluation.
 A common way to format a few-shot learning problem is using n-way k-shot notation.
 For example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
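As a concrete illustration of the n-way k-shot layout, a tiny sketch with purely illustrative tensors (not code from the thesis):

```python
import torch

# A 3-way 5-shot episode: 3 classes, 5 labeled support images each,
# plus unlabeled query images that have to be classified.
n_way, k_shot, n_query_per_class = 3, 5, 15

support_images = torch.randn(n_way * k_shot, 3, 224, 224)           # 15 labeled shots
support_labels = torch.arange(n_way).repeat_interleave(k_shot)      # [0,0,0,0,0,1,...,2]
query_images = torch.randn(n_way * n_query_per_class, 3, 224, 224)  # evaluated against the support set
```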
@@ -89,7 +89,7 @@ These models learn a representation of each class in a reduced dimensionality an
 caption: [Prototypical network for 3-ways and 5-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
 ) <prototypefewshot>
 
-The first and easiest method of this bachelor thesis uses a simple ResNet50 to calucalte those embeddings and clusters the shots together by calculating the class center.
+The first and easiest method of this bachelor thesis uses a simple ResNet50 to calculate those embeddings and clusters the shots together by calculating the class center.
 This is basically a simple prototypical network.
 See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
 
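A minimal sketch of this class-center (prototype) classification, assuming the ResNet50 embeddings are already computed; names and shapes are illustrative:

```python
import torch

def prototype_classify(support_emb, support_labels, query_emb, n_way):
    # Class center (prototype) = mean embedding of the k shots of each class.
    prototypes = torch.stack(
        [support_emb[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                              # (n_way, emb_dim)
    # Assign each query to the nearest prototype (Euclidean distance).
    dists = torch.cdist(query_emb, prototypes)     # (n_query, n_way)
    return dists.argmin(dim=1)

# Random stand-ins for ResNet50 embeddings (emb_dim = 2048), 3-way 5-shot.
support_emb = torch.randn(15, 2048)
support_labels = torch.arange(3).repeat_interleave(5)
query_emb = torch.randn(10, 2048)
predictions = prototype_classify(support_emb, support_labels, query_emb, n_way=3)
```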
@@ -186,7 +186,7 @@ This lowers computational costs while maintaining detection accuracy.~#cite(<pat
 EfficientAD is another state-of-the-art method for anomaly detection.
 It focuses on maintaining performance as well as high computational efficiency.
 At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
-In comparison to Patchcore, which relies on a deeper, more computationaly heavy WideResNet-101 network, the PDN uses only four convulutional layers and two pooling layers.
+In comparison to Patchcore, which relies on a deeper, more computationally heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
 This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
 #todo[reference to image below]
 
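A rough sketch of such a four-convolution, two-pooling feature extractor; the channel widths and kernel sizes here are assumptions for illustration, not the exact PDN configuration from the EfficientAD paper:

```python
import torch
import torch.nn as nn

# Small fully convolutional network in the spirit of the PDN:
# four conv layers and two average-pooling layers, producing patch-level features.
pdn_like = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(128, 256, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(256, 256, kernel_size=4), nn.ReLU(),
    nn.Conv2d(256, 384, kernel_size=4),   # feature dimension of the output patches
)

features = pdn_like(torch.randn(1, 3, 256, 256))  # (1, 384, H', W') patch features
print(features.shape)
```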
@@ -283,7 +283,7 @@ If a novel task is drawn from an unseen domain the model may fail to generalize
 To overcome this, the model is optionally fine-tuned with the support set for a few gradient steps.
 Data augmentation is used to generate a pseudo query set.
 With the support set the class prototypes are calculated and compared against the model's predictions for the pseudo query set.
-With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
+During this step, the entire model is fine-tuned to the new domain.~#cite(<pmfpaper>)
 
 #figure(
 image("rsc/pmfarchitecture.png", width: 100%),
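A conceptual sketch of this fine-tuning step; the augmentation choice, optimizer, learning rate and step count are assumptions, and `backbone` stands for the pretrained feature extractor:

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
])

def finetune_on_support(backbone, support_images, support_labels, n_way, steps=50, lr=1e-5):
    optimizer = torch.optim.Adam(backbone.parameters(), lr=lr)
    for _ in range(steps):
        # Augmented copies of the support set act as a pseudo query set.
        pseudo_query = torch.stack([augment(img) for img in support_images])
        support_emb = backbone(support_images)
        query_emb = backbone(pseudo_query)
        # Class prototypes from the (non-augmented) support embeddings.
        prototypes = torch.stack(
            [support_emb[support_labels == c].mean(dim=0) for c in range(n_way)]
        )
        logits = -torch.cdist(query_emb, prototypes)    # nearer prototype = higher score
        loss = F.cross_entropy(logits, support_labels)  # pseudo queries keep their labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```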
@@ -400,7 +400,7 @@ Also, fine-tuning the model can require considerable computational resources, wh
 // https://arxiv.org/pdf/2208.10559v1
 // https://arxiv.org/abs/2208.10559v1
 
-TRIDENT, a variational infernce network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
+TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
 Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
 By decoupling these parts, TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)
 
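A schematic sketch of the decoupling idea only, i.e. two separate Gaussian latents inferred from the same image features; this is not the actual TRIDENT architecture, whose attention-based encoders and transductive training are considerably more involved:

```python
import torch
import torch.nn as nn

class DecoupledEncoder(nn.Module):
    """Infers two independent Gaussian latents from one image feature vector."""
    def __init__(self, feat_dim=512, z_dim=64):
        super().__init__()
        self.semantic_head = nn.Linear(feat_dim, 2 * z_dim)  # context / style attributes
        self.label_head = nn.Linear(feat_dim, 2 * z_dim)     # class-relevant attributes

    @staticmethod
    def reparameterize(params):
        mu, log_var = params.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)

    def forward(self, features):
        z_semantic = self.reparameterize(self.semantic_head(features))
        z_label = self.reparameterize(self.label_head(features))
        return z_semantic, z_label   # only z_label would be used for classification

z_sem, z_lab = DecoupledEncoder()(torch.randn(8, 512))
```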
@@ -427,7 +427,7 @@ The transform features parameterless-ness, which makes it easy to integrate into
 It is differentiable, which allows for end-to-end training, for example (re-)training the hosting network to adapt to SOT.
 SOT is equivariant, which means that reordering the input features reorders the transformed features accordingly.
 
-The improvements of SOT over traditional feature transforms dpeend on the used backbone network and the task.
+The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
 But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
 
 // anomaly detect
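A rough sketch of the core computation behind SOT: a self-similarity matrix is normalized with Sinkhorn iterations into an (approximately) doubly-stochastic matrix whose rows serve as the transformed, set-aware features. Details such as the diagonal handling and the regularization value follow the SOT paper and are simplified here:

```python
import torch

def sot_transform(features, n_iters=10, reg=0.1):
    # Cosine self-similarity of the whole feature set (n x n).
    f = torch.nn.functional.normalize(features, dim=-1)
    cost = 1.0 - f @ f.t()
    cost.fill_diagonal_(1e3)              # discourage trivial self-matching (simplification)
    # Sinkhorn iterations in log space: alternately normalize rows and columns of exp(-cost / reg).
    log_k = -cost / reg
    for _ in range(n_iters):
        log_k = log_k - torch.logsumexp(log_k, dim=1, keepdim=True)  # rows sum to 1
        log_k = log_k - torch.logsumexp(log_k, dim=0, keepdim=True)  # columns sum to 1
    return log_k.exp()                    # each row is the new representation of one feature

support_and_query = torch.randn(25, 640)          # e.g. embeddings of all images in one episode
transformed = sot_transform(support_and_query)    # (25, 25), rows of a doubly-stochastic matrix
```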