make suggested typo changes
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 33s
parent 0da616107f
commit af58cda976
@@ -145,7 +145,7 @@ FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
 of diverse domains by the authors of the original paper.~@pmfpaper

 Finally, this model is finetuned with the support set of every test iteration.
-Everytime the support set changes we need to finetune the model again.
+Every time the support set changes, we need to finetune the model again.
 In a real world scenario this should not be the case because the support set is fixed and only the query set changes.

 === Results
@@ -188,7 +188,7 @@ This brings the limitation that it can only process default few-shot learning ta
 Since it expects the input sequence to be distributed with the same number of shots per class.
 This is the reason why for this method the two imbalanced test cases couldn't be conducted.

-As a feture extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
+As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
 This feature extractor was already pretrained when used by the authors of the original paper.
 In this case for the non-causal sequence model a transformer model was used.
 It consists of 24 Layers with 16 Attention-heads and a hidden dimension of 1024 and output MLP size of 4096.
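The hunk above pins down CAML's sequence model only in prose. As a hedged sketch, the quoted hyperparameters map onto a standard PyTorch transformer encoder as follows; that CAML uses exactly `nn.TransformerEncoder`, and the token layout in the usage line, are assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

# Sketch of a non-causal sequence model with the hyperparameters quoted in
# the diff: 24 layers, 16 attention heads, hidden dim 1024, MLP size 4096.
layer = nn.TransformerEncoderLayer(
    d_model=1024,          # hidden dimension
    nhead=16,              # attention heads
    dim_feedforward=4096,  # output MLP size
    batch_first=True,
)
sequence_model = nn.TransformerEncoder(layer, num_layers=24)

# No causal mask is passed, so every element attends to every other one --
# this is what "non-causal" means here.
tokens = torch.randn(1, 21, 1024)  # e.g. 20 support embeddings + 1 query (assumed layout)
out = sequence_model(tokens)       # (1, 21, 1024)
```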
@@ -201,7 +201,7 @@ The model was trained on a large number of general purpose images and is not fin
 Moreover, it was not fine-tuned on the support set similar to the P>M>F method, which could have a huge impact on performance.
 It might also not handle very similar images well.

-Compared the the other two methods CAML performed poorly in almost all experiments.
+Compared the the other two methods, CAML performed poorly in almost all experiments.
 The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
 The only test it did surprisingly well was the detection of the anomaly class for the cable class in @camlperfb were it reached almost 60% accuracy.

@@ -2,7 +2,7 @@

 = Introduction
 == Motivation
-Anomaly detection has especially in the industrial and automotive field essential importance.
+Anomaly detection is of essential importance, especially in the industrial and automotive field.
 Lots of assembly lines need visual inspection to find errors often with the help of camera systems.
 Machine learning helped the field to advance a lot in the past.
 Most of the time the error rate is sub $.1%$ and therefore plenty of good data and almost no faulty data is available.
main.typ +2 -2
@@ -43,7 +43,7 @@
 #show: jku-thesis.with(
   thesis-type: "Bachelor",
   degree: "Bachelor of Science",
-  program: "Artifical Intelligence",
+  program: "Artificial Intelligence",
   supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
   advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
   department: "Institute of Computational Perception",
@@ -76,7 +76,7 @@ In contrast to traditional supervised learning, where a huge amount of labeled d
 here we only have 1-10 samples per class (so called shots).
 So the model is prone to overfitting to the few training samples and this means they should represent the whole sample distribution as good as possible.~#cite(<parnami2022learningexamplessummaryapproaches>)

-Typically a few-shot leaning task consists of a support and query set.
+Typically, a few-shot leaning task consists of a support and query set.
 Where the support-set contains the training data and the query set the evaluation data for real world evaluation.
 A common way to format a few-shot leaning problem is using n-way k-shot notation.
 For Example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
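To make the n-way k-shot notation above concrete, a minimal sketch of how a 3-way 5-shot episode could be drawn from a labeled dataset; `sample_episode` and the `(image, label)` dataset layout are illustrative assumptions, not code from the thesis:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=3, k_shot=5, n_query=10):
    """Sample one n-way k-shot episode (support + query set).

    `dataset` is assumed to be a list of (image, label) pairs with at least
    k_shot + n_query images per class.
    """
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = random.sample(list(by_class), n_way)  # pick the n "ways"
    support, query = [], []
    for cls in classes:
        images = random.sample(by_class[cls], k_shot + n_query)
        support += [(img, cls) for img in images[:k_shot]]  # k shots per class
        query += [(img, cls) for img in images[k_shot:]]    # held-out evaluation
    return support, query
```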
@@ -89,7 +89,7 @@ These models learn a representation of each class in a reduced dimensionality an
   caption: [Prototypical network for 3-ways and 5-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
 ) <prototypefewshot>

-The first and easiest method of this bachelor thesis uses a simple ResNet50 to calucalte those embeddings and clusters the shots together by calculating the class center.
+The first and easiest method of this bachelor thesis uses a simple ResNet50 to calculate those embeddings and clusters the shots together by calculating the class center.
 This is basically a simple prototypical network.
 See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust

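The ResNet50 prototype idea in this hunk fits in a few lines of PyTorch. A minimal sketch, assuming a Euclidean distance metric and the default torchvision weights; the thesis' actual implementation is referenced as @resnet50impl in the diff:

```python
import torch
import torchvision.models as models

# Pretrained ResNet50 with the classification head removed -> 2048-d embeddings.
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()

@torch.no_grad()
def prototypes(support_images, support_labels, n_way):
    """Average each class's shot embeddings into one class center."""
    emb = resnet(support_images)                      # (n_way * k_shot, 2048)
    return torch.stack([emb[support_labels == c].mean(dim=0)
                        for c in range(n_way)])       # (n_way, 2048)

@torch.no_grad()
def classify(query_images, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    emb = resnet(query_images)                        # (n_query, 2048)
    dists = torch.cdist(emb, protos)                  # (n_query, n_way)
    return dists.argmin(dim=1)
```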
@@ -186,7 +186,7 @@ This lowers computational costs while maintaining detection accuracy.~#cite(<pat
 EfficientAD is another state of the art method for anomaly detection.
 It focuses on maintaining performance as well as high computational efficiency.
 At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
-In comparison to Patchcore, which relies on a deeper, more computationaly heavy WideResNet-101 network, the PDN uses only four convulutional layers and two pooling layers.
+In comparison to Patchcore, which relies on a deeper, more computationaly heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
 This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
 #todo[reference to image below]

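As a rough illustration of the PDN structure this hunk describes, four convolutional layers and two pooling layers in PyTorch; the kernel sizes and channel counts here are assumptions, not the values from the EfficientAD paper:

```python
import torch.nn as nn

# Sketch of a PDN-style patch description network: four convolutional layers
# and two pooling layers. Kernel sizes and widths are illustrative guesses.
pdn = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(128, 256, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(256, 256, kernel_size=3), nn.ReLU(),
    nn.Conv2d(256, 384, kernel_size=4),  # patch-level feature map output
)
```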
@@ -283,7 +283,7 @@ If a novel task is drawn from an unseen domain the model may fail to generalize
 To overcome this the model is optionally fine-tuned with the support set on a few gradient steps.
 Data augmentation is used to generate a pseudo query set.
 With the support set the class prototypes are calculated and compared against the models predictions for the pseudo query set.
-With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
+During this step, the entire model is fine-tuned to the new domain.~#cite(<pmfpaper>)

 #figure(
   image("rsc/pmfarchitecture.png", width: 100%),
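The fine-tuning procedure in this hunk (augment the support set into a pseudo query set, score it against the support prototypes, back-propagate through the whole model) could look roughly like the sketch below; the function names, augmentation choice, and hyperparameters are all assumptions, not P>M>F's actual code:

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

augment = T.Compose([T.RandomResizedCrop(224), T.ColorJitter(0.4, 0.4, 0.4)])

def finetune(model, support_images, support_labels, n_way, steps=50, lr=1e-5):
    """Fine-tune the whole backbone on one support set (P>M>F-style sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        # Pseudo query set: augmented copies of the support images,
        # keeping the support labels as targets.
        pseudo_query = torch.stack([augment(img) for img in support_images])

        support_emb = model(support_images)
        protos = torch.stack([support_emb[support_labels == c].mean(dim=0)
                              for c in range(n_way)])

        logits = -torch.cdist(model(pseudo_query), protos)  # nearer = higher score
        loss = F.cross_entropy(logits, support_labels)
        opt.zero_grad()
        loss.backward()  # gradients flow through prototypes and backbone
        opt.step()
```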
@@ -400,7 +400,7 @@ Also, fine-tuning the model can require considerable computational resources, wh
 // https://arxiv.org/pdf/2208.10559v1
 // https://arxiv.org/abs/2208.10559v1

-TRIDENT, a variational infernce network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
+TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
 Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
 By decoupling these parts TRIDENT enhances the networks ability to generalize effectively from unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)

@@ -427,7 +427,7 @@ The transform features parameterless-ness, which makes it easy to integrate into
 It is differentiable which allows for end-to-end training. For example (re-)train the hosting network to adopt to SOT.
 SOT is equivariant, which means that the transform is invariant to the order of the input features.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

-The improvements of SOT over traditional feature transforms dpeend on the used backbone network and the task.
+The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
 But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)

 // anomaly detect
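A compact sketch of the idea behind the SOT transform summarized in this hunk: turn pairwise feature similarities into a doubly-stochastic matrix via Sinkhorn normalization and use its rows as the transformed features. The iteration count and regularization strength below are assumptions, not the paper's settings:

```python
import torch

def sot(features, n_iter=10, eps=0.1):
    """Self-Optimal-Transport sketch: re-embed each feature as its row of a
    doubly-stochastic pairwise similarity matrix."""
    sim = features @ features.T        # pairwise similarities, (n, n)
    K = torch.exp(sim / eps)           # entropic kernel
    for _ in range(n_iter):            # Sinkhorn: alternate row/col normalization
        K = K / K.sum(dim=1, keepdim=True)
        K = K / K.sum(dim=0, keepdim=True)
    return K                           # row i is the transformed feature of input i

# The loop is parameterless and built from differentiable ops, matching the
# properties quoted above; permuting the inputs permutes rows and columns
# consistently, which is the equivariance the hunk mentions.
```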