make suggested typo changes
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 33s

lukas-heiligenbrunner 2025-01-25 11:31:50 +01:00
parent 0da616107f
commit af58cda976
4 changed files with 11 additions and 11 deletions

View File

@@ -145,7 +145,7 @@ FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
 of diverse domains by the authors of the original paper.~@pmfpaper
 Finally, this model is finetuned with the support set of every test iteration.
-Everytime the support set changes we need to finetune the model again.
+Every time the support set changes, we need to finetune the model again.
 In a real world scenario this should not be the case because the support set is fixed and only the query set changes.
 === Results
@@ -188,7 +188,7 @@ This brings the limitation that it can only process default few-shot learning ta
 Since it expects the input sequence to be distributed with the same number of shots per class.
 This is the reason why for this method the two imbalanced test cases couldn't be conducted.
-As a feture extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
+As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
 This feature extractor was already pretrained when used by the authors of the original paper.
 In this case for the non-causal sequence model a transformer model was used.
 It consists of 24 Layers with 16 Attention-heads and a hidden dimension of 1024 and output MLP size of 4096.
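For illustration, a minimal PyTorch sketch of such a configuration, assuming torchvision's ViT-B/16 weights and a standard transformer encoder; the names and the dummy sequence are placeholders, not the CAML reference implementation:

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Pretrained ViT-B/16 (patch size 16) as the frozen feature extractor.
feature_extractor = vit_b_16(weights=ViT_B_16_Weights.DEFAULT).eval()

# Non-causal sequence model: 24 layers, 16 attention heads,
# hidden dimension 1024, feed-forward (MLP) size 4096.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=1024, nhead=16, dim_feedforward=4096, batch_first=True
)
sequence_model = nn.TransformerEncoder(encoder_layer, num_layers=24)

# Support and query representations are processed as one sequence without a
# causal mask, so every element attends to every other element.
dummy_sequence = torch.randn(1, 3 * 5 + 1, 1024)  # e.g. 3-way 5-shot support + 1 query
output = sequence_model(dummy_sequence)
```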
@@ -201,7 +201,7 @@ The model was trained on a large number of general purpose images and is not fin
 Moreover, it was not fine-tuned on the support set similar to the P>M>F method, which could have a huge impact on performance.
 It might also not handle very similar images well.
-Compared the the other two methods CAML performed poorly in almost all experiments.
+Compared the the other two methods, CAML performed poorly in almost all experiments.
 The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
 The only test it did surprisingly well was the detection of the anomaly class for the cable class in @camlperfb were it reached almost 60% accuracy.

View File

@@ -2,7 +2,7 @@
 = Introduction
 == Motivation
-Anomaly detection has especially in the industrial and automotive field essential importance.
+Anomaly detection is of essential importance, especially in the industrial and automotive field.
 Lots of assembly lines need visual inspection to find errors often with the help of camera systems.
 Machine learning helped the field to advance a lot in the past.
 Most of the time the error rate is sub $.1%$ and therefore plenty of good data and almost no faulty data is available.

View File

@@ -43,7 +43,7 @@
 #show: jku-thesis.with(
 thesis-type: "Bachelor",
 degree: "Bachelor of Science",
-program: "Artifical Intelligence",
+program: "Artificial Intelligence",
 supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
 advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
 department: "Institute of Computational Perception",

View File

@@ -76,7 +76,7 @@ In contrast to traditional supervised learning, where a huge amount of labeled d
 here we only have 1-10 samples per class (so called shots).
 So the model is prone to overfitting to the few training samples and this means they should represent the whole sample distribution as good as possible.~#cite(<parnami2022learningexamplessummaryapproaches>)
-Typically a few-shot leaning task consists of a support and query set.
+Typically, a few-shot leaning task consists of a support and query set.
 Where the support-set contains the training data and the query set the evaluation data for real world evaluation.
 A common way to format a few-shot leaning problem is using n-way k-shot notation.
 For Example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
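A small illustrative sketch of how such an n-way k-shot episode (support and query set) could be sampled from a labeled pool; the data layout and helper name are hypothetical, not code from the thesis:

```python
import random

def sample_episode(samples_by_class, n_way=3, k_shot=5, n_query=10):
    """samples_by_class: dict mapping class label -> list of image paths."""
    episode_classes = random.sample(list(samples_by_class), n_way)
    support, query = [], []
    for label in episode_classes:
        picks = random.sample(samples_by_class[label], k_shot + n_query)
        support += [(path, label) for path in picks[:k_shot]]   # training shots
        query += [(path, label) for path in picks[k_shot:]]     # held-out evaluation samples
    return support, query
```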
@@ -89,7 +89,7 @@ These models learn a representation of each class in a reduced dimensionality an
 caption: [Prototypical network for 3-ways and 5-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
 ) <prototypefewshot>
-The first and easiest method of this bachelor thesis uses a simple ResNet50 to calucalte those embeddings and clusters the shots together by calculating the class center.
+The first and easiest method of this bachelor thesis uses a simple ResNet50 to calculate those embeddings and clusters the shots together by calculating the class center.
 This is basically a simple prototypical network.
 See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
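A minimal sketch of this prototype-based classification, assuming a torchvision ResNet50 as the embedding network; illustrative only, not the implementation referenced in @resnet50impl:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
backbone.fc = nn.Identity()   # use the 2048-d pooled features as embeddings
backbone.eval()

@torch.no_grad()
def classify(support_images, support_labels, query_images, n_way):
    """support_labels: LongTensor with class indices 0..n_way-1."""
    z_support = backbone(support_images)        # (n_way * k_shot, 2048)
    z_query = backbone(query_images)            # (n_query, 2048)
    # Class center (prototype) = mean embedding of each class' shots.
    prototypes = torch.stack(
        [z_support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )
    # Assign each query to the nearest prototype (Euclidean distance).
    distances = torch.cdist(z_query, prototypes)
    return distances.argmin(dim=1)
```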
@@ -186,7 +186,7 @@ This lowers computational costs while maintaining detection accuracy.~#cite(<pat
 EfficientAD is another state of the art method for anomaly detection.
 It focuses on maintaining performance as well as high computational efficiency.
 At its core, EfficientAD uses a lightweight feature extractor, the Patch Description Network (PDN), which processes images in less than a millisecond on modern hardware.
-In comparison to Patchcore, which relies on a deeper, more computationaly heavy WideResNet-101 network, the PDN uses only four convulutional layers and two pooling layers.
+In comparison to Patchcore, which relies on a deeper, more computationaly heavy WideResNet-101 network, the PDN uses only four convolutional layers and two pooling layers.
 This results in reduced latency while retaining the ability to generate patch-level features.~#cite(<efficientADpaper>)
 #todo[reference to image below]
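As a rough illustration of such a shallow patch descriptor, a sketch with four convolutional and two pooling layers; the channel counts and kernel sizes are placeholders, not the exact EfficientAD configuration:

```python
import torch
import torch.nn as nn

# Four convolutions and two average-pooling layers produce a patch-level
# feature map in a single fast forward pass.
pdn = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(128, 256, kernel_size=4), nn.ReLU(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(256, 256, kernel_size=4), nn.ReLU(),
    nn.Conv2d(256, 384, kernel_size=4),
)

features = pdn(torch.randn(1, 3, 256, 256))  # (1, 384, H', W') patch features
print(features.shape)
```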
@@ -283,7 +283,7 @@ If a novel task is drawn from an unseen domain the model may fail to generalize
 To overcome this the model is optionally fine-tuned with the support set on a few gradient steps.
 Data augmentation is used to generate a pseudo query set.
 With the support set the class prototypes are calculated and compared against the models predictions for the pseudo query set.
-With the loss of this step the whole model is fine-tuned to the new domain.~#cite(<pmfpaper>)
+During this step, the entire model is fine-tuned to the new domain.~#cite(<pmfpaper>)
 #figure(
 image("rsc/pmfarchitecture.png", width: 100%),
@@ -400,7 +400,7 @@ Also, fine-tuning the model can require considerable computational resources, wh
 // https://arxiv.org/pdf/2208.10559v1
 // https://arxiv.org/abs/2208.10559v1
-TRIDENT, a variational infernce network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
+TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
 Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
 By decoupling these parts TRIDENT enhances the networks ability to generalize effectively from unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)
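A very rough, generic VAE-style sketch of the decoupling idea with two separate latent heads; purely illustrative, not the TRIDENT architecture:

```python
import torch
import torch.nn as nn

class DecoupledEncoder(nn.Module):
    def __init__(self, feat_dim=512, z_dim=64):
        super().__init__()
        # One head for semantic/style information, one for label-specific information.
        self.semantic_head = nn.Linear(feat_dim, 2 * z_dim)  # outputs mu and log-variance
        self.label_head = nn.Linear(feat_dim, 2 * z_dim)

    def reparameterize(self, stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, features):
        z_semantic = self.reparameterize(self.semantic_head(features))
        z_label = self.reparameterize(self.label_head(features))  # used for classification
        return z_semantic, z_label
```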
@@ -427,7 +427,7 @@ The transform features parameterless-ness, which makes it easy to integrate into
 It is differentiable which allows for end-to-end training. For example (re-)train the hosting network to adopt to SOT.
 SOT is equivariant, which means that the transform is invariant to the order of the input features.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
-The improvements of SOT over traditional feature transforms dpeend on the used backbone network and the task.
+The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
 But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
 // anomaly detect
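A hedged sketch of an optimal-transport style feature transform as described for SOT above: Sinkhorn iterations make the pairwise similarity matrix of the feature set approximately doubly stochastic, and its rows serve as the transformed features. It is parameterless, differentiable, and equivariant to the input order; illustrative only, not the reference SOT code:

```python
import torch
import torch.nn.functional as F

def sot_like_transform(features, n_iter=10, eps=0.1):
    z = F.normalize(features, dim=-1)      # (n, d) unit-norm features
    cost = 1.0 - z @ z.t()                 # pairwise cosine distance
    k = torch.exp(-cost / eps)             # Gibbs kernel
    for _ in range(n_iter):                # Sinkhorn: alternate row/column scaling
        k = k / k.sum(dim=1, keepdim=True)
        k = k / k.sum(dim=0, keepdim=True)
    return k                               # each row is the transformed feature
```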