From af58cda976b60597a1153d924f237674aad2b769 Mon Sep 17 00:00:00 2001
From: lukas-heiligenbrunner
Date: Sat, 25 Jan 2025 11:31:50 +0100
Subject: [PATCH] make suggested typo changes

---
 implementation.typ     |  6 +++---
 introduction.typ       |  2 +-
 main.typ               |  2 +-
 materialandmethods.typ | 12 ++++++------
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/implementation.typ b/implementation.typ
index 19e41d1..223c5d8 100644
--- a/implementation.typ
+++ b/implementation.typ
@@ -145,7 +145,7 @@ FGVCx Fungi, VGG Flower, Traffic Signs and MSCOCO~@pmfpaper]
 of diverse domains by the authors of the original paper.~@pmfpaper
 
 Finally, this model is finetuned with the support set of every test iteration.
-Everytime the support set changes we need to finetune the model again.
+Every time the support set changes, we need to finetune the model again.
 In a real world scenario this should not be the case because the support set is fixed and only the query set changes.
 
 === Results
@@ -188,7 +188,7 @@ This brings the limitation that it can only process default few-shot learning ta
 Since it expects the input sequence to be distributed with the same number of shots per class.
 This is the reason why for this method the two imbalanced test cases couldn't be conducted.
 
-As a feture extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
+As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
 This feature extractor was already pretrained when used by the authors of the original paper.
 In this case for the non-causal sequence model a transformer model was used.
 It consists of 24 Layers with 16 Attention-heads and a hidden dimension of 1024 and output MLP size of 4096.
@@ -201,7 +201,7 @@ The model was trained on a large number of general purpose images and is not fin
 Moreover, it was not fine-tuned on the support set similar to the P>M>F method, which could have a huge impact on performance.
 It might also not handle very similar images well.
 
-Compared the the other two methods CAML performed poorly in almost all experiments.
+Compared to the other two methods, CAML performed poorly in almost all experiments.
 The normal few-shot classification reached only 40% accuracy in @camlperfa at best.
 The only test it did surprisingly well was the detection of the anomaly class for the cable class in @camlperfb were it reached almost 60% accuracy.
 
diff --git a/introduction.typ b/introduction.typ
index a813916..b0bd44a 100644
--- a/introduction.typ
+++ b/introduction.typ
@@ -2,7 +2,7 @@
 
 = Introduction
 == Motivation
-Anomaly detection has especially in the industrial and automotive field essential importance.
+Anomaly detection is of essential importance, especially in the industrial and automotive field.
 Lots of assembly lines need visual inspection to find errors often with the help of camera systems.
 Machine learning helped the field to advance a lot in the past.
 Most of the time the error rate is sub $.1%$ and therefore plenty of good data and almost no faulty data is available.
diff --git a/main.typ b/main.typ
index 33a6c8c..2b6d7c0 100644
--- a/main.typ
+++ b/main.typ
@@ -43,7 +43,7 @@
 #show: jku-thesis.with(
   thesis-type: "Bachelor",
   degree: "Bachelor of Science",
-  program: "Artifical Intelligence",
+  program: "Artificial Intelligence",
   supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
   advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
Felix Pawsworth",) and no supervisor: "" department: "Institute of Computational Perception", diff --git a/materialandmethods.typ b/materialandmethods.typ index dcdab6d..17c0983 100644 --- a/materialandmethods.typ +++ b/materialandmethods.typ @@ -76,7 +76,7 @@ In contrast to traditional supervised learning, where a huge amount of labeled d here we only have 1-10 samples per class (so called shots). So the model is prone to overfitting to the few training samples and this means they should represent the whole sample distribution as good as possible.~#cite() -Typically a few-shot leaning task consists of a support and query set. +Typically, a few-shot leaning task consists of a support and query set. Where the support-set contains the training data and the query set the evaluation data for real world evaluation. A common way to format a few-shot leaning problem is using n-way k-shot notation. For Example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper @@ -89,7 +89,7 @@ These models learn a representation of each class in a reduced dimensionality an caption: [Prototypical network for 3-ways and 5-shots. #cite()], ) -The first and easiest method of this bachelor thesis uses a simple ResNet50 to calucalte those embeddings and clusters the shots together by calculating the class center. +The first and easiest method of this bachelor thesis uses a simple ResNet50 to calculate those embeddings and clusters the shots together by calculating the class center. This is basically a simple prototypical network. See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust @@ -186,7 +186,7 @@ This lowers computational costs while maintaining detection accuracy.~#cite() #todo[reference to image below] @@ -283,7 +283,7 @@ If a novel task is drawn from an unseen domain the model may fail to generalize To overcome this the model is optionally fine-tuned with the support set on a few gradient steps. Data augmentation is used to generate a pseudo query set. With the support set the class prototypes are calculated and compared against the models predictions for the pseudo query set. -With the loss of this step the whole model is fine-tuned to the new domain.~#cite() +During this step, the entire model is fine-tuned to the new domain.~#cite() #figure( image("rsc/pmfarchitecture.png", width: 100%), @@ -400,7 +400,7 @@ Also, fine-tuning the model can require considerable computational resources, wh // https://arxiv.org/pdf/2208.10559v1 // https://arxiv.org/abs/2208.10559v1 -TRIDENT, a variational infernce network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables. +TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables. Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification. By decoupling these parts TRIDENT enhances the networks ability to generalize effectively from unseen data.~#cite() @@ -427,7 +427,7 @@ The transform features parameterless-ness, which makes it easy to integrate into It is differentiable which allows for end-to-end training. For example (re-)train the hosting network to adopt to SOT. 
 SOT is equivariant, which means that the transform is invariant to the order of the input features.~#cite()
-The improvements of SOT over traditional feature transforms dpeend on the used backbone network and the task.
+The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
 But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite()
 
 // anomaly detect