Compare commits

...

10 Commits

SHA1        Message                                                                               Date
71bdb0a207  fix some errors                                                                       2025-01-24 19:51:55 +01:00
8f28a8c387  use ieee citation style                                                               2025-01-15 16:02:26 +01:00
a1b8d7d81a  update supervisor title                                                               2025-01-15 06:35:17 +00:00
c5bd509f24  fix some todos and spelling errors                                                    2025-01-15 07:03:10 +01:00
30d09a67d2  fix caml stuff and add things to last sec                                             2025-01-14 20:05:11 +01:00
3e440e97f7  add stuff why inbalanced doesn't work for caml                                        2025-01-14 19:39:41 +01:00
49d5e97417  add abstract, finish the alternatvie methods and fix some todos and improve sources  2025-01-14 19:22:15 +01:00
7c54e11238  add sgva clip to not used materials                                                   2025-01-13 22:36:44 +01:00
dd1f28a89f  add some things to matmethods                                                         2025-01-13 15:09:53 +01:00
7b5be51446  add some things to matmethods                                                         2025-01-13 15:09:43 +01:00
7 changed files with 192 additions and 48 deletions

View File

@@ -6,14 +6,17 @@ The only benefit of Few-Shot learning is that it can be used in environments whe
But this should not be the case in most scenarios.
Most of the time plenty of good samples are available and in this case Patchcore or EfficientAD should perform great.
The only case where Few-Shot learning could be used is in a scenario where one wants to detect the anomaly class itself.
Patchcore and EfficientAD can only detect if an anomaly is present or not but not what the anomaly is.
The only case where Few-Shot learning could be used is in a scenario where one wants to detect the anomaly class itself.
Patchcore and EfficientAD can only detect whether an anomaly is present or not, but not what type of anomaly it actually is.
So chaining a Few-Shot learner after Patchcore or EfficientAD could be a good way to get the best of both worlds.
In most of the tests performed P>M>F performed the best.
In most of the tests P>M>F performed the best.
But the simple ResNet50 method also performed better than expected in most cases and can be considered when computational resources are limited and a simple architecture is enough.
== Outlook
In the future, when new Few-Shot learning methods evolve, it could be interesting to test again how they perform in anomaly detection tasks.
There might be a lack of research in the area where the classes to detect are very similar to each other,
and a few-shot learning algorithm tailored specifically to very similar classes could boost the performance by a large margin.
It might be interesting to test the SOT method (see @SOT) with a ResNet50 feature extractor, similar to the one proposed in this thesis but with SOT for embedding comparison.
Moreover, TRIDENT (see @TRIDENT) could achieve promising results in an anomaly detection scenario.

View File

@@ -64,8 +64,8 @@ Which is a result that is unexpected (since one might think more samples perform better)
Clearly all four graphs show that the performance decreases with an increasing number of good samples.
So the conclusion is that the Few-Shot learner should always be trained with classes as balanced as possible.
== How does the 3 (ResNet, CAML, pmf) methods perform in only detecting the anomaly class?
_How much does the performance improve if only detecting an anomaly or not?
== How do the 3 (ResNet, CAML, P>M>F) methods perform in only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD#todo[Maybe remove comparison?]?_
@comparisonnormal shows graphs comparing the performance of the ResNet, CAML and P>M>F methods in detecting only the anomaly class, both including and excluding the good class.

View File

@@ -7,15 +7,19 @@
The three methods described (ResNet50, CAML, P>M>F) were implemented in a Jupyter notebook and compared to each other.
== Experiments <experiments>
For all of the three methods we test the following use-cases:#todo[maybe write more to each test]
For all of the three methods we test the following use-cases:
- Detection of anomaly class (1,3,5 shots)
  - Every faulty class and the good class is detected.
- 2 Way classification (1,3,5 shots)
  - Only faulty or not faulty is detected. All the samples of the faulty classes are treated as a single class.
- Detect only anomaly classes (1,3,5 shots)
  - Similar to the first test but without the good class. Only faulty classes are detected.
- Imbalanced 2 Way classification (5,10,15,30 good shots, 5 bad shots)
- Imbalanced target class prediction (5,10,15,30 good shots, 5 bad shots)
Those experiments were conducted on the MVTec AD dataset on the bottle and cable classes.
  - Similar to the 2 way classification but with an imbalanced number of good shots.
- Imbalanced target class prediction (5,10,15,30 good shots, 5 bad shots)#todo[Avoid bullet points and write flow text?]
  - Detect only the faulty classes without the good class, with an imbalanced number of shots.
All those experiments were conducted on the MVTec AD dataset on the bottle and cable classes.
== Experiment Setup
All the experiments were done on the bottle and cable classes of the MVTec AD dataset.
@@ -23,20 +27,21 @@ The corresponding number of shots was randomly selected from the dataset.
The remaining images were used to test the model and measure the accuracy.
#todo[Maybe add real number of samples per classes]
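A minimal sketch of how this random support/query split could look in Python (the function and variable names are illustrative, not the actual notebook code):
```python
import random

def split_support_query(images_by_class, k_shot, seed=42):
    # Draw k support shots per class at random; all remaining images
    # form the query set used to measure the accuracy.
    rng = random.Random(seed)
    support, query = [], []
    for label, images in images_by_class.items():
        shot_idx = set(rng.sample(range(len(images)), k_shot))
        for i, img in enumerate(images):
            (support if i in shot_idx else query).append((img, label))
    return support, query
```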
== ResNet50
== ResNet50 <resnet50impl>
=== Approach
The simplest approach is to use a pre-trained ResNet50 model as a feature extractor.
From both the support and the query set, features are extracted to get a down-projected representation of the images.
The support set embeddings are compared to the query set embeddings.
To predict the class of a query, the class with the smallest distance between its support embedding and the query embedding is chosen.
If there is more than one support embedding within the same class, the mean of those embeddings is used (class center).
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning.
This approach is similar to a prototypical network @snell2017prototypicalnetworksfewshotlearning and the work of _Just Use a Library of Pre-trained Feature
Extractors and a Simple Classifier_ @chowdhury2021fewshotimageclassificationjust, but with a simple distance metric instead of a neural net.
In this bachelor thesis a pre-trained ResNet50 (IMAGENET1K_V2) PyTorch model was used.
It is pretrained on the ImageNet dataset and has 50 residual layers.
To get the embeddings, the last layer of the model was removed and the output of the second-to-last layer was used as the embedding output.
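A minimal sketch of this nearest-class-center classification in PyTorch (assuming torchvision; the helper names are illustrative and not the actual notebook code):
```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Load the pre-trained ResNet50 and replace the final classification layer
# with an identity, so a forward pass returns the 2048-d output of the
# second-to-last layer as embedding.
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def predict(support_imgs, support_labels, query_imgs, n_classes):
    sup = backbone(support_imgs)   # (n_support, 2048)
    qry = backbone(query_imgs)     # (n_query, 2048)
    # Class center: mean of all support embeddings belonging to one class.
    centers = torch.stack([sup[support_labels == c].mean(dim=0)
                           for c in range(n_classes)])
    # Assign each query to the class center with the smallest distance.
    return torch.cdist(qry, centers).argmin(dim=1)
```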
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.
In the following diagram the ResNet50 architecture is visualized and the cut-point is marked.~@chowdhury2021fewshotimageclassificationjust
#diagram(
spacing: (5mm, 5mm),
@@ -146,7 +151,7 @@ In a real world scenario this should not be the case because the support set is
=== Results
The results of P>M>F look very promising and improve by a large margin over the ResNet50 method.
In @pmfbottleperfa the model reached an accuracy of 79% with 5 shots / 4 way classification.
The 2 way classification (faulty or not) performed even better and peaked at 94% accuracy with 5 shots.#todo[Add somehow that all classes are stacked]
The 2 way classification (faulty or not) performed even better and peaked at 94% accuracy with 5 shots.
Similar to the ResNet50 method in @resnet50perf, the tests with an imbalanced class distribution performed worse than with balanced classes.
So it is clearly a bad idea to add more good shots to the support set.
@@ -178,6 +183,11 @@ So it is clearly a bad idea to add more good shots to the support set.
== CAML
=== Approach
For the CAML implementation the pretrained model weights from the original paper were used.
The non-causal sequence model (transformer) is pretrained with every class having the same number of shots.
This brings the limitation that it can only process standard few-shot learning tasks in the n-way k-shot fashion,
since it expects the input sequence to contain the same number of shots per class.
This is the reason why the two imbalanced test cases couldn't be conducted for this method.
As a feature extractor a ViT-B/16 model was used, which is a Vision Transformer with a patch size of 16.
This feature extractor was already pretrained when used by the authors of the original paper.
For the non-causal sequence model a transformer model was used

View File

@@ -5,32 +5,32 @@
Anomaly detection is of essential importance, especially in the industrial and automotive fields.
Lots of assembly lines need visual inspection to find errors, often with the help of camera systems.
Machine learning has helped the field advance a lot in the past.
Most of the time the error rate is sub $.1%$ and therefore plenty of good data is available and the data is heavily unbalaned.
Most of the time the error rate is sub $.1%$ and therefore plenty of good data and almost no faulty data is available.
So the training data is heavily imbalanced.~#cite(<parnami2022learningexamplessummaryapproaches>)
PatchCore and EfficientAD are state-of-the-art algorithms that are trained only on good data and then detect anomalies within unseen (but similar) data.
One of their problems is the need for lots of training data and long training times.
Moreover a slight change of the camera position or the lighting conditions can lead to a complete retraining of the model.
Few-Shot learning might be a suitable alternative with hugely lowered train times and fast adaption to new conditions.
Moreover, a slight change of the camera position or the lighting conditions can make a complete retraining of the model mandatory.
Few-Shot learning might be a suitable alternative with greatly lowered training times and fast adaptation to new conditions.~#cite(<efficientADpaper>)#cite(<patchcorepaper>)#cite(<parnami2022learningexamplessummaryapproaches>)
In this thesis the performance of 3 Few-Shot learning algorithms will be compared in the field of anomaly detection.
In this thesis the performance of 3 Few-Shot learning algorithms (ResNet50, P>M>F, CAML) will be compared in the field of anomaly detection.
Moreover, few-shot learning might be able not only to detect anomalies but also to detect the anomaly class.
== Research Questions <sectionresearchquestions>
=== Is Few-Shot learning a suitable fit for anomaly detection?
Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well established algorithms such as Patchcore or EfficientAD?
_Should Few-Shot learning be used for anomaly detection tasks?
How does it compare to well established algorithms such as Patchcore or EfficientAD?_
=== How does an imbalanced number of shots affect performance?
Does giving the Few-Shot learner more good than bad samples improve the model performance?
_Does giving the Few-Shot learner more good than bad samples improve the model performance?_
=== How does the 3 (ResNet, CAML, \pmf) methods perform in only detecting the anomaly class?
How much does the performance improve if only detecting an anomaly or not?
How does it compare to PatchCore and EfficientAD?
=== How do the 3 (ResNet, CAML, \pmf) methods perform in only detecting the anomaly class?
_How much does the performance improve by only detecting the presence of an anomaly?
How does it compare to PatchCore and EfficientAD?_
#if inwriting [
=== Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?
=== _Extra: How does Euclidean distance compare to Cosine-similarity when using ResNet as a feature-extractor?_
// I've tried different distance measures $->$ but results are pretty much the same.
]
@@ -45,7 +45,7 @@ It outlines the experimental setup, including the use of Jupyter Notebook for pr
The experimental outcomes are presented in @sectionexperimentalresults.
This section addresses the research questions posed in @sectionresearchquestions, examining the suitability of Few-Shot Learning for anomaly detection tasks, the impact of class imbalance on model performance, and the comparative effectiveness of the three selected methods.
Additional experiments explore the differences between Euclidean distance and Cosine similarity when using ResNet as a feature extractor.#todo[Maybe remove this]
//Additional experiments explore the differences between Euclidean distance and Cosine similarity when using ResNet as a feature extractor.#todo[Maybe remove this]
Finally, @sectionconclusionandoutlook summarizes the key findings of this study.
It reflects on the implications of the results for the field of anomaly detection and proposes directions for future research that could address the limitations and enhance the applicability of Few-Shot Learning approaches in this domain.

View File

@@ -44,7 +44,7 @@
thesis-type: "Bachelor",
degree: "Bachelor of Science",
program: "Artifical Intelligence",
supervisor: "Professor Scharinger Josef",
supervisor: "Josef Scharinger, a.Univ.-Prof, Dr.",
advisors: (), // singular advisor like this: ("Dr. Felix Pawsworth",) and no supervisor: ""
department: "Institute of Computational Perception",
author: "Lukas Heiligenbrunner",
@@ -52,7 +52,19 @@
place-of-submission: "Linz",
title: "Few shot learning for anomaly detection",
abstract-en: [//max. 250 words
#lorem(200) ],
This thesis explores the application of Few-Shot Learning (FSL) in anomaly detection, a critical area in industrial and automotive domains requiring robust and efficient algorithms for identifying defects.
Traditional methods, such as PatchCore and EfficientAD, achieve high accuracy but often demand extensive training data and are sensitive to environmental changes, necessitating frequent retraining.
FSL offers a promising alternative by enabling models to generalize effectively from minimal samples, thus reducing training time and adaptation overhead.
The study evaluates three FSL methods—ResNet50, P>M>F, and CAML—using the MVTec AD dataset.
Experiments focus on tasks such as anomaly detection, class imbalance handling, //and comparison of distance metrics.
and anomaly type classification.
Results indicate that while FSL methods trail behind state-of-the-art algorithms in detecting anomalies, they excel in classifying anomaly types, showcasing potential in scenarios requiring detailed defect identification.
Among the tested approaches, P>M>F emerged as the most robust, demonstrating superior accuracy across various settings.
This research underscores the limitations and niche applicability of FSL in anomaly detection, advocating its integration with established algorithms for enhanced performance.
Future work should address the scalability and domain-specific adaptability of FSL techniques to broaden their utility in industrial applications.
],
abstract-de: none,// or specify the abstract-de in a container []
acknowledgements: none,//acknowledgements: none // if you are self-made
show-title-in-header: false,
@@ -165,4 +177,4 @@
#include "conclusionandoutlook.typ"
#set par(leading: 0.7em, first-line-indent: 0em, justify: true)
#bibliography("sources.bib", style: "apa")
#bibliography("sources.bib", style: "ieee")

View File

@@ -18,7 +18,7 @@ Each category comprises a set of defect-free training images and a test set of i
In this bachelor thesis only two categories are used. The categories are "Bottle" and "Cable".
The bottle category contains 3 different defect classes: 'broken_large', 'broken_small' and 'contamination'.
The bottle category contains 3 different defect classes: _broken_large_, _broken_small_ and _contamination_.
#subpar.grid(
figure(image("rsc/mvtec/bottle/broken_large_example.png"), caption: [
Broken large defect
@@ -34,9 +34,9 @@ The bottle category contains 3 different defect classes: 'broken_large', 'broken
label: <full>,
)
Whereas cable has a lot more defect classes: 'bent_wire', 'cable_swap', 'combined', 'cut_inner_insulation',
'cut_outer_insulation', 'missing_cable', 'missing_wire', 'poke_insulation'.
So many more defect classes are already an indication that a classification task might be more difficult for the cable category.
Whereas cable has a lot more defect classes: _bent_wire_, _cable_swap_, _combined_, _cut_inner_insulation_,
_cut_outer_insulation_, _missing_cable_, _missing_wire_, _poke_insulation_.
More defect classes are already an indication that a classification task might be more difficult for the cable category.
#subpar.grid(
figure(image("rsc/mvtec/cable/bent_wire_example.png"), caption: [
@@ -72,32 +72,33 @@ So many more defect classes are already an indication that a classification task
=== Few-Shot Learning
Few-Shot learning is a subfield of machine learning which aims to train a classification model with just a few samples, or none at all.
In contrast to traditional supervised learning where a huge amount of labeled data is required is to generalize well to unseen data.
So the model is prone to overfitting to the few training samples.
In contrast to traditional supervised learning, where a huge amount of labeled data is required to generalize well to unseen data,
here we only have 1-10 samples per class (so-called shots).
So the model is prone to overfitting to the few training samples, which means they should represent the whole sample distribution as well as possible.~#cite(<parnami2022learningexamplessummaryapproaches>)
Typically a few-shot learning task consists of a support and a query set,
where the support set contains the training data and the query set the evaluation data for real-world evaluation.
A common way to format a few-shot learning problem is the n-way k-shot notation.
For Example 3 target classeas and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.
For example, 3 target classes and 5 samples per class for training might be a 3-way 5-shot few-shot classification problem.~@snell2017prototypicalnetworksfewshotlearning @patchcorepaper
A classical example of how such a model might work is a prototypical network.
These models learn a representation of each class and classify new examples based on proximity to these representations in an embedding space.
These models learn a representation of each class in a reduced dimensionality and classify new examples based on proximity to these representations in an embedding space.~@snell2017prototypicalnetworksfewshotlearning
#figure(
image("rsc/prototype_fewshot_v3.png", width: 60%),
caption: [Prototypical network for few-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
caption: [Prototypical network for 3-ways and 5-shots. #cite(<snell2017prototypicalnetworksfewshotlearning>)],
) <prototypefewshot>
The first and easiest method of this bachelor thesis uses a simple ResNet to calucalte those embeddings and is basically a simple prototypical netowrk.
See #todo[link to this section]
#todo[proper source]
The first and easiest method of this bachelor thesis uses a simple ResNet50 to calculate those embeddings and clusters the shots together by calculating the class center.
This is basically a simple prototypical network.
See @resnet50impl.~@chowdhury2021fewshotimageclassificationjust
=== Generalisation from few samples
An especially hard task is to generalize from so few samples.
In typical supervised learning the model sees thousands or millions of samples of the corresponding domain during learning.
This helps the model to learn the underlying patterns and to generalize well to unseen data.
In few-shot learning the model has to generalize from just a few samples.
In few-shot learning the model has to generalize from just a few samples.#todo[Source?]#todo[Write more about. eg. class distributions]
=== Softmax
#todo[Maybe remove this section]
@@ -126,10 +127,10 @@ $ <crel>
Equation~$cal(L)(p,q)$ @crelbatched #cite(<handsonaiI>) is the Binary Cross Entropy Loss for a batch of size $cal(B)$ and is used for model training in this thesis.
=== Cosine Similarity
To measure the distance between two vectors some common distance measures are used.
One popular of them is the Cosine Similarity (@cosinesimilarity).
It measures the cosine of the angle between two vectors.
The Cosine Similarity is especially useful when the magnitude of the vectors is not important.
Cosine similarity is a widely used metric for measuring the similarity between two vectors (@cosinesimilarity).
It computes the cosine of the angle between the vectors, offering a measure of their alignment.
This property makes the cosine similarity particularly effective in scenarios where the
direction of a vector holds more information than its magnitude.
$
cos(theta) &:= (A dot B) / (||A|| dot ||B||)\
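As a small illustration (not part of the thesis code), the measure can be computed in PyTorch as follows; note that scaling a vector does not change the result:
```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([2.0, 4.0, 6.0])  # b is a scaled by 2

# Dot product over the product of the magnitudes.
cos_sim = torch.dot(a, b) / (a.norm() * b.norm())
print(cos_sim)  # tensor(1.) -- parallel vectors, magnitude ignored

# Equivalently via the built-in helper:
print(torch.nn.functional.cosine_similarity(a, b, dim=0))
```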
@@ -213,6 +214,7 @@ The notebook along with the editor provides an environment for fast prototyping a
It is widely used in the data science, mathematics and machine learning community.~#cite(<jupyter>)
In the context of this bachelor thesis it was used to test and evaluate the three few-shot learning methods and to compare them.
Furthermore, Matplotlib was used to create the comparison plots.
=== CNN
Convolutional neural networks are model architectures especially well suited for processing images, speech and audio signals.
A CNN typically consists of convolutional layers, pooling layers and fully connected layers.
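As a minimal illustration (layer sizes chosen arbitrarily, not from the thesis), such a stack of convolutional, pooling and fully connected layers could look like this in PyTorch:
```python
import torch.nn as nn

# Two conv/pool stages followed by a fully connected classification head.
tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),   # halves the spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(10), # fully connected head; 10 classes, arbitrary here
)
```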
@@ -372,21 +374,78 @@ Its use of frozen pre-trained feature extractors is key to avoiding overfitting
== Alternative Methods
There are several alternative methods to few-shot learning which are not used in this bachelor thesis.
#todo[Do it!]
There are several alternative methods to few-shot learning as well as to anomaly detection which are not used in this bachelor thesis.
Either they performed worse on benchmarks compared to the methods used, or they were released after my initial literature research.
=== SgVA-CLIP (Semantic-guided Visual Adapting CLIP)
// https://arxiv.org/pdf/2211.16191v2
// https://arxiv.org/abs/2211.16191v2
SgVA-CLIP (Semantic-guided Visual Adapting CLIP) is a framework that improves few-shot learning by adapting pre-trained vision-language models like CLIP.
It focuses on generating better visual features for specific tasks while still using the general knowledge from the pre-trained model.
Instead of only aligning images and text, SgVA-CLIP includes a special visual adapting layer that makes the visual features more discriminative for the given task.
This process is supported by knowledge distillation, where detailed information from the pre-trained model guides the learning of the new visual features.
Additionally, the model uses contrastive losses to further refine both the visual and textual representations.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
One advantage of SgVA-CLIP is that it can work well with very few labeled samples, making it suitable for applications like anomaly detection.
The use of pre-trained knowledge helps reduce the need for large datasets.
However, a disadvantage is that it depends heavily on the quality and capabilities of the pre-trained model.
If the pre-trained model lacks relevant information for the task, SgVA-CLIP might struggle to adapt.
This might be a no-go for anomaly detection tasks because the images in such tasks are often very task-specific and not covered by general pre-trained models.
Also, fine-tuning the model can require considerable computational resources, which might be a limitation in some cases.~#cite(<peng2023sgvaclipsemanticguidedvisualadapting>)
=== TRIDENT (Transductive Decoupled Variational Inference for Few-Shot Classification) <TRIDENT>
// https://arxiv.org/pdf/2208.10559v1
// https://arxiv.org/abs/2208.10559v1
TRIDENT, a variational inference network, is a few-shot learning approach which decouples image representation into semantic and label-specific latent variables.
Semantic attributes contain context or stylistic information, while label-specific attributes focus on the characteristics crucial for classification.
By decoupling these parts TRIDENT enhances the network's ability to generalize effectively to unseen data.~#cite(<singh2022transductivedecoupledvariationalinference>)
To further improve the discriminative performance of the model, it incorporates a transductive feature extraction module named AttFEX (Attention-based Feature Extraction).
This feature extractor dynamically aligns features from both the support and the query set, promoting task-specific embeddings.~#cite(<singh2022transductivedecoupledvariationalinference>)
This model is specifically designed for few-shot classification tasks but might also work well for anomaly detection.
Its ability to isolate critical features while dropping irrelevant context aligns with the requirements of anomaly detection.
=== SOT (Self-Optimal-Transport Feature Transform) <SOT>
// https://arxiv.org/pdf/2204.03065v1
// https://arxiv.org/abs/2204.03065v1
The Self-Optimal-Transport (SOT) Feature Transform is designed to enhance feature sets for tasks like matching, grouping or classification by re-embedding feature representations.
This transform processes features as a set instead of individually, which creates context-aware representations.
SOT can capture direct as well as indirect similarities between features, which makes it suitable for tasks like few-shot learning or clustering.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
SOT uses a transport plan matrix derived from optimal transport theory to redefine feature relations.
This includes calculating pairwise similarities (e.g. cosine similarities) between features and solving a min-cost max-flow problem to find an optimal matching between features.
This results in a doubly stochastic matrix where each row represents the re-embedding of the corresponding feature in context with the others.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
The transform is parameterless, which makes it easy to integrate into existing machine-learning pipelines.
It is differentiable, which allows for end-to-end training, for example to (re-)train the hosting network to adapt to SOT.
SOT is also permutation-equivariant, which means that reordering the input features reorders their re-embeddings in the same way.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
The improvements of SOT over traditional feature transforms depend on the used backbone network and the task.
But in most cases it outperforms state-of-the-art methods and could be used as a drop-in replacement for existing feature transforms.~#cite(<shalam2022selfoptimaltransportfeaturetransform>)
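As a rough sketch (only an illustration: the paper derives the transport plan from a min-cost max-flow formulation and treats the diagonal of self-similarities specially), the doubly stochastic re-embedding can be approximated with Sinkhorn-style row/column normalization of the pairwise similarity matrix:
```python
import torch

def sot_sketch(features, n_iters=10, eps=0.1):
    # Pairwise cosine similarities between all features of the set.
    f = torch.nn.functional.normalize(features, dim=1)
    sim = f @ f.T
    # Entropy-regularized optimal transport via Sinkhorn iterations:
    # alternately normalizing rows and columns drives the matrix
    # toward a doubly stochastic transport plan.
    plan = torch.exp(sim / eps)
    for _ in range(n_iters):
        plan = plan / plan.sum(dim=1, keepdim=True)
        plan = plan / plan.sum(dim=0, keepdim=True)
    # Each row is the context-aware re-embedding of one input feature.
    return plan
```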
// anomaly detect
=== GLASS (Global and Local Anomaly co-Synthesis Strategy)
// https://arxiv.org/pdf/2407.09359v1
// https://arxiv.org/abs/2407.09359v1
GLASS (Global and Local Anomaly co-Synthesis Strategy) is an anomaly detection method for industrial applications.
It is a unified network which uses two different anomaly synthesis strategies, whose results are then combined.
The first one is Global Anomaly Synthesis (GAS), which operates on the feature level.
It uses Gaussian noise, guided by gradient ascent and constrained by truncated projection, to generate anomalies close to the distribution of the normal features.
This helps the detection of weak defects.
The second strategy is Local Anomaly Synthesis (LAS), which operates on the image level.
This strategy overlays textures onto normal images using masks derived from noise patterns.
LAS creates strong anomalies which are further away from the normal sample distribution.
This adds diversity to the synthesized anomalies.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
GLASS combines GAS and LAS to improve anomaly detection and localization by synthesizing anomalies near and far from the normal distribution.
Experiments show that GLASS is very effective and in some cases outperforms state-of-the-art methods such as PatchCore on the MVTec AD dataset.~#cite(<chen2024unifiedanomalysynthesisstrategy>)
//=== HETMM (Hard-normal Example-aware Template Mutual Matching)
// https://arxiv.org/pdf/2303.16191v5
// https://arxiv.org/abs/2303.16191v5

View File

@@ -137,3 +137,63 @@
primaryClass={cs.CV},
url={https://arxiv.org/abs/2204.07305},
}
@misc{peng2023sgvaclipsemanticguidedvisualadapting,
title={SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification},
author={Fang Peng and Xiaoshan Yang and Linhui Xiao and Yaowei Wang and Changsheng Xu},
year={2023},
eprint={2211.16191},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2211.16191},
}
@misc{singh2022transductivedecoupledvariationalinference,
title={Transductive Decoupled Variational Inference for Few-Shot Classification},
author={Anuj Singh and Hadi Jamali-Rad},
year={2022},
eprint={2208.10559},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2208.10559},
}
@misc{chen2024unifiedanomalysynthesisstrategy,
title={A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization},
author={Qiyu Chen and Huiyuan Luo and Chengkan Lv and Zhengtao Zhang},
year={2024},
eprint={2407.09359},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.09359},
}
@misc{shalam2022selfoptimaltransportfeaturetransform,
title={The Self-Optimal-Transport Feature Transform},
author={Daniel Shalam and Simon Korman},
year={2022},
eprint={2204.03065},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2204.03065},
}
@misc{parnami2022learningexamplessummaryapproaches,
title={Learning from Few Examples: A Summary of Approaches to Few-Shot Learning},
author={Archit Parnami and Minwoo Lee},
year={2022},
eprint={2203.04291},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2203.04291},
}
@misc{chowdhury2021fewshotimageclassificationjust,
title={Few-shot Image Classification: Just Use a Library of Pre-trained Feature Extractors and a Simple Classifier},
author={Arkabandhu Chowdhury and Mingchao Jiang and Swarat Chaudhuri and Chris Jermaine},
year={2021},
eprint={2101.00562},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2101.00562},
}