kind of finish caml general infos
All checks were successful
Build Typst document / build_typst_documents (push) Successful in 8s
This commit is contained in:
parent
9cd678aa70
commit
ac4f4d78cb
@ -202,21 +202,26 @@ Todo
|
||||
// https://arxiv.org/pdf/2310.10971v2
|
||||
CAML (Context-Aware Meta-Learning) is one of the state-of-the-art methods for few-shot learning.
|
||||
It consists of three different components: a frozen pre-trained image encoder, a fixed Equal Length and Maximally Equiangular Set (ELMES) class encoder and a non-causal sequence model.
|
||||
This is a universal meta-learning approach.
|
||||
That means no fine-tuning or meta-training is applied for specific domains.~#cite(<caml_paper>)
|
||||
|
||||
*Architecture:* CAML first encodes the query and support set images using the frozen pre-trained feature extractor as shown in @camlarchitecture.
|
||||
*Architecture:*
|
||||
CAML first encodes the query and support set images using the frozen pre-trained feature extractor as shown in @camlarchitecture.
|
||||
This step maps the images into a low-dimensional space where similar images are encoded into similar embeddings.
|
||||
The class labels are encoded with the ELMES class encoder.
|
||||
Since the class of the query image is unknown at this stage, a special learnable "unknown token" is added to the encoder.
|
||||
This embedding is learned during pre-training.
|
||||
Afterwards, each image embedding is concatenated with the corresponding class embedding.
|
||||
~#cite(<caml_paper>)
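
The concatenation step can be illustrated with a minimal sketch. All names and values here are hypothetical: the image embeddings are toy 2-d vectors rather than outputs of a real feature extractor, the class embeddings are simple placeholders rather than ELMES vectors, and `unknown_emb` stands in for the learned "unknown token". Placing the query first in the sequence is also an assumption made for this example.

```python
def build_sequence(query_emb, support_embs, support_labels, class_embs, unknown_emb):
    """Concatenate each image embedding with its class embedding.

    The query is paired with the "unknown" embedding because its label
    is not available. Putting it first is an assumption of this sketch.
    """
    sequence = [query_emb + unknown_emb]  # list "+" concatenates the two vectors
    for emb, label in zip(support_embs, support_labels):
        sequence.append(emb + class_embs[label])
    return sequence


# Toy values (hypothetical): 2-d image embeddings, 3-d class embeddings.
class_embs = {0: [1.0, 0.0, 0.0], 1: [0.0, 1.0, 0.0]}
unknown_emb = [0.0, 0.0, 1.0]
support_embs = [[0.9, 0.1], [0.2, 0.8]]
support_labels = [0, 1]
query_emb = [0.85, 0.15]

sequence = build_sequence(query_emb, support_embs, support_labels, class_embs, unknown_emb)
```

Each element of `sequence` now carries both the visual information and the label information (or the unknown marker) for one image.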
|
||||
#todo[Add more references to the architecture image below]
|
||||
|
||||
#todo[We should add stuff here why we have a max amount of shots bc. of pretrained model]
|
||||
|
||||
*ELMES Encoder:* The ELMES (Equal Length and Maximally Equiangular Set) encoder encodes the class labels to vectors of equal length.
|
||||
*ELMES Encoder:*
|
||||
The ELMES (Equal Length and Maximally Equiangular Set) encoder encodes the class labels to vectors of equal length.
|
||||
The encoder is a bijective mapping between the labels and a set of vectors that are of equal length and maximally equiangular.
|
||||
#todo[Describe what equiangular and bijective means]
|
||||
It is similar to one-hot encoding but has some advantages.
|
||||
This encoder maximizes the algorithm's ability to distinguish between different classes.
|
||||
~#cite(<caml_paper>)
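
To make "equal length" and "equiangular" concrete, the sketch below builds one standard set of vectors with these properties, the vertices of a regular simplex centred at the origin. This is an illustration under assumptions: the function name `simplex_elmes` is made up for this example, and whether the CAML implementation constructs its ELMES in exactly this way is not stated here. For $n$ classes, every vector has norm 1 and every pair of distinct vectors has the same inner product $-1/(n-1)$, whereas one-hot vectors have pairwise inner product 0.

```python
import math


def simplex_elmes(n):
    """Return n unit vectors in R^n with pairwise inner product -1/(n-1).

    Row i is the i-th standard basis vector minus the mean of all basis
    vectors, rescaled to unit length (regular simplex construction).
    """
    scale = math.sqrt(n / (n - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / n) for j in range(n)]
        for i in range(n)
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


n_classes = 5
vectors = simplex_elmes(n_classes)

# Equal length: every vector has the same norm.
norms = [math.sqrt(dot(v, v)) for v in vectors]

# Equiangular: every pair of distinct vectors has the same inner product.
cosines = [
    dot(vectors[i], vectors[j])
    for i in range(n_classes)
    for j in range(i + 1, n_classes)
]
```

The mapping is bijective in the sense that label `i` corresponds to exactly one vector `vectors[i]` and vice versa.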
|
||||
|
||||
*Non-causal sequence model:*
|
||||
The sequence of concatenated image and class embeddings is then fed into a non-causal sequence model.
|
||||
@ -226,14 +231,37 @@ Visual features from query and support set can be compared to each other to dete
|
||||
This can then be used to predict the class of the query image.
|
||||
From the output of the sequence model, the element at the same position as the query is selected.
|
||||
Afterwards, it is passed through a simple MLP network to predict the class of the query image.
|
||||
~#cite(<caml_paper>)
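
"Non-causal" means that no attention mask hides later positions, so every element can attend to every other element. The sketch below shows this with a deliberately simplified single-head self-attention that has no learned projections. It is not the model used by CAML; the function names, the toy inputs, and the choice of position 0 for the query are assumptions of this example, and the final MLP is omitted.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def non_causal_attention(sequence):
    """Single-head self-attention without a causal mask.

    Queries, keys and values are the inputs themselves (no learned
    projections), so each output is a weighted average of all inputs.
    """
    dim = len(sequence[0])
    output = []
    for q in sequence:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in sequence]
        weights = softmax(scores)
        output.append(
            [sum(w * v[d] for w, v in zip(weights, sequence)) for d in range(dim)]
        )
    return output


QUERY_POSITION = 0  # assumption: the query is the first sequence element
sequence = [[0.9, 0.1], [1.0, 0.0], [0.0, 1.0]]  # toy inputs (hypothetical)
transformed = non_causal_attention(sequence)

# Select the output at the query position; a classifier head would follow.
query_out = transformed[QUERY_POSITION]
```

Because the toy query is closest to the second element, its output is pulled towards that element, which mirrors how the query representation is updated by comparing it with the support set.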
|
||||
|
||||
*Large-Scale Pre-Training:*
|
||||
#todo[Desc. what this is]
|
||||
CAML is pre-trained on a large number of images from the ImageNet-1k, Fungi, MSCOCO, and WikiArt datasets.
|
||||
These datasets span different domains, which helps the model recognize new visual concepts during inference.
|
||||
Only the non-causal sequence model is trained; the image encoder and the ELMES encoder remain frozen.
|
||||
~#cite(<caml_paper>)
|
||||
|
||||
*Theoretical Analysis:*
|
||||
#todo[Maybe not that important?]
|
||||
|
||||
*Inference:*
|
||||
During inference, CAML performs the following steps:
|
||||
- Encodes the support set images and labels with the pre-trained feature and class encoders.
|
||||
- Concatenates these encodings into a sequence alongside the query image embedding.
|
||||
- Passes the sequence through the non-causal sequence model, enabling dynamic interaction between query and support set representations.
|
||||
- Extracts the transformed query embedding and classifies it using a Multi-Layer Perceptron (MLP).~#cite(<caml_paper>)
|
||||
|
||||
*Results:*
|
||||
CAML achieves state-of-the-art performance in universal meta-learning across 11 few-shot classification benchmarks,
|
||||
including generic object recognition (e.g., MiniImageNet), fine-grained classification (e.g., CUB, Aircraft),
|
||||
and cross-domain tasks (e.g., Pascal+Paintings).
|
||||
It outperforms or matches existing models in 14 of 22 evaluation settings.
|
||||
It performs competitively against P>M>F on 8 benchmarks, even though P>M>F was meta-trained on the same domain.
|
||||
~#cite(<caml_paper>)
|
||||
|
||||
CAML performs well in terms of generalization and inference efficiency but faces limitations in specialized domains (e.g., ChestX)
|
||||
and low-resolution tasks (e.g., CIFAR-fs).
|
||||
Its use of frozen pre-trained feature extractors is key to avoiding overfitting and enabling robust performance.
|
||||
~#cite(<caml_paper>)
|
||||
#todo[We should add stuff here why we have a max amount of shots bc. of pretrained model]
|
||||
|
||||
#figure(
|
||||
image("rsc/caml_architecture.png", width: 100%),
|
||||
|