Update README.md
README.md CHANGED
@@ -23,11 +23,11 @@ To decode, the authors employ so-called decoder queries, which allow to flexibly
<small> Perceiver IO architecture.</small>

-As the time and memory requirements of the self-attention mechanism don't depend on the size of the inputs, the Perceiver IO authors can train the model by padding the inputs with modality-specific embeddings and serialize all of them into a 2D input array (i.e. concatenate along the time dimension). Decoding the final hidden states of the latents is done by using queries containing Fourier-based position embeddings (for video and audio) and modality embeddings.
+As the time and memory requirements of the self-attention mechanism don't depend on the size of the inputs, the Perceiver IO authors can train the model by padding the inputs (images, audio, class label) with modality-specific embeddings and serializing all of them into a single 2D input array (i.e. concatenating along the time dimension). The final hidden states of the latents are then decoded using queries containing Fourier-based position embeddings (for video and audio) and modality embeddings.

## Intended uses & limitations

-You can use the raw model for multimodal autoencoding. Note that by masking the
+You can use the raw model for multimodal autoencoding. Note that by masking the class label during evaluation, the auto-encoding model becomes a video classifier.

See the [model hub](https://huggingface.co/models?search=deepmind/perceiver) to look for other versions on a task that may interest you.
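To make the serialization step in the updated paragraph concrete, here is a toy sketch in plain PyTorch. The feature sizes and the padding embedding are made up for illustration and are not the checkpoint's actual preprocessing; the point is only the mechanic of padding each modality with a modality-specific embedding to a common channel width and concatenating along the index (time) dimension.

```python
import torch

# Made-up feature sizes, for illustration only.
video_feats = torch.randn(800, 243)   # flattened video positions x video channels
audio_feats = torch.randn(1920, 105)  # flattened audio positions x audio channels
label_feats = torch.randn(1, 700)     # a single position holding the one-hot class label

common_dim = 704  # channel width shared by all modalities after padding

def pad_modality(x: torch.Tensor, common_dim: int) -> torch.Tensor:
    # Pad a modality's channels up to `common_dim` with a (here randomly
    # initialised) modality-specific embedding, broadcast over all positions.
    pad = torch.randn(common_dim - x.shape[-1])
    return torch.cat([x, pad.expand(x.shape[0], -1)], dim=-1)

# Pad each modality to the same channel width, then concatenate along the
# index (time) dimension to obtain one 2D array the encoder attends over.
inputs_2d = torch.cat(
    [pad_modality(video_feats, common_dim),
     pad_modality(audio_feats, common_dim),
     pad_modality(label_feats, common_dim)],
    dim=0,
)
print(inputs_2d.shape)  # torch.Size([2721, 704])
```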
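The intended-use note above says that masking the class label at evaluation time turns the auto-encoder into a video classifier. Below is a minimal, untested sketch of what that could look like with the `transformers` Perceiver implementation; the dummy shapes (16 frames of 3x224x224 video, 30720 audio samples, 700 Kinetics classes) follow the inputs the `deepmind/multimodal-perceiver` checkpoint expects, and only the first of 128 decoding chunks is reconstructed here.

```python
import numpy as np
import torch
from transformers import PerceiverForMultimodalAutoencoding

model = PerceiverForMultimodalAutoencoding.from_pretrained("deepmind/multimodal-perceiver")

# Dummy inputs: 16 video frames, 30720 audio samples, and an all-zero (i.e.
# masked) class label, so the model has to infer the class itself.
images = torch.randn((1, 16, 3, 224, 224))
audio = torch.randn((1, 30720, 1))
inputs = {
    "image": images,
    "audio": audio,
    "label": torch.zeros((images.shape[0], 700)),
}

# Decoding every output point at once is memory-hungry, so the outputs are
# reconstructed in chunks by subsampling the image and audio decoder queries.
nchunks = 128
image_chunk_size = np.prod((16, 224, 224)) // nchunks
audio_chunk_size = audio.shape[1] // model.config.samples_per_patch // nchunks
chunk_idx = 0  # decode only the first chunk in this sketch
subsampling = {
    "image": torch.arange(image_chunk_size * chunk_idx, image_chunk_size * (chunk_idx + 1)),
    "audio": torch.arange(audio_chunk_size * chunk_idx, audio_chunk_size * (chunk_idx + 1)),
    "label": None,
}

with torch.no_grad():
    outputs = model(inputs=inputs, subsampled_output_points=subsampling)

# Because the label was masked, the "label" logits act as video-classification scores.
predicted_class = outputs.logits["label"].argmax(-1)
```

In practice one would loop `chunk_idx` over all 128 chunks to reconstruct the full video and audio, while the class logits are available from any single chunk since the label query is never subsampled.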