## Uses
OpenFlamingo models process arbitrarily interleaved sequences of images and text to output text. This allows the models to accept in-context examples and undertake tasks like captioning, visual question answering, and image classification.

### Initialization

```python
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="togethercomputer/RedPajama-INCITE-Base-3B-v1",
    tokenizer_path="togethercomputer/RedPajama-INCITE-Base-3B-v1",
    cross_attn_every_n_layers=2,
)

# Grab the model checkpoint from the Hugging Face Hub
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-4B-vitl-rpj3b", "checkpoint.pt")
model.load_state_dict(torch.load(checkpoint_path), strict=False)
```

### Generation example
Below is an example of generating text conditioned on interleaved images/text. In particular, let's try few-shot image captioning.
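Before running generation, it helps to see how the interleaved text side of a few-shot captioning prompt is laid out. The sketch below builds such a prompt string using the `<image>` and `<|endofchunk|>` special tokens from the OpenFlamingo convention; `build_fewshot_prompt` is a hypothetical helper written for illustration, and the "An image of" phrasing is just one choice of caption template.

```python
def build_fewshot_prompt(captions):
    """Build an interleaved few-shot captioning prompt (illustrative helper).

    Each in-context example contributes an "<image>" placeholder (which the
    model aligns with one image in vision_x) followed by its caption and an
    "<|endofchunk|>" separator. The query image gets a trailing "<image>"
    with an open-ended caption prefix for the model to complete.
    """
    prompt = ""
    for caption in captions:
        prompt += f"<image>An image of {caption}.<|endofchunk|>"
    prompt += "<image>An image of"
    return prompt


# Two in-context examples followed by the query image's open prompt
print(build_fewshot_prompt(["two cats sleeping", "a bathroom sink"]))
```

The resulting string is what gets tokenized as `lang_x`, while the corresponding images (in the same order as the `<image>` tokens) are stacked into `vision_x`.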