gokaygokay committed
Commit ea2a6a0
1 Parent(s): 6ca63c8

Update README.md

Files changed (1)
  1. README.md +19 -22
README.md CHANGED
@@ -3,15 +3,7 @@ license: apache-2.0
  pipeline_tag: image-text-to-text
  ---

- moondream2 is a small vision language model designed to run efficiently on edge devices. Check out the [GitHub repository](https://github.com/vikhyat/moondream) for details, or try it out on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2)!
-
- **Benchmarks**
-
- | Release | VQAv2 | GQA | TextVQA | TallyQA (simple) | TallyQA (full) |
- | --- | --- | --- | --- | --- | --- |
- | 2024-03-04 | 74.2 | 58.5 | 36.4 | - | - |
- | 2024-03-06 | 75.4 | 59.8 | 43.1 | 79.5 | 73.2 |
- | **2024-03-13** (latest) | 76.8 | 60.6 | 46.4 | 79.6 | 73.3 |
+ Fine-tuned version of moondream2 for prompt generation from images. Moondream is a small vision language model designed to run efficiently on edge devices. Check out the [GitHub repository](https://github.com/vikhyat/moondream) for details, or try it out on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2)!

  **Usage**

@@ -20,20 +12,25 @@ pip install transformers timm einops
  ```

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
  from PIL import Image

- model_id = "vikhyatk/moondream2"
- revision = "2024-03-06"
- model = AutoModelForCausalLM.from_pretrained(
-     model_id, trust_remote_code=True, revision=revision
- )
- tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
+ DEVICE = "cuda"
+ DTYPE = torch.float32 if DEVICE == "cpu" else torch.float16  # CPU doesn't support float16

- image = Image.open('<IMAGE_PATH>')
- enc_image = model.encode_image(image)
- print(model.answer_question(enc_image, "Describe this image.", tokenizer))
- ```
+ tokenizer = AutoTokenizer.from_pretrained("gokaygokay/moondream-prompt")
+ moondream = AutoModelForCausalLM.from_pretrained("gokaygokay/moondream-prompt", trust_remote_code=True,
+     torch_dtype=DTYPE, device_map={"": DEVICE})
+ moondream.eval()

- The model is updated regularly, so we recommend pinning the model version to a
- specific release as shown above.
+ image_path = "<image_path>"
+ image = Image.open(image_path).convert("RGB")
+ md_answer = moondream.answer_question(
+     moondream.encode_image(image),
+     "Describe this image and its style in a very detailed manner",
+     tokenizer=tokenizer,
+ )
+
+ print(md_answer)
+ ```
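
The updated snippet produces a prompt for a single image. A minimal sketch for generating prompts for a folder of images, assuming the `moondream` model and `tokenizer` loaded in the snippet above and a hypothetical `<image_folder>` placeholder directory:

```python
# Sketch: reuse the moondream model and tokenizer loaded above to generate
# a prompt for every image in a folder. The folder path and *.jpg pattern
# are placeholders, not part of the model card.
from pathlib import Path

from PIL import Image

image_dir = Path("<image_folder>")
for image_file in sorted(image_dir.glob("*.jpg")):
    image = Image.open(image_file).convert("RGB")
    prompt = moondream.answer_question(
        moondream.encode_image(image),
        "Describe this image and its style in a very detailed manner",
        tokenizer=tokenizer,
    )
    print(f"{image_file.name}: {prompt}")
```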