---
license: apache-2.0
pipeline_tag: image-text-to-text
---

Fine-tuned version of moondream2 for generating prompts from images. Moondream is a small vision-language model designed to run efficiently on edge devices. See the [GitHub repository](https://github.com/vikhyat/moondream) for details, or try it out on the [Hugging Face Space](https://huggingface.co/spaces/vikhyatk/moondream2)!

**Usage**

```bash
pip install transformers timm einops bitsandbytes accelerate
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from PIL import Image

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float32 if DEVICE == "cpu" else torch.float16  # CPU doesn't support float16
revision = "ac6c8fc0ba757c6c4d7d541fdd0e63618457350c"
tokenizer = AutoTokenizer.from_pretrained("gokaygokay/moondream-prompt", revision=revision)
moondream = AutoModelForCausalLM.from_pretrained(
    "gokaygokay/moondream-prompt",
    trust_remote_code=True,
    torch_dtype=DTYPE,
    device_map={"": DEVICE},
    revision=revision,
)
moondream.eval()

image_path = "<image_path>"
image = Image.open(image_path).convert("RGB")
md_answer = moondream.answer_question(
    moondream.encode_image(image),
    "Describe this image and its style in a very detailed manner",
    tokenizer=tokenizer,
)

print(md_answer)
```

**Example**
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630899601dd1e3075d975785/-x5jO3xnQrUz1uYO9SHji.png)

"a very angry old man with white hair and a mustache, in the style of a Pixar movie, hyperrealistic, white background, 8k"