nota-ai/phiva-4b-hf

Image-Text-to-Text


	---
	language:
	- en
	datasets:
	- liuhaotian/LLaVA-Instruct-150K
	- liuhaotian/LLaVA-Pretrain
	---

	## Usage
	```python
	import requests
	from PIL import Image

	import torch
	from transformers import AutoProcessor, LlavaForConditionalGeneration

	model_id = "nota-ai/phiva-4b-hf"

	# The <image> placeholder marks where the image features are inserted into the prompt.
	prompt = "USER: <image>\nWhat are these?\nASSISTANT:"
	image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"

	# Load the model in half precision and move it to GPU 0.
	model = LlavaForConditionalGeneration.from_pretrained(
	    model_id,
	    torch_dtype=torch.float16,
	    low_cpu_mem_usage=True,
	    attn_implementation="eager",
	).to(0)

	processor = AutoProcessor.from_pretrained(model_id)

	# Download the example image.
	raw_image = Image.open(requests.get(image_file, stream=True).raw)
	# Keyword arguments avoid depending on the processor's positional argument order.
	inputs = processor(text=prompt, images=raw_image, return_tensors="pt").to(0, torch.float16)

	# Greedy decoding, limited to 200 new tokens.
	output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
	# Decode only the newly generated tokens, skipping the prompt tokens.
	print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
	```
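
The prompt in the example above follows a `USER: <image>\n...\nASSISTANT:` layout. To ask other questions in the same layout, a small helper along these lines may be convenient. This is a minimal sketch: `build_prompt` is a hypothetical name, not part of `transformers` or this repository, and it assumes the template shown in the usage example is the one the model expects.

```python
def build_prompt(question: str, image_token: str = "<image>") -> str:
    """Wrap a question in the USER/ASSISTANT layout used in the usage example.

    Hypothetical helper: the template is copied from the example prompt,
    not taken from any official API.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must be a non-empty string")
    return f"USER: {image_token}\n{question}\nASSISTANT:"


if __name__ == "__main__":
    # Reproduces the prompt string from the usage example.
    print(repr(build_prompt("What are these?")))
```

The returned string can be passed as the `text` argument to the processor in place of the hard-coded `prompt`.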

	## Terms of use
	The vision-language model published in this repository was developed by combining several modules (e.g., vision encoder, language model). Commercial use of any modifications, additions, or newly trained parameters made to combine these modules is not allowed.
	However, commercial use of the unmodified modules is allowed under their respective licenses. If you wish to use the individual modules commercially, you may refer to their original repositories and licenses provided below.


	Vision encoder: [Model](https://huggingface.co/openai/clip-vit-base-patch16), [License](https://github.com/openai/CLIP/blob/main/LICENSE)

	Language model: [Model](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), [License](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/resolve/main/LICENSE)

	VLM framework: [GitHub](https://github.com/haotian-liu/LLaVA), [License](https://github.com/haotian-liu/LLaVA/blob/main/LICENSE)