Update README.md
README.md (CHANGED)

@@ -22,7 +22,7 @@ widget:
 
 ## Model details
 
-The fundamental concept behind HelpingAI-Vision is to generate one token embedding per N parts of an image, as opposed to producing N visual token embeddings for the entire image. This approach, based on the
+The fundamental concept behind HelpingAI-Vision is to generate one token embedding per N parts of an image, as opposed to producing N visual token embeddings for the entire image. This approach, based on HelpingAI-Lite and incorporating the LLaVA adapter, aims to enhance scene understanding by capturing more detailed information.
 
 For every crop of the image, an embedding is generated using the full SigLIP encoder (size [1, 1152]). Subsequently, all N embeddings undergo processing through the LLaVA adapter, resulting in a token embedding of size [N, 2560]. Currently, these tokens lack explicit information about their position in the original image, with plans to incorporate positional information in a later update.
 
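To make the shapes in the paragraph above concrete, here is a small PyTorch sketch of the crop-and-encode flow: each crop is embedded by the SigLIP encoder as a [1, 1152] vector, and the stacked crop embeddings are projected by an adapter to [N, 2560]. The class and function names, the two-layer MLP, and the dummy encoder are illustrative assumptions, not the repository's actual modules.

```python
import torch
import torch.nn as nn


class LLaVAAdapterSketch(nn.Module):
    """Illustrative stand-in for the LLaVA-style adapter (not the real module)."""

    def __init__(self, vision_dim: int = 1152, llm_dim: int = 2560):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, crop_embeddings: torch.Tensor) -> torch.Tensor:
        # [N, 1152] crop embeddings -> [N, 2560]: one token embedding per crop
        return self.proj(crop_embeddings)


def encode_image_crops(crops, siglip_encoder, adapter):
    """Encode each crop with the full SigLIP encoder, then project all of them."""
    per_crop = [siglip_encoder(crop) for crop in crops]  # N tensors of shape [1, 1152]
    stacked = torch.cat(per_crop, dim=0)                 # [N, 1152]
    return adapter(stacked)                              # [N, 2560]


if __name__ == "__main__":
    # Smoke test with dummy crops and a placeholder for the real SigLIP encoder.
    dummy_encoder = lambda crop: torch.randn(1, 1152)
    tokens = encode_image_crops([object()] * 4, dummy_encoder, LLaVAAdapterSketch())
    print(tokens.shape)  # torch.Size([4, 2560])
```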
@@ -31,7 +31,7 @@ HelpingAI-Vision was fine-tuned from Dolphin 2.6 Phi, leveraging the vision tower
 The model adopts the ChatML prompt format, suggesting its potential application in chat-based scenarios. If you have specific queries or would like further details, feel free
 ```
 <|im_start|>system
-You are
+You are Vortex, a helpful AI assistant.<|im_end|>
 <|im_start|>user
 {prompt}<|im_end|>
 <|im_start|>assistant