Comment on image splitting #4
Opened by HugoLaurencon
README.md CHANGED
@@ -223,6 +223,8 @@ Given the high resolution supported, the vision part of the model can be memory
 - **deactivate the image splitting.** To do so, add `do_image_splitting=False` when initializing the processor (`AutoProcessor.from_pretrained`). There are no changes required on the model side. Note that only the sft model has been trained with image splitting.
 - **decrease the maximum image resolution.** To do so, add `size= {"longest_edge": 448, "shortest_edge": 378}` when initializing the processor (`AutoProcessor.from_pretrained`). In particular, the `longest_edge` value can be adapted to fit the need. We recommend using values that are multiples of 14. There are no changes required on the model side.
 
+`do_image_splitting=True` is especially needed to boost performance on OCR tasks where a very large image is used as input. For the regular VQA or captioning tasks, this argument can be safely set to `False` with minimal impact on performance (see the evaluation table above).
+
 **Using Flash-attention 2 to speed up generation**
 
 <details><summary>Click to expand.</summary>
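Reviewer note: the two memory-saving options this hunk documents could be combined roughly as in the sketch below. This is an illustration, not part of the PR. The helper names `processor_kwargs` and `load_processor` are hypothetical, and the checkpoint name is left to the caller because the diff does not state it. Only the `do_image_splitting` and `size` arguments and the multiple-of-14 recommendation come from the README text. Rounding down to a multiple of 14 is one assumed way to follow that recommendation.

```python
from typing import Optional


def processor_kwargs(
    split_images: bool = True,
    longest_edge: Optional[int] = None,
    shortest_edge: int = 378,  # value taken from the README example
    patch: int = 14,           # README recommends multiples of 14
) -> dict:
    """Build keyword arguments for AutoProcessor.from_pretrained (hypothetical helper)."""
    kwargs = {"do_image_splitting": split_images}
    if longest_edge is not None:
        # Round down to the nearest multiple of `patch`, but never below one patch.
        longest_edge = max(patch, (longest_edge // patch) * patch)
        kwargs["size"] = {
            "longest_edge": longest_edge,
            # Keep the shortest edge from exceeding the longest edge.
            "shortest_edge": min(shortest_edge, longest_edge),
        }
    return kwargs


def load_processor(checkpoint: str, **kwargs):
    """Load the processor; the import is local so the helper above works without transformers."""
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(checkpoint, **kwargs)


# VQA or captioning: splitting off, per the added README sentence.
vqa_kwargs = processor_kwargs(split_images=False)

# Tight memory budget: splitting off and a smaller maximum resolution.
low_mem_kwargs = processor_kwargs(split_images=False, longest_edge=450)
```

A caller would then run `load_processor("<checkpoint>", **low_mem_kwargs)`, where `<checkpoint>` is the repository this README belongs to. For OCR on very large images, the added paragraph suggests keeping `split_images=True`.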