feat: Added a check to support training with plain-text data.
The current logic assumes that every input sample includes images, so `data['pixel_values']` must match the training samples; for purely text inputs, however, `pixel_values` does not exist.
Although the backend code can handle samples without image content (see the snippet below), this leads to an error before that code ever runs.
```
start = 0
for pixel_values in pixel_values_list:
    img_cnt = len(pixel_values)
    if img_cnt > 0:
        vision_hidden_states.append(vision_embedding[start: start + img_cnt])
        start += img_cnt
    else:
        vision_hidden_states.append([])
```
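For context, here is a minimal sketch of how that snippet behaves on a batch with one image-bearing sample and one plain-text sample. The embedding tensor and the placeholder image entries are made up for illustration and are not taken from the actual preprocessor:

```
import torch

# Hypothetical batch: sample 0 contributed two encoded images, sample 1 is
# plain text. Only the lengths of the inner lists matter for this snippet.
pixel_values_list = [[object(), object()], []]
vision_embedding = torch.randn(2, 4)  # one (fake) embedding row per image

vision_hidden_states = []
start = 0
for pixel_values in pixel_values_list:
    img_cnt = len(pixel_values)
    if img_cnt > 0:
        vision_hidden_states.append(vision_embedding[start: start + img_cnt])
        start += img_cnt
    else:
        vision_hidden_states.append([])  # text-only sample gets an empty placeholder

print(vision_hidden_states[0].shape)  # torch.Size([2, 4])
print(vision_hidden_states[1])        # []
```

So the backend already tolerates samples without images; the patch below only has to keep such samples from breaking the collection loop that runs before it.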
modeling_minicpmv.py (+2, -0)
```diff
@@ -77,6 +77,8 @@ class MiniCPMV(MiniCPMVPreTrainedModel):
             all_pixel_values = []
             img_cnt = []
             for pixel_values in pixel_values_list:
+                if len(pixel_values) == 0:
+                    continue
                 img_cnt.append(len(pixel_values))
                 all_pixel_values.extend([i.flatten(end_dim=1).permute(1, 0) for i in pixel_values])
 
```
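As a sanity check, here is a minimal sketch of the patched loop run on a hypothetical mixed batch. The tensor shape and batch layout are invented for illustration only:

```
import torch

# Hypothetical mixed batch: sample 0 carries one image (already split into
# 2 slices by the preprocessor), sample 1 is plain text with no images.
pixel_values_list = [
    [torch.randn(2, 3, 196)],  # one image, 2 slices (made-up shape)
    [],                        # plain-text sample: no images
]

all_pixel_values = []
img_cnt = []
for pixel_values in pixel_values_list:
    if len(pixel_values) == 0:  # the added guard skips text-only samples
        continue
    img_cnt.append(len(pixel_values))
    all_pixel_values.extend([i.flatten(end_dim=1).permute(1, 0) for i in pixel_values])

print(img_cnt)                    # [1]
print(all_pixel_values[0].shape)  # torch.Size([196, 6])
```

With the guard in place, a text-only sample simply contributes nothing to `all_pixel_values` or `img_cnt`, and the later per-sample loop still hands it an empty placeholder, as shown above.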