Update ONNX weights (#24)
opened by Xenova
This PR:
- Uses the external data format to save the model (so the individual per-tensor files are no longer needed)
- Adds an fp16 version
- Slims model.onnx with onnxslim for a more optimized graph (a sketch of these conversion steps follows the table below)
+--------------------+------------------------------------------+------------------------------------------+
| Model Name | model.onnx | Op Set: 16 |
+--------------------+------------------------------------------+------------------------------------------+
| Model Info | Original Model | Slimmed Model |
+--------------------+------------------------------------------+------------------------------------------+
| IN: input_ids | int64: ('batch_size', 'sequence_length') | int64: ('batch_size', 'sequence_length') |
| IN: attention_mask | int64: ('batch_size', 'sequence_length') | int64: ('batch_size', 'sequence_length') |
| IN: task_id | int64: None | int64: None |
| OUT: text_embeds | float32: ('batch_size', | float32: ('batch_size', |
| | 'Addtext_embeds_dim_1', 1024) | 'sequence_length', 1024) |
| OUT: 13049 | float32: ('batch_size', 1024) | float32: ('batch_size', 1024) |
+--------------------+------------------------------------------+------------------------------------------+
| Add | 486 | 438 |
| Cast | 529 | 1 |
| Concat | 481 | 216 |
| Constant | 4047 | 0 |
| ConstantOfShape | 121 | 25 |
| Div | 337 | 121 |
| Einsum | 48 | 48 |
| Equal | 96 | 0 |
| Erf | 24 | 24 |
| Expand | 96 | 96 |
| Gather | 826 | 514 |
| Gemm | 1 | 1 |
| MatMul | 195 | 195 |
| Mul | 748 | 316 |
| Neg | 48 | 48 |
| Pow | 49 | 49 |
| ReduceMean | 98 | 98 |
| Reshape | 435 | 363 |
| Shape | 553 | 145 |
| Slice | 288 | 288 |
| Softmax | 24 | 24 |
| Split | 24 | 24 |
| Sqrt | 49 | 49 |
| Squeeze | 72 | 72 |
| Sub | 49 | 49 |
| Tanh | 1 | 1 |
| Transpose | 96 | 96 |
| Unsqueeze | 1057 | 409 |
| Where | 120 | 24 |
+--------------------+------------------------------------------+------------------------------------------+
| Model Size | 2.14 GB | 1.44 MB (2.14 GB) |
+--------------------+------------------------------------------+------------------------------------------+
| Elapsed Time | 33.37 s |
+--------------------+------------------------------------------+------------------------------------------+
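For reference, a rough sketch of the kind of pipeline behind the three steps above, using onnx, onnxslim, and onnxconverter-common. This is an illustrative outline, not necessarily the exact script used for this PR; file names such as `model.onnx_data` and `model_fp16.onnx` are placeholders:

```python
# Illustrative sketch only; the actual conversion script for this PR may differ.
import onnx
import onnxslim
from onnxconverter_common import float16

# 1. Slim the exported graph (constant folding, shape inference, dead-node removal).
slimmed = onnxslim.slim("model.onnx")

# 2. Save with the external data format: the graph stays in a small model.onnx,
#    and all large tensors are consolidated into a single model.onnx_data file
#    instead of one file per tensor.
onnx.save_model(
    slimmed,
    "model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model.onnx_data",
    size_threshold=1024,
)

# 3. Build the fp16 variant from a fresh copy (onnx.load resolves the external
#    data automatically) and keep fp32 inputs/outputs for drop-in compatibility.
fp32 = onnx.load("model.onnx")
fp16_model = float16.convert_float_to_float16(fp32, keep_io_types=True)
onnx.save_model(fp16_model, "model_fp16.onnx")
```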
Hi @Xenova, thanks for your contribution!
Does the usage of the ONNX model change with this new format? We have an example in the README, so please update it if necessary. Also, how did you combine the external data into a single file? Could you please share the conversion code? I'd like to apply the same process to https://huggingface.co/jinaai/jina-colbert-v2
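For context on the first question: with the external data format, onnxruntime resolves the side file transparently as long as it is downloaded next to model.onnx, so the inference code itself generally does not need to change. A minimal check, assuming the data file is named `model.onnx_data`:

```python
# Minimal sanity check: onnxruntime picks up model.onnx_data automatically
# when it sits in the same directory as model.onnx.
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
for inp in session.get_inputs():
    # Expect input_ids, attention_mask and task_id, as listed in the table above.
    print(inp.name, inp.type, inp.shape)
```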
bwang0911 changed pull request status to merged