Modified version of [xlm-roberta-flash-implementation](https://huggingface.co/jinaai/xlm-roberta-flash-implementation) for the ONNX conversion.

## Brief Summary of Challenges and Modifications

### Dynamic Matrix Calculation in RoPE
The original RoPE implementation did not compute the entire rotation matrix up front. Instead, it computed the matrix only for the required sequence length, cached it, and recomputed it whenever a longer sequence arrived as input. This approach isn't compatible with ONNX, which requires a fixed computation graph at inference time. To solve this, I now compute the entire rotation matrix, up to the maximum sequence length, in advance.
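
A minimal sketch of the idea, assuming the usual cos/sin formulation of RoPE; the class and argument names (`FixedRotaryEmbedding`, `max_position_embeddings`) are illustrative, not this repository's actual code:

```python
import torch


class FixedRotaryEmbedding(torch.nn.Module):
    """Precompute the full RoPE cos/sin tables once at init time, so the
    exported ONNX graph contains fixed constants instead of a cache that
    grows with the input sequence length."""

    def __init__(self, dim: int, max_position_embeddings: int = 8192, base: float = 10000.0):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        positions = torch.arange(max_position_embeddings).float()
        freqs = torch.outer(positions, inv_freq)  # (max_len, dim / 2)
        emb = torch.cat((freqs, freqs), dim=-1)   # (max_len, dim)
        # Buffers end up as initializers in the exported graph.
        self.register_buffer("cos_cached", emb.cos(), persistent=False)
        self.register_buffer("sin_cached", emb.sin(), persistent=False)

    def forward(self, seq_len: int):
        # Slicing a fixed buffer is ONNX-friendly; nothing is recomputed
        # or re-cached inside the traced graph.
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
```
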
### Custom Backward Functions for RoPE
We have custom forward and backward functions for RoPE. ONNX does not support custom backward functions, but since ONNX inference only ever runs the forward pass, I removed the backward function completely.
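
For illustration, once the `torch.autograd.Function` wrapper (and its backward) is gone, the rotary application can be a plain function that the exporter traces directly; `rotate_half` and `apply_rotary_inference` are illustrative names, not the repository's actual functions:

```python
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Standard RoPE helper: swap and negate the two halves of the last dim.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_inference(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # Forward-only rotary embedding: no torch.autograd.Function subclass,
    # no custom backward, so torch.onnx.export can trace it as-is.
    return (x * cos) + (rotate_half(x) * sin)
```
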
### ONNX Model Size Limitation
ONNX stores the model in protobuf format, which has a hard maximum size of 2 GB. Our model was too large to fit within this limit, so I had to store the model's parameters in external data files instead.
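
A sketch of the mechanism the `onnx` package exposes for this; the file names below are illustrative:

```python
import onnx

# Re-save the exported model with all large tensors moved into a sidecar
# file, so the .onnx protobuf itself stays under the 2 GB limit.
model = onnx.load("exported/model.onnx")  # illustrative path
onnx.save_model(
    model,
    "exported/model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model.onnx_data",  # weights file, relative to the .onnx file
    size_threshold=1024,         # externalize tensors larger than ~1 KB
)
```
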
### Lack of Support for the `unique()` Function
We used the `unique()` function to identify the distinct task types present in a batch, which matters when a single batch mixes several task types. However, ONNX does not support `unique()`. Since mixing task types within a batch is not needed for inference, I replaced the `adapter_mask` argument (a tensor specifying an independent task ID for each text in the batch) with a single integer `task_id` argument that applies to every text in the batch.
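
A simplified sketch of the difference, using a toy LoRA-style adapter; `TaskAdapterLinear` and its parameters are hypothetical, not this repository's classes:

```python
import torch


class TaskAdapterLinear(torch.nn.Module):
    """Toy LoRA-style layer with one low-rank adapter per task.

    With a per-sample `adapter_mask`, picking the right adapters requires
    `torch.unique()` over the mask, which ONNX cannot export. A single
    integer `task_id` for the whole batch reduces this to plain indexing."""

    def __init__(self, in_features: int, out_features: int, num_tasks: int, rank: int = 8):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.lora_a = torch.nn.Parameter(torch.randn(num_tasks, rank, in_features))
        self.lora_b = torch.nn.Parameter(torch.randn(num_tasks, out_features, rank))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # Indexing with a plain integer exports cleanly; no unique() needed.
        a = self.lora_a[task_id]  # (rank, in_features)
        b = self.lora_b[task_id]  # (out_features, rank)
        return x @ self.weight.T + (x @ a.T) @ b.T
```
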