jupyterjazz committed
Commit 712be5f
1 Parent(s): 4bfe854

Update README.md

Files changed (1)
  README.md +22 -1
README.md CHANGED
@@ -101,4 +101,25 @@ language:
  - zh
  ---

- Modified version of https://huggingface.co/jinaai/xlm-roberta-flash-implementation for the onnx conversion
+ Modified version of [xlm-roberta-flash-implementation](https://huggingface.co/jinaai/xlm-roberta-flash-implementation) for the ONNX conversion
+
+ ## Brief Summary of Challenges and Modifications
+ ### Dynamic Matrix Calculation in RoPE
+ The original RoPE implementation did not compute the entire rotation matrix up front. Instead, it built the matrix only for the current sequence length, cached it, and recomputed it whenever a longer sequence arrived. This approach is incompatible with ONNX, which requires a fixed computation graph at inference time. To solve this, I now compute the entire rotation matrix in advance, up to the model's maximum sequence length.
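A minimal sketch of this idea, assuming a standard non-interleaved RoPE cache (the function names, `head_dim=64`, and the 8192-position cap below are illustrative, not the repository's actual code):

```python
import torch

def precompute_rope_cache(head_dim: int, max_seq_len: int, base: float = 10000.0):
    """Build cos/sin tables for every position up to max_seq_len once, at init."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    freqs = torch.outer(positions, inv_freq)       # (max_seq_len, head_dim / 2)
    return freqs.cos(), freqs.sin()

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

# Computed once; the exported graph only ever slices this cache, never regrows it.
cos_cache, sin_cache = precompute_rope_cache(head_dim=64, max_seq_len=8192)

def apply_rope(q, seq_len: int):
    """q: (batch, seq_len, n_heads, head_dim)."""
    cos = torch.cat([cos_cache[:seq_len]] * 2, dim=-1)[None, :, None, :]
    sin = torch.cat([sin_cache[:seq_len]] * 2, dim=-1)[None, :, None, :]
    return q * cos + rotate_half(q) * sin
```

Because the cache already covers the maximum length, the traced graph contains only slicing and elementwise ops, so a longer input at inference never triggers a recomputation path.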
+
+ ### Custom Backward Functions for RoPE
+ We have custom forward and backward functions for RoPE. ONNX does not support custom backward functions, but since ONNX inference only ever runs the forward pass, I removed the backward function entirely.
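For illustration only (the class and helper names are placeholders rather than the repository's actual functions, and `rotate_half` is reused from the sketch above): the export-friendly version keeps the same forward math but drops the `torch.autograd.Function` wrapper and its backward.

```python
import torch

class ApplyRotaryEmb(torch.autograd.Function):
    """Training-time pattern: custom forward *and* backward. ONNX cannot export this."""

    @staticmethod
    def forward(ctx, x, cos, sin):
        ctx.save_for_backward(cos, sin)
        return x * cos + rotate_half(x) * sin

    @staticmethod
    def backward(ctx, grad_out):
        cos, sin = ctx.saved_tensors
        # Gradient of the rotation w.r.t. x; cos/sin receive no gradient.
        return grad_out * cos - rotate_half(grad_out) * sin, None, None

def apply_rotary_emb_inference(x, cos, sin):
    """Inference-only replacement: plain tensor ops that the ONNX exporter can trace."""
    return x * cos + rotate_half(x) * sin
```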
+
+ ### ONNX Model Size Limitation
+ ONNX stores the model in protobuf format, which has a maximum file size of 2 GB. Our model is too large to fit within this limit, so I store the model's parameters as external data files alongside the graph.
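A sketch of the external-data step using the `onnx` package's built-in support (the file names are placeholders, and the surrounding export pipeline may differ):

```python
import onnx

# Load the exported graph; any existing external tensors are resolved automatically.
model = onnx.load("model.onnx")

# Re-save with the large initializers moved into a single side-car file, keeping the
# .onnx protobuf itself well under the 2 GB limit.
onnx.save_model(
    model,
    "model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="model.onnx_data",   # written next to model.onnx
    size_threshold=1024,          # only tensors larger than ~1 KB are externalized
)
```

ONNX Runtime resolves `model.onnx_data` relative to the model path, so inference code stays the same as long as both files are shipped together.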
+
+ ### Lack of Support for the `unique()` Function
+ We used the `unique()` function to identify the distinct task types in a batch, which matters when a single batch mixes several task types. However, ONNX does not support `unique()`. For inference, mixing task types within a batch is not needed, so I replaced the `adapter_mask` argument (a tensor specifying an independent task ID for each text in the batch) with a single integer `task_id` that applies to every text in the batch.
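A before/after sketch of that interface change; the LoRA-style lookup is invented to keep the example small, and only the `adapter_mask` → `task_id` switch mirrors the change described above.

```python
import torch

NUM_TASKS, HIDDEN, RANK = 5, 768, 8
lora_a = torch.randn(NUM_TASKS, HIDDEN, RANK)
lora_b = torch.randn(NUM_TASKS, RANK, HIDDEN)

def forward_with_adapter_mask(x, adapter_mask):
    """Before: per-text task IDs require unique(), which the ONNX exporter rejects."""
    out = torch.zeros_like(x)
    for task in torch.unique(adapter_mask):
        rows = adapter_mask == task
        out[rows] = x[rows] @ lora_a[task] @ lora_b[task]
    return out

def forward_with_task_id(x, task_id: int):
    """After: one integer for the whole batch; plain indexing, trivially exportable."""
    return x @ lora_a[task_id] @ lora_b[task_id]

# Both paths agree when every text in the batch shares the same task.
x = torch.randn(4, HIDDEN)
assert torch.allclose(
    forward_with_adapter_mask(x, torch.tensor([2, 2, 2, 2])),
    forward_with_task_id(x, task_id=2),
)
```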