|
--- |
|
license: mit |
|
language: |
|
- en |
|
tags: |
|
- medical |
|
- finance |
|
- chemistry |
|
- biology |
|
--- |
|
![BGE-reranking](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*tCBbIjV_jLZP1AKLTX7rAw.png) |
|
|
|
# BGE-Reranker-Large
|
|
|
|
|
|
This is an `int8` quantized version of [bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large). Thanks to [CTranslate2](https://github.com/OpenNMT/CTranslate2), it should be at least 3x faster than the original Hugging Face Transformers version while also being smaller on disk, with minimal performance loss.
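
If you want to reproduce the conversion yourself, CTranslate2 ships a Transformers converter. The snippet below is a minimal sketch, assuming `transformers` is installed; the output directory name is arbitrary:

```python
import ctranslate2

# export the original HF checkpoint to CTranslate2 format with int8 weights
converter = ctranslate2.converters.TransformersConverter("BAAI/bge-reranker-large")
converter.convert("ct2fast-bge-reranker", quantization="int8")
```

The same conversion is also available on the command line via `ct2-transformers-converter`.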
|
|
|
|
|
|
|
## Model Details |
|
Unlike the embedding model `bge-large-en-v1.5`, the reranker takes a query and a document as input and directly outputs a similarity score rather than an embedding.

This repository hosts a highly optimized build of that reranker using the [CTranslate2](https://github.com/OpenNMT/CTranslate2) library, suitable for production environments.

You can get a relevance score by feeding a query and a passage to the reranker. Because the reranker is optimized with a cross-entropy loss, the relevance score is not bounded to a specific range.
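
If you need scores in a fixed range, a common trick (not something the model does internally) is to squash the raw logits with a sigmoid. A minimal sketch:

```python
import torch

raw_scores = torch.tensor([1.0474, -9.4694])  # raw reranker logits
probs = torch.sigmoid(raw_scores)             # mapped into (0, 1)
print(probs)  # approximately tensor([7.40e-01, 7.7e-05])
```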
|
|
|
### Model Sources |
|
|
|
The original model is `BAAI`'s `bge-reranker-large`. Please visit the [original repository](https://huggingface.co/BAAI/bge-reranker-large) for more details.
|
|
|
## Usage |
|
|
|
Simply `pip install ctranslate2 transformers torch` and then:
|
|
|
```python |
|
import ctranslate2
import torch
import transformers
from huggingface_hub import snapshot_download
|
|
|
device_mapping = "cuda" if torch.cuda.is_available() else "cpu"
|
|
|
# fetch the converted weights from the Hub; CTranslate2 loads from a local directory
model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")
|
|
|
# the CTranslate2 encoder does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device_mapping)
|
|
|
# the classification head comes from HF |
|
model_name = "BAAI/bge-reranker-large" |
|
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name) |
|
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier |
|
|
|
classifier.eval() |
|
classifier.to(device_mapping) |
|
|
|
pairs = [
    ["I like Ctranslate2", "Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2", "Using naive transformers might not be suitable for deployment"],
]
|
with torch.no_grad():
    # tokenize the query-passage pairs into plain token ids (no tensors) for CTranslate2
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    # run the int8 encoder
    output = encoder.forward_batch(tokens)
    # wrap the encoder output in a torch tensor on the same device
    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
    # the HF classification head pools the first token internally and outputs one logit per pair
    logits = classifier(hidden_state).squeeze()
|
|
|
print(logits) |
|
|
|
# tensor([ 1.0474, -9.4694], device='cuda:0') |
|
``` |
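
In a retrieval pipeline you typically score several candidate passages against one query and then sort them by the resulting logits. A small sketch continuing from the variables above:

```python
# rank the candidate passages for the query by descending relevance score
ranked = sorted(zip([p[1] for p in pairs], logits.tolist()), key=lambda x: x[1], reverse=True)
for passage, score in ranked:
    print(f"{score:+.4f}  {passage}")
```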
|
|
|
|
|
### Hardware
|
|
|
The converted model runs on both GPU and CPU; select the target with the `device` argument of `ctranslate2.Encoder`.
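
For CPU deployments you can also pin the compute type and thread count when constructing the encoder. A minimal sketch, reusing the `model_dir` path from the usage example above (the thread count is illustrative):

```python
import ctranslate2

# CPU-only setup: keep int8 weights and control threading explicitly
encoder = ctranslate2.Encoder(
    model_dir,            # local path, e.g. from snapshot_download above
    device="cpu",
    compute_type="int8",  # run with 8-bit weights on CPU
    intra_threads=4,      # threads used to process one batch (illustrative)
)
```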
|
|
|
|