---
license: mit
language:
- en
tags:
- medical
- finance
- chemistry
- biology
---
![BGE-reranking](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*tCBbIjV_jLZP1AKLTX7rAw.png)

# BGE-Reranker-Large

This is an `int8`-quantized conversion of [bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large). Thanks to `CTranslate2`, it should
be at least 3 times faster than the original Hugging Face Transformers version, while also being smaller, with minimal performance loss.



## Model Details
Unlike the embedding model `bge-large-en-v1.5`, the reranker takes a question and a document together as input and directly outputs a similarity score instead of an embedding.
You get a relevance score by passing a query and a passage to the reranker. The reranker is optimized with a cross-entropy loss, so the relevance score is not bounded to a specific range.
In addition, this is a highly optimized version built with the `CTranslate2` library, suitable for production environments.
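
Because the raw score is an unbounded logit, it can be convenient to squash it into the (0, 1) range before thresholding or displaying it. The sketch below is one possible way to do that with a sigmoid. The `sigmoid` helper is a hypothetical name introduced here for illustration and is not part of this model or of `CTranslate2`. The example logits are the ones shown in the usage output of this README.

```python
import math


def sigmoid(logit: float) -> float:
    """Map an unbounded logit to (0, 1); written to stay stable for large |logit|."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Example logits copied from the usage output in this README.
raw_scores = [1.0474, -9.4694]

# Assumption: a sigmoid is an acceptable normalization for your use case.
normalized = [sigmoid(s) for s in raw_scores]
print(normalized)
```

The sigmoid is monotonic, so it does not change the ranking of the passages. It only makes the scores easier to compare against a fixed threshold.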

### Model Sources

The original model is BAAI's `BGE-Reranker`. Please visit the [original bge-reranker-large repository](https://huggingface.co/BAAI/bge-reranker-large)
for more details.

## Usage

Install the dependencies with `pip install ctranslate2 transformers torch`, then run:

```python
import ctranslate2
import torch
import transformers
from huggingface_hub import snapshot_download

device_mapping = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: ctranslate2.Encoder needs a local directory, so the converted
# model is downloaded from the Hub first (skip this if you already have a local copy).
model_dir = snapshot_download("hooman650/ct2fast-bge-reranker")

# CTranslate2 runs the encoder, which does the heavy lifting
encoder = ctranslate2.Encoder(model_dir, device=device_mapping)

# The tokenizer and the classification head come from the original HF model
model_name = "BAAI/bge-reranker-large"
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
classifier = transformers.AutoModelForSequenceClassification.from_pretrained(model_name).classifier

classifier.eval()
classifier.to(device_mapping)

# Each pair is [query, passage]
pairs = [
    ["I like Ctranslate2", "Ctranslate2 makes mid range models faster"],
    ["I like Ctranslate2", "Using naive transformers might not be suitable for deployment"],
]

with torch.no_grad():
    tokens = tokenizer(pairs, padding=True, truncation=True, max_length=512).input_ids
    output = encoder.forward_batch(tokens)
    hidden_state = torch.as_tensor(output.last_hidden_state, device=device_mapping)
    # squeeze(-1) keeps the batch dimension even when there is a single pair
    logits = classifier(hidden_state).squeeze(-1)

print(logits)

# tensor([ 1.0474, -9.4694], device='cuda:0')
```
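
Once you have one score per query and passage pair, reranking comes down to sorting the passages by score. A minimal sketch, assuming the scores have already been converted to plain Python floats (for example with `logits.tolist()`); `rank_passages` is a hypothetical helper introduced here, not part of any library:

```python
from typing import List, Optional, Tuple


def rank_passages(
    passages: List[str],
    scores: List[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (passage, score) tuples sorted from most to least relevant."""
    if len(passages) != len(scores):
        raise ValueError("passages and scores must have the same length")
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Passages and scores taken from the usage example in this README.
passages = [
    "Ctranslate2 makes mid range models faster",
    "Using naive transformers might not be suitable for deployment",
]
scores = [1.0474, -9.4694]

for passage, score in rank_passages(passages, scores, top_k=1):
    print(f"{score:.4f}  {passage}")
```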


#### Hardware

Supports both GPU and CPU.