# Multilingual ColBERT embeddings as a service
## Goal
- Deploy [Antoine Louis](https://huggingface.co/antoinelouis)' [colbert-xm](https://huggingface.co/antoinelouis/colbert-xm) as an inference service: text(s) in, vector(s) out
## Motivation
- Use the service in a broader RAG solution
## Steps followed
- Clone the original repo following [this procedure](https://huggingface.co/docs/hub/repositories-next-steps#how-to-duplicate-or-fork-a-repo-including-lfs-pointers)
- Add a custom handler script as described [here](https://huggingface.co/docs/inference-endpoints/guides/custom_handler)
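
As a rough illustration of the second step, a custom handler is a `handler.py` file exposing an `EndpointHandler` class with `__init__` and `__call__`. The sketch below only shows that shape. It does not load colbert-xm: the `_encode` helper is a hypothetical placeholder that returns dummy per-token vectors, and the `input` and `embeddings` response keys are assumptions, not necessarily what this repo's handler returns.

```python
from typing import Any, Dict, List


class EndpointHandler:
    """Skeleton of a custom handler: text(s) in, vector(s) out."""

    def __init__(self, path: str = "") -> None:
        # A real handler would load the model checkpoint from `path` here.
        # This sketch only stores the path so that it runs without the model.
        self.path = path

    def _encode(self, text: str) -> List[List[float]]:
        # Hypothetical placeholder: one dummy 2-dimensional vector per
        # whitespace-separated token, standing in for real model output.
        return [[float(len(tok)), float(i)] for i, tok in enumerate(text.split())]

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Accept either a single string or a list of strings under "inputs".
        inputs = data["inputs"]
        texts = [inputs] if isinstance(inputs, str) else list(inputs)
        # Return one result per input text (key names are assumptions).
        return [{"input": t, "embeddings": self._encode(t)} for t in texts]
```

Calling `EndpointHandler()({"inputs": ["first text", "second text"]})` returns a list with one entry per input text.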
## Local development and testing
### Build and start the Docker container hf_endpoints_emulator
See [hf_endpoints_emulator](https://pypi.org/project/hf-endpoints-emulator/)
```bash
docker-compose up -d --build
```
The container can take a few moments to become ready, given the size of the model (> 3 GB).
## How to test locally
Send sample requests with the helper scripts:
```bash
./embed_single_query.sh
./embed_two_chunks.sh
```
Run the test suite inside the container:
```bash
docker-compose exec hf_endpoints_emulator pytest
```
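
The local endpoint can also be queried from Python, which may be handy when wiring it into the broader RAG solution. This is a minimal sketch using only the standard library. The URL (`http://localhost:5000`) and the `{"inputs": [...]}` payload shape are assumptions; check `docker-compose.yml` and the helper scripts for the actual port and request body.

```python
import json
import urllib.request
from typing import Any, List

# Assumption: the emulator is published on localhost:5000.
ENDPOINT_URL = "http://localhost:5000"


def build_request(texts: List[str], url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the texts to embed."""
    # Assumption: the handler reads the texts from the "inputs" key.
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str], url: str = ENDPOINT_URL) -> Any:
    """Send the texts to the running container and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(texts, url), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `embed(["a single query"])` should return whatever the handler produces for that input.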
## Check output
```bash
docker-compose logs --follow hf_endpoints_emulator
```