How to get sparse and ColBERT vectors from the Triton endpoint?
#87 by snowneji
We deployed BGE-M3 on our Triton server and were able to consume it:
infer_inputs.append(httpclient.InferInput("input_ids", target_shape, "INT64"))
infer_inputs.append(httpclient.InferInput("attention_mask", target_shape, "INT64"))
infer_inputs[0].set_data_from_numpy(input_ids, binary_data=False)
infer_inputs[1].set_data_from_numpy(attention_mask, binary_data=True)
# Perform inference
response = triton_client.infer(model_name="bge-m3", inputs=infer_inputs)
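For context, the result is read back from the response by output name; a minimal sketch (sentence_embedding is the output name this deployment returns):

# Read the returned tensor back as a numpy array, keyed by output name.
dense_embeddings = response.as_numpy("sentence_embedding")
print(dense_embeddings.shape)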
But the output contains only the sentence_embedding field, which I assume is the dense embedding vector. How do I get the sparse and ColBERT vectors? Is there an argument I can pass in?
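For context, the Triton HTTP client does let the caller list requested outputs explicitly, but that only helps if the exported model graph actually declares the extra output tensors in its config. A minimal sketch, where the sparse/ColBERT output names are placeholders and not confirmed outputs of this deployment:

# Explicitly request outputs by name (only names declared in the model config are valid).
requested_outputs = [
    httpclient.InferRequestedOutput("sentence_embedding", binary_data=True),
    # Placeholder names below; substitute whatever the export actually declares, if anything.
    # httpclient.InferRequestedOutput("sparse_embedding", binary_data=True),
    # httpclient.InferRequestedOutput("colbert_vecs", binary_data=True),
]
response = triton_client.infer(
    model_name="bge-m3",
    inputs=infer_inputs,
    outputs=requested_outputs,
)

For reference, the FlagEmbedding library's BGEM3FlagModel.encode(..., return_dense=True, return_sparse=True, return_colbert_vecs=True) returns dense_vecs, lexical_weights, and colbert_vecs in a single call, so the exported model behind the Triton deployment would need to expose equivalent output tensors for the server to return them.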