How to get sparse and ColBERT vectors from a Triton endpoint?

#87
by snowneji - opened

We deployed M3 on our Triton server and were able to consume it:

infer_inputs.append(httpclient.InferInput("input_ids", target_shape, "INT64"))
infer_inputs.append(httpclient.InferInput("attention_mask", target_shape, "INT64"))

infer_inputs[0].set_data_from_numpy(input_ids, binary_data=False)
infer_inputs[1].set_data_from_numpy(attention_mask, binary_data=True)

# Perform inference
response = triton_client.infer(model_name="bge-m3", inputs=infer_inputs)

But the output contains only a sentence_embedding field, which I assume is the dense embedding vector.
How do I get the sparse and ColBERT vectors? Is there an argument I can pass in?
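
For reference, here is a minimal sketch of how additional output tensors can be requested with the Triton HTTP client, assuming the exported model actually exposes them. The output names "sparse_embedding" and "colbert_vecs" below are assumptions, so check the names reported by the model metadata (or config.pbtxt) first:

# List the output tensors the deployed bge-m3 model actually exposes
metadata = triton_client.get_model_metadata(model_name="bge-m3")
print([out["name"] for out in metadata["outputs"]])

# Explicitly request the extra heads if they are exported
# ("sparse_embedding" and "colbert_vecs" are assumed names)
requested_outputs = [
    httpclient.InferRequestedOutput("sentence_embedding"),
    httpclient.InferRequestedOutput("sparse_embedding"),
    httpclient.InferRequestedOutput("colbert_vecs"),
]
response = triton_client.infer(
    model_name="bge-m3", inputs=infer_inputs, outputs=requested_outputs
)
sparse_vec = response.as_numpy("sparse_embedding")

If the metadata only lists sentence_embedding, then the sparse and ColBERT heads were not included in the exported model, and no request-time argument will return them; the model would need to be re-exported with those outputs.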

