What is the inference time? Any ideas how to make it faster?

#52
by leoapolonio - opened

I have it deployed on a g5.48xlarge (which uses NVIDIA A10G GPUs under the hood), and I'm seeing more than 60 s to generate 500 tokens.

Any suggested paths to make it faster?

Technology Innovation Institute org

We would recommend using Text Generation Inference (TGI) for optimal performance. For deployments on AWS SageMaker, also have a look at this blog.
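
For reference, here is a minimal sketch of how one might query a TGI server from Python. The model id, shard count, port, and prompt below are assumptions for illustration, not a definitive deployment recipe:

```python
# Minimal sketch of querying a Text Generation Inference (TGI) server.
# Assumes TGI has already been launched for this model, e.g. with Docker
# (model id and shard count here are illustrative assumptions):
#
#   docker run --gpus all --shm-size 1g -p 8080:80 \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id tiiuae/falcon-40b --num-shard 8
#
# Requires: pip install text-generation
from text_generation import Client

# Local TGI endpoint (port assumed from the docker command above).
client = Client("http://127.0.0.1:8080")

# Generate 500 tokens; TGI's continuous batching, tensor parallelism,
# and optimized kernels are what typically reduce latency versus a
# plain transformers generate() loop.
response = client.generate(
    "Write a short note on the Falcon family of language models.",
    max_new_tokens=500,
)
print(response.generated_text)
```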

FalconLLM changed discussion status to closed
