Model slow to generate tokens
#5
by
DefamationStation
- opened
Hello,
RTX 4090, i9-13900HX, using LM Studio Beta v6.
I've tried n_gpu_layers at -1, 10, 20, and 40, and the output generation speed is the same every time. Not sure if it's my machine or the model's performance, but I usually get entire walls of text almost instantly from models like this.
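One way to check whether changing n_gpu_layers is actually taking effect is to measure throughput instead of eyeballing the output — if tokens/s is identical at -1 and 10 layers, the offload setting probably isn't being applied. A minimal sketch; the `generate` call in the comment is a stand-in for whatever API your runtime exposes (LM Studio's local server, llama-cpp-python, etc.), not a real LM Studio function:

```python
import time

def tokens_per_second(n_tokens, elapsed_seconds):
    """Generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_seconds

# Hypothetical usage around any local inference call:
# start = time.perf_counter()
# output = generate(prompt, max_tokens=200)  # stand-in for your runtime's API
# tps = tokens_per_second(200, time.perf_counter() - start)

print(tokens_per_second(120, 4.0))  # → 30.0
```

Run this at each n_gpu_layers value; on a 4090 you'd expect throughput to climb noticeably as more layers are offloaded (and VRAM usage in nvidia-smi should rise with it).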
It might be that you're using the fast tokenizer meant for ordinary Llama models (in text-generation-webui, that's the use_fast option). This model ships with its own tokenizer, so you have to use that one instead.