Model slow to generate tokens
#5
by
DefamationStation
- opened
Hello,
RTX 4090, i9-13900HX, using LM Studio Beta v6.
I've tried n_gpu_layers at -1, 10, 20, and 40, and the output generation speed is the same every time. Not sure if it's my machine or the model's performance, but I usually get entire walls of text almost instantly from models like this.
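One way to check whether changing n_gpu_layers is actually taking effect is to measure throughput instead of eyeballing the output — if tokens/s is identical at -1 and 10 layers, the offload setting probably isn't being applied. A minimal sketch; the `generate` call in the comment is a stand-in for whatever API your runtime exposes (LM Studio's local server, llama-cpp-python, etc.), not a real LM Studio function:

```python
import time

def tokens_per_second(n_tokens, elapsed_seconds):
    """Generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_tokens / elapsed_seconds

# Hypothetical usage around any local inference call:
# start = time.perf_counter()
# output = generate(prompt, max_tokens=200)  # stand-in for your runtime's API
# tps = tokens_per_second(200, time.perf_counter() - start)

print(tokens_per_second(120, 4.0))  # → 30.0
```

Run this at each n_gpu_layers value; on a 4090 you'd expect throughput to climb noticeably as more layers are offloaded (and VRAM usage in nvidia-smi should rise with it).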
It might be that you're using the fast tokenizer meant for ordinary Llama models (in text-generation-webui, that's the use_fast option). This model ships with its own tokenizer, so you have to use that one instead.