Context length 32k tokens?

by fuckfuckyou11 - opened Oct 4

Oct 4

Why does this GGUF list a 32k context length in its description? The original model card at https://huggingface.co/Qwen/Qwen2.5-72B-Instruct states a 131k context length. What happened?

jklj077

Qwen org Oct 8

Has llama.cpp added YaRN support yet? If it has, enabling YaRN as described in the original model's model card should extend the context length.

fuckfuckyou11

Oct 11

Has llama.cpp added YaRN support yet? If it has, enabling YaRN as described in the original model's model card should extend the context length.

Should it? I have never heard of YaRN. I searched the llama.cpp GitHub repo for issues and found nothing, neither open nor closed. If it is supported, my original question stands: why does the description still say 32k context length?

jklj077

Qwen org Oct 11

128K context length needs YaRN (that's what we have tested). No YaRN, no 128K.

If you use other methods to extend the context length, they may work also. But we don't really know.

BingoBird

Oct 16 (edited Oct 16)

llama.cpp had some form of YaRN support merged by Nov 4, 2023: https://github.com/ggerganov/llama.cpp/discussions/2963#discussioncomment-7475016

I suggest directing queries to the github.com discussions or issues pages.

I also found some discussion here: https://github.com/ggerganov/llama.cpp/discussions/7416

fuckfuckyou11

Oct 17

Awesome, so there is no reason to state 32k in the description if llama.cpp has supported YaRN since 11/2023 and 128K works by default.

jklj077

Qwen org about 1 month ago

If it is supported, you still need to enable it. It is not on by default.

jklj077

Qwen org about 1 month ago

From the llama.cpp server README (https://github.com/ggerganov/llama.cpp/blob/c421ac072d46172ab18924e1e8be53680b54ed3b/examples/server/README.md):

--rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768

| Argument | Explanation |
| --- | --- |
| `--rope-scaling {none,linear,yarn}` | RoPE frequency scaling method, defaults to linear unless specified by the model (env: LLAMA_ARG_ROPE_SCALING_TYPE) |
| `--rope-scale N` | RoPE context scaling factor, expands context by a factor of N (env: LLAMA_ARG_ROPE_SCALE) |
| `--rope-freq-base N` | RoPE base frequency, used by NTK-aware scaling (default: loaded from model) (env: LLAMA_ARG_ROPE_FREQ_BASE) |
| `--rope-freq-scale N` | RoPE frequency scaling factor, expands context by a factor of 1/N (env: LLAMA_ARG_ROPE_FREQ_SCALE) |
| `--yarn-orig-ctx N` | YaRN: original context size of model (default: 0 = model training context size) (env: LLAMA_ARG_YARN_ORIG_CTX) |
| `--yarn-ext-factor N` | YaRN: extrapolation mix factor (default: -1.0, 0.0 = full interpolation) (env: LLAMA_ARG_YARN_EXT_FACTOR) |
| `--yarn-attn-factor N` | YaRN: scale sqrt(t) or attention magnitude (default: 1.0) (env: LLAMA_ARG_YARN_ATTN_FACTOR) |
| `--yarn-beta-slow N` | YaRN: high correction dim or alpha (default: 1.0) (env: LLAMA_ARG_YARN_BETA_SLOW) |
| `--yarn-beta-fast N` | YaRN: low correction dim or beta (default: 32.0) (env: LLAMA_ARG_YARN_BETA_FAST) |
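
For anyone wondering where `--rope-scale 4` comes from: it is the target context divided by the original context, 131072 / 32768 = 4. The sketch below derives that factor and prints a full command line without running it. It rests on a few assumptions that are not stated in this thread: the binary is named `llama-server`, `-m` and `-c` set the model path and context size, and `model.gguf` is a hypothetical placeholder for your local file. Check these against the README of the llama.cpp build you actually use.

```shell
#!/bin/sh
# Sketch only: derive the YaRN scale factor and print a llama-server command.
# Nothing is executed; the command is just echoed for inspection.

ORIG_CTX=32768           # original context size, as passed to --yarn-orig-ctx above
TARGET_CTX=131072        # desired context (128K), per the original model card
MODEL_PATH="model.gguf"  # hypothetical placeholder, replace with your GGUF file

# Scale factor = target context / original context (integer division).
ROPE_SCALE=$((TARGET_CTX / ORIG_CTX))

# Assumption: binary name llama-server, with -m (model) and -c (context size).
CMD="llama-server -m $MODEL_PATH -c $TARGET_CTX --rope-scaling yarn --rope-scale $ROPE_SCALE --yarn-orig-ctx $ORIG_CTX"
echo "$CMD"
```

If you only need, say, 64K of context, setting `TARGET_CTX=65536` gives a scale factor of 2 by the same arithmetic. Whether quality holds up at factors other than 4 has not been confirmed in this thread.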
