Update README.md
README.md
CHANGED
@@ -74,7 +74,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 
 To handle extensive inputs exceeding 32,768 tokens, we utilize [YARN](https://arxiv.org/abs/2309.00071), a technique for enhancing model length extrapolation, ensuring optimal performance on lengthy texts.
 
-For deployment, we recommend using vLLM. You can enable long-context capabilities
+For deployment, we recommend using vLLM. You can enable the long-context capabilities by following these steps:
 
 1. **Install vLLM**: Ensure you have the latest version from the main branch of [vLLM](https://github.com/vllm-project/vllm).
 