Running llama.cpp server.exe produces no output, while main.exe works fine

#4
by weiboboer - opened

Running main.exe -m qwen1_5-14b-chat-q4_0.gguf -n 512 --color -i -cml works fine.
Running server.exe -m qwen1_5-14b-chat-q4_0.gguf and then calling http://localhost:8080/completion never returns any output.
I have seen a similar issue on GitHub, but there is no solution there:
https://github.com/ggerganov/llama.cpp/issues/4821
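For what it's worth, the raw /completion endpoint expects a POST request with a JSON body; opening it in a browser or sending an empty body typically yields nothing useful, which could explain the missing output. Below is a minimal sketch using the requests library, assuming the field names "prompt" and "n_predict" from the llama.cpp server README:

import requests

# Sketch: POST to llama.cpp server's raw /completion endpoint.
# "prompt" is the text to complete, "n_predict" limits generated tokens.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Tell a joke about summer",
        "n_predict": 128,
    },
    timeout=120,
)
resp.raise_for_status()
# The response JSON carries the generated text in the "content" field.
print(resp.json()["content"])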

The URL to call should be

http://localhost:8080/v1

With some API clients you may need to try

http://localhost:8080/v1/chat/completions/

Here is a simple example:

import openai

# Point the OpenAI client at the llama.cpp server's OpenAI-compatible endpoint.
client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="-",  # any non-empty string works for a local llama.cpp server
)

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; the server answers with whatever model it was started with
    temperature=0,
    messages=[
        {"role": "system", "content": "请始终用中文回复"},  # system prompt: "Always reply in Chinese"
        {"role": "user", "content": "Tell a joke about summer"},
    ],
)

print(completion.choices[0].message.content)
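Note that the llama.cpp server ignores the model field and always serves the model it was loaded with, so "gpt-3.5-turbo" is only a placeholder. If the chat endpoint still returns nothing, it may also be worth checking that the server applies the model's chat template (the working main.exe run above used -cml, i.e. ChatML); recent llama.cpp builds expose a --chat-template option for this, though availability depends on your build.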
