Can you share the server code for local deploy?
#4 opened by merlinarer
Thanks for the nice work! I want to deploy the chat model on my own GPUs with your playground, but I'm failing to process the stream properly. Can you share the server code that processes the prompt and returns the stream?
I use the following code:
output = ""
stream = pipe(prompt)
for idx, response in enumerate(stream):
output += response['generated_text'].replace(prompt, '')
if idx == 0:
history.append(" " + output)
else:
history[-1] = output
chat = [(history[i].strip(), history[i + 1].strip()) for i in range(0, len(history) - 1, 2)]
yield chat, history, user_message, ""
However, it only responds the first time and returns nothing after that. I checked and found that every time, the pipe just generates a `\n` after the prompt, which is why the user gets nothing.
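For reference, I guess the server side looks roughly like the sketch below, streaming tokens out of `generate()` with transformers' `TextIteratorStreamer`. The model name and generation settings here are placeholders I made up, not your actual setup:

```python
# A minimal sketch, assuming a causal LM loaded directly with transformers.
# "your-org/your-chat-model" and max_new_tokens=512 are placeholders.
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "your-org/your-chat-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

def generate_stream(prompt):
    """Yield the completion as it grows, one decoded chunk at a time."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # skip_prompt=True so the echoed prompt never reaches the client
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    # generate() blocks, so run it in a background thread and read from the streamer
    kwargs = dict(**inputs, streamer=streamer, max_new_tokens=512, do_sample=True)
    Thread(target=model.generate, kwargs=kwargs).start()
    output = ""
    for new_text in streamer:
        output += new_text
        yield output
```

Is this close to what your playground does, or do you stream from a separate inference server?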