update readme
README.md
---
license: apache-2.0
inference:
  parameters:
    max_length: 250
    temperature: 0.7
    top_p: 1
widget:
- text: '病人:我感觉自己好像发烧了,怎么办?\n医生:'
- text: '病人:我最近感觉全身疲惫。\n医生:'
language:
- en
- zh
---

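## Introduction

**BianQue-1.0 (扁鹊-1.0)** is a medical dialogue large language model jointly fine-tuned on instruction data and multi-turn inquiry dialogues. Our survey found that in the medical domain, doctors usually need several rounds of questioning before they can reach a decision, which is not a simple "instruction-response" pattern. When consulting a doctor, users rarely describe their full situation at the outset, so the doctor has to keep asking questions before making a diagnosis and giving sound advice. Motivated by this, we built **BianQue-1.0** to **strengthen the questioning ability of AI systems** and thereby simulate the way a doctor conducts a consultation. We define this ability as the "questioning" (问) step of the traditional Chinese diagnostic methods "望闻问切" (inspection, listening and smelling, questioning, and pulse-taking).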
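
Below are two simple examples:

* Example 1
  * User: I have been feeling exhausted all over lately.
  * BianQue-1.0: Is it from overwork, or from staying up late?
  * User: I have been staying up late all week to finish a paper.
  * BianQue-1.0: I see. Staying up late can easily cause various kinds of discomfort, such as dizziness, fatigue, insomnia, and excessive dreaming. Get plenty of rest; you can take oryzanol, vitamin B1, and Anshen Bunao Liquid (安神补脑液) orally, and avoid staying up late.

* Example 2
  * User: I have been having insomnia a lot lately.
  * BianQue-1.0: How long have you had insomnia?
  * User: Two weeks.
  * BianQue-1.0: How is your sleep?
  * User: It takes me several hours to fall asleep after going to bed.
  * BianQue-1.0: You could take some oral medication.
  * User: Do you have any specific recommendations?
  * BianQue-1.0: You need to get examined first.
  * User: Then what tests do I need?
  * BianQue-1.0: An electrocardiogram (ECG).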
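
The multi-turn format in these examples can be made concrete with a short sketch that flattens a dialogue such as Example 2 into a single model input. It assumes the `病人:` (patient) / `医生:` (doctor) template and the newline escaping used by the `answer()` and `preprocess()` functions in the usage code below; `build_prompt` is a hypothetical helper introduced here for illustration, not part of the BianQue release.

```python
def build_prompt(user_history: list[str], bot_history: list[str]) -> str:
    """Hypothetical helper: flatten a dialogue into one BianQue-style input string.

    Assumes user_history has exactly one more entry than bot_history,
    i.e. the newest patient message has not been answered yet.
    """
    if len(user_history) != len(bot_history) + 1:
        raise ValueError("user_history must have exactly one more turn than bot_history")
    # Completed turns: one patient line followed by one doctor line
    turns = [f"病人:{u}\n医生:{b}" for u, b in zip(user_history, bot_history)]
    # Newest patient message, with the doctor turn left open for the model to fill
    turns.append(f"病人:{user_history[-1]}\n医生:")
    text = "\n".join(turns)
    # Same escaping as preprocess(): real newlines/tabs become literal "\n"/"\t"
    return text.replace("\n", "\\n").replace("\t", "\\t")


prompt = build_prompt(
    user_history=["我最近经常失眠", "两周了", "上床几小时才睡得着"],
    bot_history=["失眠多久了?", "睡眠怎么样?"],
)
print(prompt)
```

The resulting string contains no real line breaks: every turn boundary is the two-character sequence `\n`, which matches what `preprocess()` feeds to the tokenizer.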

## Usage

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = T5Tokenizer.from_pretrained("scutcyr/BianQue-1.0")
model = T5ForConditionalGeneration.from_pretrained("scutcyr/BianQue-1.0")
# Put the model on the same device as the encoded inputs built in answer()
model.to(device)
model.eval()


def preprocess(text):
    # The model works on literal "\n" / "\t" escape sequences rather than real newlines/tabs
    text = text.replace("\n", "\\n").replace("\t", "\\t")
    return text


def postprocess(text):
    # Inverse of preprocess(): turn the escape sequences back into real newlines/tabs
    return text.replace("\\n", "\n").replace("\\t", "\t")


def answer(user_history, bot_history, sample=True, top_p=1, temperature=0.7):
    '''sample: whether to sample; can be set to True for generation tasks.
    top_p: between 0 and 1; larger values give more diverse output.
    max_new_tokens=512 lost...'''

    if len(bot_history) > 0:
        # Rebuild earlier turns as alternating 病人 (patient) / 医生 (doctor) lines,
        # then append the latest patient message and leave the doctor turn open
        context = "\n".join([f"病人:{user_history[i]}\n医生:{bot_history[i]}" for i in range(len(bot_history))])
        input_text = context + "\n病人:" + user_history[-1] + "\n医生:"
    else:
        # First turn: return a fixed self-introduction without calling the model
        return "我是利用人工智能技术,结合大数据训练得到的智能医疗问答模型扁鹊,你可以向我提问。"

    input_text = preprocess(input_text)
    print(input_text)  # show the exact model input
    encoding = tokenizer(text=input_text, truncation=True, padding=True, max_length=768, return_tensors="pt").to(device)
    if not sample:
        # Greedy decoding (single beam)
        out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_new_tokens=512, num_beams=1, length_penalty=0.6)
    else:
        # Nucleus (top-p) sampling, blocking repeated 3-grams
        out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_new_tokens=512, do_sample=True, top_p=top_p, temperature=temperature, no_repeat_ngram_size=3)
    out_text = tokenizer.batch_decode(out["sequences"], skip_special_tokens=True)
    print('医生: ' + postprocess(out_text[0]))
    return postprocess(out_text[0])


answer_text = answer(user_history=["你好!",
                                   "我最近经常失眠",
                                   "两周了",
                                   "上床几小时才睡得着"],
                     bot_history=["我是利用人工智能技术,结合大数据训练得到的智能医疗问答模型扁鹊,你可以向我提问。",
                                  "失眠多久了?",
                                  "睡眠怎么样?"])
```
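
`answer()` leaves it to the caller to keep `user_history` exactly one entry ahead of `bot_history`. A minimal sketch of a multi-turn session that maintains this invariant follows; `ChatSession` and `fake_answer` are hypothetical names introduced here, and the model call is injected as a callable (an assumption made so the loop can be exercised without downloading the checkpoint).

```python
from typing import Callable, List


class ChatSession:
    """Hypothetical helper that keeps user_history and bot_history aligned across turns."""

    def __init__(self, reply_fn: Callable[[List[str], List[str]], str]):
        # reply_fn is assumed to take (user_history, bot_history), like answer()
        self.reply_fn = reply_fn
        self.user_history: List[str] = []
        self.bot_history: List[str] = []

    def send(self, message: str) -> str:
        # The newest patient message must be last in user_history before replying
        self.user_history.append(message)
        reply = self.reply_fn(self.user_history, self.bot_history)
        # Store the reply so the next call receives one more completed turn
        self.bot_history.append(reply)
        return reply


def fake_answer(user_history: List[str], bot_history: List[str]) -> str:
    # Stand-in for answer(): reports how many completed turns it was given
    return f"completed turns: {len(bot_history)}"


session = ChatSession(fake_answer)
for msg in ["你好!", "我最近经常失眠", "两周了"]:
    print(session.send(msg))
```

With the model loaded as in the usage example, `ChatSession(answer)` would route each message through BianQue-1.0. Because `answer()` returns its fixed greeting whenever `bot_history` is empty, the first message of every session receives that greeting.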