Medical-Llama3-8B-GPTQ
This is a fine-tuned version of the Llama 3 8B model, designed to answer medical questions. The model was trained on the AI Medical Chatbot dataset, available at ruslanmv/ai-medical-chatbot. It uses GPTQ 4-bit quantization for efficient inference. GPTQ compresses deep learning model weights into a 4-bit representation while controlling the quantization error, which reduces model size and targets efficient GPU inference. During inference, the weights are dynamically restored (dequantized) to float16, balancing reduced memory usage against computational efficiency.
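To make the 4-bit idea concrete, here is a minimal sketch of plain round-to-nearest (RTN) quantization of one group of weights, followed by dequantization back to float16. Note that this is a simpler, swapped-in technique and not GPTQ itself: GPTQ additionally quantizes weights column by column and updates the not-yet-quantized weights using second-order (Hessian) information to compensate for rounding error. The function names `quantize_group`, `dequantize_group` and `to_float16` are hypothetical and not part of any library.

```python
import struct

def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, int]:
    """Asymmetric round-to-nearest quantization of one weight group (RTN, not GPTQ)."""
    qmax = 2 ** bits - 1                      # 15 distinct steps above 0 for 4-bit codes
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0     # fall back to 1.0 for a constant group
    zero = round(-w_min / scale)              # integer zero-point (kept as a plain int here)
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero

def dequantize_group(codes: list[int], scale: float, zero: int) -> list[float]:
    """Restore approximate weights in float16 from the 4-bit codes."""
    return [to_float16((c - zero) * scale) for c in codes]

weights = [-0.42, -0.10, 0.03, 0.27, 0.51]
codes, scale, zero = quantize_group(weights)
restored = dequantize_group(codes, scale, zero)
print(codes)  # [0, 5, 7, 11, 15]
print(max(abs(a - b) for a, b in zip(weights, restored)))  # reconstruction error
```

Only the 4-bit codes plus one scale and zero-point per group need to be stored. As rough arithmetic, about 8 billion weights at 4 bits take about 4 GB, versus about 16 GB in float16, ignoring the per-group metadata and any layers left unquantized.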
Model: ruslanmv/Medical-Llama3-8B-GPTQ
- Developed by: ruslanmv
- License: apache-2.0
- Finetuned from model: meta-llama/Meta-Llama-3-8B
Installation
Prerequisites:
- A system with CUDA support is highly recommended for optimal performance.
- Python 3.10 or later
Installation Steps:
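Install the required Python libraries. The usage example loads the model with `AutoGPTQForCausalLM` from the AutoGPTQ library, so `auto-gptq` is needed alongside `transformers`:
```bash
pip install transformers==4.40.0
pip install auto-gptq
```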
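Before loading the model, it can help to confirm the prerequisites listed above. The following is a hypothetical helper (`check_environment` is not part of the model card's tooling) that uses only the standard library to report whether Python is 3.10 or later, which of the assumed packages are installed, and whether PyTorch can see a CUDA device.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

REQUIRED_PYTHON = (3, 10)
# Assumed package list for this model card; versions are reported, not enforced.
PACKAGES = ["transformers", "auto-gptq", "torch"]

def cuda_available() -> bool:
    """Return True if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()

def check_environment(packages=PACKAGES) -> dict:
    """Report the Python version check, installed package versions, and CUDA support."""
    report = {"python_ok": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # package is not installed
    report["cuda"] = cuda_available()
    return report

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```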
Usage
Here's an example of how to use the Medical-Llama3-8B-GPTQ model to generate an answer to a medical question:
```python
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
repo_id = "ruslanmv/Medical-Llama3-8B-GPTQ"

# Download the quantized model from the Hugging Face Hub and load it onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(repo_id,
                                           device=device,
                                           use_safetensors=True,
                                           use_triton=False)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

def create_prompt(user_query):
    # Delimiters for the instruction block and the system message
    B_INST, E_INST = "<s>[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    DEFAULT_SYSTEM_PROMPT = """\
You are an AI Medical Chatbot Assistant, I aim to provide comprehensive and informative responses to your inquiries. However, please note that while I strive for accuracy, my responses should not replace professional medical advice and short answers.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""
    SYSTEM_PROMPT = B_SYS + DEFAULT_SYSTEM_PROMPT + E_SYS
    instruction = f"User asks: {user_query}\n"
    prompt = B_INST + SYSTEM_PROMPT + instruction + E_INST
    return prompt.strip()

def generate_text(model, tokenizer, user_query,
                  max_length=200,
                  temperature=0.7,
                  num_return_sequences=1):
    prompt = create_prompt(user_query)
    # Tokenize the prompt and move input_ids to the same device as the model
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    # Generate text (max_length counts the prompt tokens plus the generated tokens)
    output = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
        pad_token_id=tokenizer.eos_token_id,  # Set pad token to end of sequence token
        do_sample=True
    )
    # Keep only the tokens generated after the prompt, then decode them
    generated_ids = output[0][input_ids.shape[-1]:]
    generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
    return generated_text.strip()
```
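`generate_text` samples (`do_sample=True`) with `temperature=0.7`. As a conceptual sketch of what temperature does, the snippet below divides toy logits by the temperature before a softmax and then draws one token. This is not the transformers implementation, which may also apply other filters such as top-k or top-p depending on the generation config; the function names and logits are illustrative assumptions.

```python
import math
import random

def softmax_with_temperature(logits, temperature=0.7):
    """Convert raw logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=0.7, rng=None):
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy logits for three hypothetical candidate tokens
logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 1.0))
print(softmax_with_temperature(logits, 0.7))  # sharper: more mass on the top token
print(sample_token(logits, 0.7, random.Random(0)))
```

Temperatures below 1.0 concentrate probability on the most likely tokens, making answers more focused; values above 1.0 flatten the distribution and increase variety.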
Inference Example
This section showcases how to use the model for inference.
User Query:
```python
user_query = "I'm a 35-year-old male experiencing symptoms like fatigue, increased sensitivity to cold, and dry, itchy skin. Could these be indicative of hypothyroidism?"
```
Answer:
```python
generated_text = generate_text(model, tokenizer, user_query)
print(generated_text)
```
Example output (generation uses sampling, so your output will differ):
I understand your concern. It could be attributed to hypothyroidism. You may also have perifollicular inflammation. I suggest you to get your thyroid profile done to rule out hypothyroidism. I would also suggest you to use a mild moisturizing cream, with sunscreen, to
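The sample answer stops mid-sentence. In transformers, `max_length` caps the prompt tokens plus the generated tokens, so the long system prompt uses up much of the 200-token budget; `max_new_tokens` instead caps only the generated tokens. The arithmetic is sketched below with hypothetical helper names and a made-up prompt length; measure the real one with `len(tokenizer.encode(create_prompt(user_query)))`.

```python
def answer_budget(prompt_tokens: int, max_length: int) -> int:
    """New tokens generate() can still produce when max_length counts the prompt too."""
    return max(0, max_length - prompt_tokens)

def max_length_for(prompt_tokens: int, answer_tokens: int) -> int:
    """max_length needed so that `answer_tokens` new tokens fit after the prompt."""
    return prompt_tokens + answer_tokens

# Hypothetical prompt length, for illustration only
prompt_tokens = 150
print(answer_budget(prompt_tokens, max_length=200))      # 50
print(max_length_for(prompt_tokens, answer_tokens=300))  # 450
```

Raising `max_length` in `generate_text`, or passing `max_new_tokens` to `model.generate` instead, leaves room for longer answers.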
License
This model is licensed under the Apache License 2.0. You can find the full license in the LICENSE file.