license: mit
datasets:
  - keivalya/MedQuad-MedicalQnADataset
language:
  - en
library_name: adapter-transformers
metrics:
  - accuracy
  - bertscore
  - bleu
pipeline_tag: summarization
tags:
  - medical

K23 MiniMed Model Card

K23 MiniMed is a Mistral 7B-Beta medical fine-tune developed under the mentorship of Wonhyeong Seo at the Krew x Huggingface 2023 hackathon.

Model Details

  • Developed by: Tonic
  • Funded by: Tonic
  • Shared by: K23-Krew-Hackathon
  • Model type: Mistral 7B-Beta Medical Fine Tune
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: Zephyr 7B-Beta

Model Sources

Uses

This model is intended for conversational applications that answer medical questions, for educational purposes only.

Direct Use

Build a Gradio chatbot app to ask medical questions and receive answers conversationally.

Downstream Use

This model is for educational use only. Further fine-tuning and example uses include public health & sanitation, personal health & sanitation, and medical Q & A.

Recommendations

Always evaluate and benchmark this model before use. Evaluate bias before use. Do not use it as is; fine-tune it further.

Training Details

The model's training loss was as follows:

Step Training Loss
50 0.993800
100 0.620600
150 0.547100
200 0.524100
250 0.520500
300 0.559800
350 0.535500
400 0.505400

Training Data

Trainable parameters: 21260288, all parameters: 3773331456, trainable %: 0.5634354746703705.

Results

The training loss reported at global_step=400 is 0.6008514881134033.

Environmental Impact

The model's environmental impact can be estimated with the Machine Learning Impact calculator. More details are needed to provide an estimate.

Technical Specifications

Model Architecture and Objective

The model uses PeftModelForCausalLM with a specific configuration.

Compute Infrastructure

Hardware

The model was trained on A100 hardware.

Software

The software used includes peft, torch, bitsandbytes, python, and huggingface.

Model Card Authors

Tonic

Model Card Contact

Tonic

Model Card for K23 MiniMed

This is a Mistral 7B-Beta medical fine-tune trained for a small number of steps, inspired by Wonhyeong Seo's great mentorship during the Krew x Huggingface 2023 hackathon.

Model Details

Model Description

  • Developed by: Tonic
  • Funded by: Tonic
  • Shared by: K23-Krew-Hackathon
  • Model type: Mistral 7B-Beta Medical Fine Tune
  • Language(s) (NLP): English
  • License: MIT
  • Finetuned from model: Zephyr 7B-Beta

Model Sources

Uses

Use this model in conversational applications for medical question answering, for educational purposes only.

Direct Use

Build a Gradio chatbot app to ask medical questions and get answers conversationally.

Downstream Use

This model is for educational use only.

Further fine-tunes and uses could include:

  • public health & sanitation
  • personal health & sanitation
  • medical Q & A

Recommendations

  • always evaluate this model before use
  • always benchmark this model before use
  • always evaluate bias before use
  • do not use as is; fine-tune further
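
As one way to act on the evaluation advice above, the sketch below scores a generated answer against a reference answer with token-level F1. This is a simple stand-in chosen for illustration, not the metric used by the author; the `token_f1` helper and the example strings are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (model answer, reference answer) pair
score = token_f1(
    "wash hands with soap and water",
    "wash your hands with soap and water",
)
print(f"token F1: {score:.3f}")
```

A score like this only measures word overlap, so it should be read alongside the metrics listed in the card metadata (accuracy, BERTScore, BLEU) and a manual review of answers.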

How to Get Started with the Model

Use the code below to get started with the model.


import textwrap

import gradio as gr
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoTokenizer, MistralForCausalLM

# Functions to Wrap the Prompt Correctly
def wrap_text(text, width=90):
    lines = text.split('\n')
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]
    wrapped_text = '\n'.join(wrapped_lines)
    return wrapped_text

def multimodal_prompt(user_input, system_prompt="You are an expert medical analyst:", max_length=512):
    # Combine the system prompt and the user input in the [INST] format
    formatted_input = f"<s>[INST]{system_prompt} {user_input}[/INST]"

    # Encode the input text; "<s>" is already in the prompt, so no special tokens are added
    encodeds = tokenizer(formatted_input, return_tensors="pt", add_special_tokens=False)
    model_inputs = encodeds.to(peft_model.device)

    # Generate a response using the PEFT model loaded further down
    output = peft_model.generate(
        **model_inputs,
        max_length=max_length,
        use_cache=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
        temperature=0.1,
        do_sample=True,
    )

    # Decode the response
    response_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return response_text

# Define the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Use the base model's ID
base_model_id = "HuggingFaceH4/zephyr-7b-beta"
model_directory = "pseudolab/K23_MiniMed"

# Instantiate the Tokenizer
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", trust_remote_code=True, padding_side="left")
# tokenizer = AutoTokenizer.from_pretrained("Tonic/mistralmed", trust_remote_code=True, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'

# Specify the configuration class for the model
#model_config = AutoConfig.from_pretrained(base_model_id)

# Load the PEFT model with the specified configuration
#peft_model = AutoModelForCausalLM.from_pretrained(base_model_id, config=model_config)

# Load the base model by its Hub ID (not a URL), then attach the LoRA adapter
peft_config = PeftConfig.from_pretrained(model_directory)
base_model = MistralForCausalLM.from_pretrained(base_model_id)
peft_model = PeftModel.from_pretrained(base_model, model_directory).to(device)
peft_model.eval()

class ChatBot:
    def __init__(self):
        # Initialize the ChatBot class with an empty history
        self.history = []

    def predict(self, user_input, system_prompt="You are an expert medical analyst:"):
        # Combine the user's input with the system prompt
        formatted_input = f"<s>[INST]{system_prompt} {user_input}[/INST]"

        # Encode the formatted input; "<s>" is already in the prompt, so do not add a second BOS token
        user_input_ids = tokenizer.encode(
            formatted_input, return_tensors="pt", add_special_tokens=False
        ).to(peft_model.device)

        # Generate a response using the PEFT model
        with torch.no_grad():
            response = peft_model.generate(
                input_ids=user_input_ids, max_length=512, pad_token_id=tokenizer.eos_token_id
            )

        # Decode the generated response to text
        response_text = tokenizer.decode(response[0], skip_special_tokens=True)

        return response_text  # Return the generated response

bot = ChatBot()

title = "👋🏻Welcome to Tonic's MistralMed Chat🚀"
description = "You can use this Space to test out the current model [(Tonic/MistralMed)](https://huggingface.co/Tonic/MistralMed) or duplicate this Space and use it locally or on 🤗HuggingFace. [Join me on Discord to build together](https://discord.gg/VqTxc76K3u)."
examples = [["[Question:] What is the proper treatment for buccal herpes?", "You are a medicine and public health expert, you will receive a question, answer the question, and provide a complete answer"]]

iface = gr.Interface(
    fn=bot.predict,
    title=title,
    description=description,
    examples=examples,
    inputs=["text", "text"],  # Take user input and system prompt separately
    outputs="text",
    theme="ParityError/Anime"
)

iface.launch()
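
The prompt format used by `predict` can be tried without loading the model. The sketch below isolates it in two hypothetical helpers, `build_prompt` and `strip_prompt_echo`, which are not part of the original code. It assumes that the decoded output still contains the literal `[/INST]` marker, since generation returns the prompt followed by the answer.

```python
def build_prompt(user_input: str, system_prompt: str = "You are an expert medical analyst:") -> str:
    """Format a single-turn prompt the same way the ChatBot.predict method does."""
    return f"<s>[INST]{system_prompt} {user_input}[/INST]"


def strip_prompt_echo(decoded_text: str) -> str:
    """Return only the text after the first [/INST] marker, if the marker is present."""
    _, marker, answer = decoded_text.partition("[/INST]")
    return answer.strip() if marker else decoded_text.strip()


prompt = build_prompt("What is hand hygiene?")
print(prompt)

# Hypothetical decoded output: the prompt echo followed by a placeholder answer
decoded = "[INST]You are an expert medical analyst: What is hand hygiene?[/INST] (model answer here)"
print(strip_prompt_echo(decoded))
```

Applying `strip_prompt_echo` to the decoded response would keep the chat interface from repeating the system prompt and question back to the user.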

Training Details

Step Training Loss
50 0.993800
100 0.620600
150 0.547100
200 0.524100
250 0.520500
300 0.559800
350 0.535500
400 0.505400

Training Data


{trainable params: 21260288 || all params: 3773331456 || trainable%: 0.5634354746703705}
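
The trainable percentage follows directly from the two counts. A minimal check, using only the numbers reported above:

```python
# Counts copied from the line above
trainable_params = 21_260_288
all_params = 3_773_331_456

# Share of parameters updated during fine-tuning, as a percentage
trainable_percent = 100 * trainable_params / all_params
print(f"trainable%: {trainable_percent:.4f}")  # trainable%: 0.5634
```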

Training Procedure

Preprocessing

Lora32bits

Speeds, Sizes, Times

metrics={'train_runtime': 1700.1608, 'train_samples_per_second': 1.882, 'train_steps_per_second': 0.235, 'total_flos': 9.585300996096e+16, 'train_loss': 0.6008514881134033, 'epoch': 0.2}
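
These metrics can be cross-checked against each other. The sketch below derives the wall-clock time in hours, the step rate, and an approximate number of samples per optimizer step. The value `global_step = 400` is taken from the TrainOutput in the Results section, and reading the samples-per-step ratio as an effective batch size is an assumption, not something the card states.

```python
# Values copied from the metrics dictionary above
train_runtime_s = 1700.1608
samples_per_second = 1.882
steps_per_second = 0.235
global_step = 400  # from the TrainOutput in the Results section

hours = train_runtime_s / 3600
derived_steps_per_second = global_step / train_runtime_s
samples_per_step = samples_per_second / steps_per_second

print(f"training time: {hours:.2f} h")
print(f"steps per second (derived): {derived_steps_per_second:.3f}")
print(f"samples per step (approx.): {samples_per_step:.1f}")
```

The derived step rate agrees with the reported 0.235, and the ratio suggests roughly 8 samples per optimizer step.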

Results

TrainOutput

global_step=400, training_loss=0.6008514881134033
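
The reported `training_loss` is higher than the last logged value (0.505400 at step 400). A quick check suggests it is the average over the whole run rather than the loss at the final step. The sketch below assumes each logged value is the mean loss over its 50-step window.

```python
# Logged losses copied from the Training Details table
logged_losses = {
    50: 0.9938,
    100: 0.6206,
    150: 0.5471,
    200: 0.5241,
    250: 0.5205,
    300: 0.5598,
    350: 0.5355,
    400: 0.5054,
}
reported_training_loss = 0.6008514881134033

# Every window covers the same number of steps, so a plain mean is enough
mean_logged = sum(logged_losses.values()) / len(logged_losses)
difference = abs(mean_logged - reported_training_loss)

print(f"mean of logged losses: {mean_logged:.5f}")
print(f"difference from reported value: {difference:.1e}")
```

The small remaining difference is consistent with the table values being rounded to four decimal places.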

Summary

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: A100
  • Hours used: about 0.47 for training (from the reported train_runtime of 1700.1608 seconds)
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective


PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
              )
              (k_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
              )
              (v_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
              )
              (o_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
              )
              (rotary_emb): MistralRotaryEmbedding()
            )
            (mlp): MistralMLP(
              (gate_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
              )
              (up_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=14336, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=14336, bias=False)
              )
              (down_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=14336, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=14336, out_features=4096, bias=False)
              )
              (act_fn): SiLUActivation()
            )
            (input_layernorm): MistralRMSNorm()
            (post_attention_layernorm): MistralRMSNorm()
          )
        )
        (norm): MistralRMSNorm()
      )
      (lm_head): Linear(
        in_features=4096, out_features=32000, bias=False
        (lora_dropout): ModuleDict(
          (default): Dropout(p=0.05, inplace=False)
        )
        (lora_A): ModuleDict(
          (default): Linear(in_features=4096, out_features=8, bias=False)
        )
        (lora_B): ModuleDict(
          (default): Linear(in_features=8, out_features=32000, bias=False)
        )
        (lora_embedding_A): ParameterDict()
        (lora_embedding_B): ParameterDict()
      )
    )
  )
)
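
The parameter counts reported under Training Data can be reproduced from the shapes in this printout. The sketch below counts the LoRA weights (rank 8 on every attention and MLP projection plus `lm_head`) and then the total. It assumes that each `Linear4bit` base weight is counted at half its logical size because two 4-bit values are packed per stored element, and that each `MistralRMSNorm` holds one weight vector of the hidden size. Both are assumptions about how the count was produced, not statements from the card.

```python
RANK = 8
NUM_LAYERS = 32
HIDDEN, KV_DIM, MLP_DIM, VOCAB = 4096, 1024, 14336, 32000

# (in_features, out_features) of each LoRA-wrapped projection in one decoder layer
layer_targets = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    # lora_A maps in_features -> rank, lora_B maps rank -> out_features, both without bias
    return rank * (in_features + out_features)


# Trainable parameters: adapters on every decoder layer plus the lm_head adapter
lora_total = NUM_LAYERS * sum(lora_params(i, o) for i, o in layer_targets.values())
lora_total += lora_params(HIDDEN, VOCAB)

# Assumption: 4-bit base weights are packed two values per counted element
packed_base = NUM_LAYERS * sum(i * o for i, o in layer_targets.values()) // 2
unquantized = 2 * VOCAB * HIDDEN       # embed_tokens and the lm_head base weight
norms = (2 * NUM_LAYERS + 1) * HIDDEN  # two RMSNorms per layer plus the final norm

all_params = lora_total + packed_base + unquantized + norms
print(lora_total, all_params)  # 21260288 3773331456
```

Under these assumptions both totals match the reported values, which also explains why "all params" is about 3.77 billion rather than the roughly 7 billion of the unquantized base model. The printout shows the rank (8), the dropout (0.05), and the target modules, but not the LoRA alpha value.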

Compute Infrastructure

Hardware

A100

Software

peft, torch, bitsandbytes, python, huggingface

Model Card Authors

Tonic

Model Card Contact

Tonic