---
library_name: transformers
tags: []
---

# Model Card for Model ID

"BioMistral/BioMistral-7B" fine-tuned on the MedQA dataset.

## Model Details

BioMistral ("A Collection of Open-Source Pretrained Large Language Models for Medical Domains") fine-tuned on the MedQA dataset.

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** mychen76
- **Model type:** BioMedical
- **Finetuned from model:** BioMistral/BioMistral-7B

### Model Sources [optional]

- **dataset:** MedQA dataset

## How to Get Started with the Model

Use the code below to get started with the model.

Load Model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mychen76/biomistral_medqa_v1"

# Load the weights in 4-bit NF4 with double quantization and compute in bfloat16
# (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config)

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    add_eos_token=True,
    add_bos_token=True,
)
```

## Uses

*** Information ***
```
eval_prompt = """From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
information

### Question:
What are the genetic changes related to X-linked lymphoproliferative disease ?

### Answer:
"""

# Tokenize the prompt and move it to the GPU
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=300)[0], skip_special_tokens=True))
```

result:
```
From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
information

### Question:
What are the genetic changes related to X-linked lymphoproliferative disease ?
### Answer:
X-linked lymphoproliferative disease (XLP) is a rare primary immunodeficiency syndrome. XLP is caused by mutations in SH2D1A gene, which encodes the cytoplasmic signaling protein SLAM-associated protein ( client protein-SLAM). SLAM is a member of the signaling lymphocytic activation molecule family of receptors, which are involved in the regulation of lymphocyte activation and proliferation. The SLAM receptor is expressed on the surface of B and T lymphocytes, natural killer cells, and monocytes. Mutations in SH2D1A gene lead to impaired signaling through the SLAM receptor, resulting in a deficiency in the activation and proliferation of B and T lymphocytes. This leads to a decrease in the number of B and T lymphocytes, resulting in a weakened immune response.
```

*** Frequency ***
```
eval_prompt = """From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
frequency

### Question:
How many people are affected by Smith-Lemli-Opitz syndrome ?

### Answer:
"""

# Tokenize the prompt and move it to the GPU
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=300)[0], skip_special_tokens=True))
```

result:
```
From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
frequency

### Question:
How many people are affected by Smith-Lemli-Opitz syndrome ?

### Answer:
Smith-Lemli-Opitz syndrome (SLOS) is a rare autosomal recessive disorder of human development. It is characterized by a wide range of symptoms, including growth and developmental delay, intellectual disability, characteristic facial features, and congenital heart defects. The prevalence of SLOS is estimated to be 1 in 15,000 to 1 in 25,000 live births.
```

*** Symptoms ***
```
eval_prompt = """From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
symptoms

### Question:
What are the symptoms of Norrie disease ?

### Answer:
"""

# Tokenize the prompt and move it to the GPU
model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=300)[0], skip_special_tokens=True))
```

Result:
```
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
symptoms

### Question:
What are the symptoms of Norrie disease ?

### Answer:
Norrie disease is a rare, X-linked recessive disorder of the blood vessels. It is characterized by a variety of symptoms, including glaucoma, mental retardation, seizures, and deafness.
```

### Out-of-Scope Use

images

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

[More Information Needed]

## Training Details

### Training Data

- **dataset:** keivalya/MedQuad-MedicalQnADataset

[More Information Needed]

### Training Procedure

## Citation

arXiv: https://arxiv.org/abs/2402.10373

@misc{labrak2024biomistral,
      title={BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains},
      author={Yanis Labrak and Adrien Bazoge and Emmanuel Morin and Pierre-Antoine Gourraud and Mickael Rouvier and Richard Dufour},
      year={2024},
      eprint={2402.10373},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
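
The three usage examples earlier in this card share one prompt layout: a fixed instruction line, a question type, the question, and an empty answer slot. As a minimal sketch, that layout could be produced by a small helper like the one below. `build_prompt` and `PROMPT_HEADER` are hypothetical names that are not part of the model or of transformers, and the exact line breaks between sections are an assumption inferred from the examples.

```python
# Hypothetical helper: not shipped with the model or with transformers.
# The instruction line is copied from the usage examples in this card.
PROMPT_HEADER = (
    "From the MedQuad MedicalQA Dataset: Given the following medical question "
    "and question type, provide an accurate answer:"
)


def build_prompt(question_type: str, question: str) -> str:
    """Return a prompt in the layout used by the usage examples.

    Assumption: each section header sits on its own line and sections are
    separated by a blank line; adjust if the training layout differs.
    """
    return (
        f"{PROMPT_HEADER}\n\n"
        f"### Question type:\n{question_type.strip()}\n\n"
        f"### Question:\n{question.strip()}\n\n"
        "### Answer:\n"
    )
```

For instance, `build_prompt("symptoms", "What are the symptoms of Norrie disease ?")` would give the prompt of the third example, which can then be tokenized and passed to `generate` in the same way as in the usage examples.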