Update README.md

3797f93 verified about 2 months ago

4.29 kB

	---
	license: llama3.1
	language:
	- en
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	pipeline_tag: question-answering
	tags:
	- biology
	- medical
	datasets:
	- shellwork/ChatParts_Dataset
	---

	# shellwork/ChatParts-llama3.1-8b

	🤖 [XJTLU-Software RAG GitHub Repository](https://github.com/shellwork/XJTLU-Software-RAG/tree/main) • 📊 [ChatParts Dataset](https://huggingface.co/datasets/shellwork/ChatParts_Dataset)

	shellwork/ChatParts-llama3.1-8b is a specialized dialogue model fine-tuned from Meta-Llama-3.1-8B-Instruct by the XJTLU-Software iGEM Competition team. This model is tailored for the synthetic biology domain, aiming to assist competition participants and researchers in efficiently collecting and organizing relevant information. It serves as the local model component of the XJTLU-developed Retrieval-Augmented Generation (RAG) software, enhancing search and summarization capabilities within synthetic biology data.

	## 📚 Dataset Information

	The model is trained on a comprehensive synthetic biology-specific dataset curated from multiple authoritative sources:

	- iGEM Wiki Pages (2004-2023): Comprehensive coverage of synthetic biology topics from over two decades of iGEM competitions.
	- Synthetic Biology Review Papers: More than 1,000 high-quality review articles providing in-depth insights into various aspects of synthetic biology.
	- iGEM Parts Registry Documentation: Detailed documentation of parts used in iGEM projects, facilitating accurate information retrieval.

	In total, the dataset comprises over 200,000 question-answer pairs, meticulously assembled to cover a wide spectrum of synthetic biology topics. For more detailed information about the dataset, please visit our [training data repository](https://huggingface.co/datasets/shellwork/ChatParts_Dataset).

	## 🛠️ How to Use

	This repository supports usage with the `transformers` library. Below is a straightforward example of how to deploy the shellwork/ChatParts-llama3.1-8b model using `transformers`.

	### Requirements

	- Transformers Library: Ensure you have `transformers` version >= 4.43.0 installed. You can update your installation using:

	```bash
	pip install --upgrade transformers
	```

	### Example: Deploying with Transformers

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import json

	# Load the tokenizer and model
	tokenizer = AutoTokenizer.from_pretrained('shellwork/ChatParts-llama3.1-8b', trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	'shellwork/ChatParts-llama3.1-8b',
	torch_dtype=torch.bfloat16,
	trust_remote_code=True,
	device_map='auto'
	)

	# Example context from synthetic biology literature
	context = '''
	Synthetic biology enables the design and construction of new biological parts, devices, and systems, or the re-design of existing natural biological systems.
	'''

	query = "What is the goal of synthetic biology?"

	# Generate the response with fine-grained citations
	result = model.query_longcite(
	context,
	query,
	tokenizer=tokenizer,
	max_input_length=128000,
	max_new_tokens=1024
	)

	# Display the results
	print("Answer:\n{}\n".format(result['answer']))
	print("Statement with citations:\n{}\n".format(
	json.dumps(result['statements_with_citations'], indent=2, ensure_ascii=False)
	))
	print("Context (divided into sentences):\n{}\n".format(result['splited_context']))
	```


	## 📄 License

	This model is released under the Llama-3.1 License. For more details, please refer to the [license information](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) in the repository.

	## 🔗 Additional Resources

	- RAG Software: Explore the full capabilities of our Retrieval-Augmented Generation software [here](https://github.com/shellwork/XJTLU-Software-RAG/tree/main).
	- Training Data: Access and review the extensive training dataset [here](https://huggingface.co/datasets/shellwork/ChatParts_Dataset) .
	- Support & Contributions: For support or to contribute to the project, visit our [GitHub Issues](https://github.com/shellwork/XJTLU-Software-RAG/issues) page.


	Feel free to reach out through our GitHub repository for any questions, issues, or contributions related to shellwork/ChatParts-llama3.1-8b.