---
license: other
---

![Aquila_logo](./log.jpeg)

<h4 align="center">
    <p>
        <b>English</b> |
        <a href="https://huggingface.co/BAAI/Aquila2-34B/blob/main/README_zh.md">简体中文</a> |
    </p>
</h4>


We open-source our Aquila2 series, which now includes the base language models Aquila2-7B and Aquila2-34B, the chat models AquilaChat2-7B and AquilaChat2-34B, and the long-text chat models AquilaChat2-7B-16k and AquilaChat2-34B-16k.

Additional details of the Aquila model will be presented in the official technical report. Please stay tuned for updates on our official channels.

## Updates 2024.6.6

We have updated the base language model Aquila2-34B. Compared to the previous model, it has the following advantages:

* Replaced the tokenizer with one that has a higher compression ratio:

| Tokenizer        | Size | Zh   | En   | Code | Math | Average |
|------------------|------|------|------|------|------|---------|
| Aquila2-original | 100k | 4.70 | 4.42 | 3.20 | 3.77 | 4.02    |
| Qwen1.5          | 151k | 4.27 | 4.51 | 3.62 | 3.35 | 3.94    |
| Llama3           | 128k | 3.45 | 4.61 | 3.77 | 3.88 | 3.93    |
| Aquila2-new      | 143k | 4.60 | 4.61 | 3.78 | 3.88 | 4.22    |

* The maximum processing length supported by the model has increased from 2048 to 8192 tokens.
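
The Average column of the tokenizer comparison table appears to be the arithmetic mean of the four per-domain ratios. The short sketch below checks that against the values copied from the table. The README does not state how the compression ratio itself is measured, so the code only reproduces the averaging step; the names `ratios` and `average_ratio` are illustrative.

```python
# Per-domain compression ratios copied from the table: (Zh, En, Code, Math).
ratios = {
    "Aquila2-original": (4.70, 4.42, 3.20, 3.77),
    "Qwen1.5": (4.27, 4.51, 3.62, 3.35),
    "Llama3": (3.45, 4.61, 3.77, 3.88),
    "Aquila2-new": (4.60, 4.61, 3.78, 3.88),
}


def average_ratio(values):
    """Arithmetic mean of the per-domain ratios, rounded to two decimals like the table."""
    return round(sum(values) / len(values), 2)


averages = {name: average_ratio(values) for name, values in ratios.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

Under this reading, the new tokenizer has the highest average of the four, driven mainly by its Code ratio and by matching the best En and Math values in the table.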



## Quick Start Aquila2-34B

### 1. Inference
Aquila2-34B is a base model that can be used for text continuation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import BitsAndBytesConfig

device = "cuda:0"

# Model name on the Hugging Face Hub
model_name = "BAAI/Aquila2-34B"

# Optional 4-bit (NF4) quantization config; it only takes effect if passed to from_pretrained below
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    # quantization_config=quantization_config,  # Uncomment this line for 4-bit quantization
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model.eval()
# May need to be skipped with 4-bit quantization: some transformers versions reject .to() on quantized models
model.to(device)

# Example: greedy continuation of a short prompt
text = "The meaning of life is"
tokens = tokenizer.encode_plus(text)["input_ids"]
tokens = torch.tensor(tokens)[None,].to(device)  # Add a batch dimension

with torch.no_grad():
    out = model.generate(tokens, do_sample=False, max_length=128, eos_token_id=tokenizer.eos_token_id)[0]
    out = tokenizer.decode(out.cpu().numpy().tolist())
    print(out)
```
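
In the inference example, `max_length=128` caps the total sequence length, prompt tokens included, so a longer prompt leaves fewer tokens for the continuation. A minimal sketch of that budget arithmetic follows. `remaining_new_tokens` is a hypothetical helper, not part of `transformers`, and the 8192 default is an assumption taken from the maximum processing length stated in the Updates section.

```python
def remaining_new_tokens(prompt_len, max_length=128, context_limit=8192):
    """Return how many tokens can still be generated for a prompt of prompt_len tokens.

    max_length counts prompt plus generated tokens, and is itself capped by
    the model's context limit (assumed here to be 8192 tokens).
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    effective_limit = min(max_length, context_limit)
    return max(effective_limit - prompt_len, 0)


print(remaining_new_tokens(6))                        # 122
print(remaining_new_tokens(200))                      # 0: the prompt already exceeds max_length
print(remaining_new_tokens(8000, max_length=10000))   # 192: capped by the context limit
```

If the budget comes out as 0, one option is to raise `max_length`, or to pass `max_new_tokens` to `generate` instead so the prompt length no longer counts against the limit.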


## License

The Aquila2 series of open-source models is licensed under the [BAAI Aquila Model License Agreement](https://huggingface.co/BAAI/Aquila2-34B/blob/main/BAAI-Aquila-Model-License%20-Agreement.pdf).