---
license: apache-2.0
datasets:
- Tongda/bid-announcement-zh-v1.0
base_model:
- Qwen/Qwen2-1.5B-Instruct
pipeline_tag: text-generation
tags:
- text-generation-inference
library_name: transformers
---
## **Model Overview**
This model is a fine-tuned version of Qwen2-1.5B-Instruct using Low-Rank Adaptation (LoRA). It is designed to extract key information from bidding and bid-winning announcements, identifying structured fields such as project names, announcement types, budget amounts, and deadlines across the varied formats of bidding notices.
The base model, Qwen2-1.5B-Instruct, is a large-scale language model optimized for instruction-following tasks; this fine-tuned version applies those capabilities to precise data extraction from Chinese bid announcements.
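The exact training configuration is not published; for readers unfamiliar with LoRA, here is a minimal sketch of how such an adapter is typically attached with PEFT (all hyperparameters below are assumptions, not the actual training settings):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical LoRA settings; the actual fine-tuning hyperparameters are not published.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
lora_config = LoraConfig(
    r=8,                                   # low-rank dimension (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```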
---
## **Use Cases**
The model can be used in applications that require automatic extraction of structured data from text documents, particularly those related to government bidding and procurement. For instance, given [this sample announcement](https://www.qhggzyjy.gov.cn/ggzy/jyxx/001002/001002001/20240827/1358880795267533.html), the model generates the following output (the field names are the Chinese labels for project name, announcement type, industry category, publication date, budget amount, purchaser, response submission deadline, bid-opening address, and region):
```
项目名称:"大通县公安局警用无人自动化机场项目"
公告类型:"采购公告-竞磋"
行业分类:"其他"
发布时间:"2024-08-27"
预算金额:"941500.00元"
采购人:"大通县公安局(本级)"
响应文件截至提交时间:"2024-09-10 09:00"
开标地址:"大通县政府采购服务中心"
所在地区:"青海省"
```
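Downstream systems usually consume this output as structured data. A minimal parsing sketch (`parse_extraction` is a hypothetical helper, assuming one colon-separated `字段:"值"` pair per line as in the sample above):

```python
# Hypothetical helper: parse one `字段:"值"` pair per line into a dict.
sample = '项目名称:"大通县公安局警用无人自动化机场项目"\n预算金额:"941500.00元"'

def parse_extraction(response: str) -> dict:
    fields = {}
    for line in response.strip().splitlines():
        key, sep, value = line.partition(":")  # full-width colon, as in the sample
        if not sep:
            key, sep, value = line.partition(":")  # fall back to an ASCII colon
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields

print(parse_extraction(sample)["预算金额"])  # 941500.00元
```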
---
## **Key Features**
1. **Fine-tuned with LoRA**: The model has been adapted with LoRA, a parameter-efficient fine-tuning method that trains small low-rank adapter matrices while leaving the base model's weights frozen, preserving the base model's general capabilities.
2. **Robust Information Extraction**: The model is trained to extract and validate crucial fields, including budget values, submission deadlines, and industry classifications, keeping outputs accurate across variable announcement formats (see the validation sketch after this list).
3. **Language & Domain Specificity**: The model excels in parsing official bidding announcements in Chinese and accurately extracting the required information for downstream processes.
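For budget values in particular, the system prompt requires amounts normalized to yuan (元), including conversion from 万元. A hedged sketch of how a caller might double-check an extracted amount (`normalize_budget` is a hypothetical helper, not part of the model):

```python
import re
from decimal import Decimal

def normalize_budget(amount: str):
    # Hypothetical helper: pull the first number out of an extracted amount
    # string and normalize it to yuan (元).
    match = re.search(r"([\d,]+(?:\.\d+)?)", amount)
    if match is None:
        return None  # no numeric value found
    value = Decimal(match.group(1).replace(",", ""))
    # Convert 万元 (ten-thousands of yuan) to 元 when the unit is present.
    return value * 10_000 if "万" in amount else value

print(normalize_budget("94.15万元"))    # 941500.00
print(normalize_budget("941500.00元"))  # 941500.00
```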
---
## **Model Architecture**
- **Base Model**: Qwen2-1.5B-Instruct
- **Fine-Tuning Technique**: LoRA
- **Training Data**: Fine-tuned on structured and unstructured government bidding announcements
- **Framework**: Hugging Face Transformers & PEFT (Parameter-Efficient Fine-Tuning)
---
## **Technical Specifications**
- **Device Compatibility**: CUDA-enabled GPUs (recommended); CPU inference is possible but substantially slower.
- **Tokenization**: Uses the Hugging Face `AutoTokenizer` with the Qwen2 chat template for instruction-style prompts.
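Since the usage example below assumes a GPU, here is a minimal device-selection sketch that falls back to CPU when CUDA is unavailable:

```python
import torch

# Prefer a CUDA GPU; fall back to CPU (slower, but workable for testing).
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
```
---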
## **Requirements**
```shell
pip install --upgrade 'transformers>=4.44.2' 'torch>=2.0' accelerate
```
---
## **Usage Example**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load the fine-tuned model in half precision and place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "Tongda/Tongda1-1.5B-BKI",
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("Tongda/Tongda1-1.5B-BKI")
model.eval()

# System prompt (in Chinese): it lists the nine fields to extract, restricts
# 公告类型 and 行业分类 to fixed label sets, requires JSON output, and asks for
# 预算金额 normalized to yuan (converting units such as 万元 to 元).
instruction = "分析给定的公告,提取其中的“项目名称”、“公告类型”、“行业分类”、“发布时间”、“预算金额”、“采购人”、“响应文件截至提交时间”、”开标地址“、“所在地区”,并将其以json格式进行输出。如果公告出现“最高投标限价”相关的值,则“预算金额”为该值。请再三确认提取的值为项目的“预算金额”,而不是其他和“预算金额”无关的数值,否则“预算金额”中填入'None'。如果确认提取到了“预算金额”,请重点确认提取到的金额的单位,所有的“预算金额”单位为“元”。当涉及到进制转换的计算(比如“万元”转换为“元”单位)时,必须进行进制转换。其中“公告类型”只能从以下12类中挑选:采购公告-招标、采购公告-邀标、采购公告-询价、采购公告-竞谈、采购公告-竞磋、采购公告-竞价、采购公告-单一来源、采购公告-变更、采购结果-中标、采购结果-终止、采购结果-废标、采购结果-合同。其中,“行业分类”只能从以下12类中挑选:建筑与基础设施、信息技术与通信、能源与环保、交通与物流、金融与保险、医疗与健康、教育与文化、农业与林业、制造与工业、政府与公共事业、旅游与娱乐、其他。"
# The content of any bid announcement (the sample below is truncated)
input_report = "#### 通答产业园区(2024年-2027年)智能一体化项目公开招标公告..."
messages = [
{"role": "system", "content": instruction},
{"role": "user", "content": input_report}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate, then strip the prompt tokens so only the new tokens are decoded.
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
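Since the system prompt requests JSON output, the response can usually be parsed directly. Continuing from the snippet above, a hedged sketch (`parse_response` is a hypothetical helper; the fence-stripping guards against the model wrapping its JSON in Markdown code fences, which we have not verified for this model):

```python
import json

def parse_response(response: str):
    # Strip optional Markdown code fences before parsing (hypothetical helper).
    cleaned = response.strip()
    cleaned = cleaned.removeprefix("```json").removeprefix("```").removesuffix("```").strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # fall back to inspecting the raw response

result = parse_response(response)
if result is not None:
    print(result.get("预算金额"))
```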
---
## **Evaluation & Performance**
Tongda1-1.5B-BKI performs strongly on key-information extraction from tender announcements. Compared to the baseline Qwen2-1.5B-Instruct, it improves substantially on every evaluation metric. It also outperforms the larger Qwen2.5-3B-Instruct and Qwen2-7B-Instruct models, as well as the hosted glm-4-flash model. The evaluation results for each model:
| Model | ROUGE-1 | ROUGE-2 | ROUGE-Lsum | BLEU |
|-----------------------|---------|---------|------------|-------|
| Tongda1-1.5B-BKI | 0.853 | 0.787 | 0.853 | 0.852 |
| Qwen2-1.5B-Instruct | 0.412 | 0.231 | 0.411 | 0.431 |
| Qwen2.5-3B-Instruct | 0.686 | 0.578 | 0.687 | 0.755 |
| Qwen2-7B-Instruct | 0.703 | 0.578 | 0.703 | 0.789 |
| glm-4-flash | 0.774 | 0.655 | 0.775 | 0.816 |
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65ebcbd0c8577b39464e6dc0/Qiyi7onDe99b2USArl0oG.png)
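The exact evaluation script is not published here; as a reference point, ROUGE and BLEU are commonly computed with the Hugging Face `evaluate` library. In this sketch, `jieba` word segmentation is our assumption for tokenizing Chinese text, and the prediction/reference strings are toy placeholders:

```python
import evaluate  # pip install evaluate rouge_score jieba
import jieba     # assumed Chinese word segmenter; any segmenter works

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

def tokenize(text: str) -> str:
    # ROUGE/BLEU expect space-separated tokens, so segment the Chinese text first.
    return " ".join(jieba.cut(text))

predictions = [tokenize('预算金额:"941500.00元"')]  # model outputs (toy placeholder)
references = [tokenize('预算金额:"941500.00元"')]   # gold annotations (toy placeholder)

print(rouge.compute(predictions=predictions, references=references, tokenizer=str.split))
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```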
---
## **Limitations**
- **Language Limitation**: The model is primarily trained on Chinese bidding announcements. Performance on other languages or non-bidding content may be limited.
- **Format Sensitivity**: Accuracy may drop when announcements deviate significantly from common structures.
---
## **Citation**
If you use this model, please consider citing it as follows:
```
@misc{Tongda1-1.5B-BKI,
  title={Tongda1-1.5B-BKI: LoRA Fine-tuned Model for Bidding Announcements},
  author={Ted-Z},
  year={2024},
  howpublished={\url{https://huggingface.co/Tongda/Tongda1-1.5B-BKI}}
}
```
---
## **Contact**
For further inquiries or fine-tuning services, please contact us at [Tongda](https://www.tongdaai.com/). |