pravdin committed on
Commit
f7af970
1 Parent(s): 17ccda3

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +30 -46
README.md CHANGED
@@ -1,43 +1,21 @@
- # Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge
-
- ## Models Used
-
- - nvidia/Llama3-ChatQA-1.5-8B
- - shenzhi-wang/Llama3-8B-Chinese-Chat
-
- ## Configuration
-
- ```yaml
- models:
-   - model: nvidia/Llama3-ChatQA-1.5-8B
-     parameters:
-       weight: 0.5
-   - model: shenzhi-wang/Llama3-8B-Chinese-Chat
-     parameters:
-       weight: 0.5
- merge_method: linear
- parameters:
-   normalize: true
- dtype: float16
- ```
  ---
- license: llama3
  tags:
  - merge
  - mergekit
  - lazymergekit
- - Llama3-ChatQA-1.5-8B
- - Llama3-8B-Chinese-Chat
  ---
 
  # Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge
 
- Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge is an innovative language model resulting from the strategic combination of two powerful models: [Llama3-ChatQA-1.5-8B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B) and [Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat). The merging process utilized [mergekit](https://github.com/mergekit), a specialized tool designed for effective model blending, ensuring optimal performance and synergy between the two architectures.
 
  ## 🧩 Merge Configuration
 
- The models were merged using a linear interpolation method, which allows for a balanced integration of both models' capabilities. The configuration for this merge is as follows:
-
  ```yaml
  models:
    - model: nvidia/Llama3-ChatQA-1.5-8B
@@ -52,31 +30,37 @@ parameters:
  dtype: float16
  ```
 
 
- ## Model Features
-
- This merged model combines the conversational question-answering prowess of Llama3-ChatQA-1.5 with the bilingual capabilities of Llama3-8B-Chinese-Chat. As a result, it excels in various text generation tasks, including but not limited to:
-
- - Conversational question answering in both English and Chinese.
- - Enhanced context understanding and nuanced text generation.
- - Improved performance in retrieval-augmented generation (RAG) tasks.
-
- By leveraging the strengths of both parent models, this fusion model is particularly adept at handling complex queries and generating contextually relevant responses across languages.
-
- ## Evaluation Results
-
- The evaluation results of the parent models indicate their strong performance in various benchmarks. For instance, Llama3-ChatQA-1.5-8B has shown impressive results in the ChatRAG Bench, outperforming many existing models in conversational QA tasks. Meanwhile, Llama3-8B-Chinese-Chat has demonstrated superior performance in Chinese language tasks, surpassing ChatGPT and matching GPT-4 in various evaluations.
-
- | Model | Average Score (ChatRAG Bench) |
- |-------|-------------------------------|
- | Llama3-ChatQA-1.5-8B | 55.17 |
- | Llama3-8B-Chinese-Chat | Not specified, but noted for high performance |
-
- ## Limitations
-
- While the Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge model offers enhanced capabilities, it may also inherit some limitations from its parent models. These include:
-
- - Potential biases present in the training data of both models, which could affect the generated outputs.
- - The model's performance may vary depending on the complexity of the queries, especially in less common languages or dialects.
- - As with any AI model, it may struggle with ambiguous queries or context that is not well-defined.
-
- In summary, the Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge model represents a significant advancement in multilingual conversational AI, combining the best features of its predecessors while also carrying forward some of their limitations.
  ---
+ license: apache-2.0
  tags:
  - merge
  - mergekit
  - lazymergekit
+ - nvidia/Llama3-ChatQA-1.5-8B
+ - shenzhi-wang/Llama3-8B-Chinese-Chat
  ---
 
  # Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge
 
+ Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
+ * [nvidia/Llama3-ChatQA-1.5-8B](https://huggingface.co/nvidia/Llama3-ChatQA-1.5-8B)
+ * [shenzhi-wang/Llama3-8B-Chinese-Chat](https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat)
 
  ## 🧩 Merge Configuration
 
  ```yaml
  models:
    - model: nvidia/Llama3-ChatQA-1.5-8B
      parameters:
        weight: 0.5
    - model: shenzhi-wang/Llama3-8B-Chinese-Chat
      parameters:
        weight: 0.5
  merge_method: linear
  parameters:
    normalize: true
  dtype: float16
  ```
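
For intuition, the `linear` method with `normalize: true` reduces each output parameter to a weighted average of the corresponding parameters from the input models, with the weights rescaled to sum to 1. Below is a minimal pure-Python sketch of that arithmetic; toy lists of floats stand in for the real 8B-parameter tensors, and `linear_merge` is an illustrative helper, not part of mergekit's API.

```python
def linear_merge(state_dicts, weights, normalize=True):
    """Weighted average of per-parameter float lists (toy linear merge)."""
    if normalize:
        # `normalize: true` rescales weights so they sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(weights, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# Two toy "checkpoints", each weighted 0.5 as in the config above.
chatqa = {"layer.weight": [1.0, 2.0]}
chinese_chat = {"layer.weight": [3.0, 6.0]}
print(linear_merge([chatqa, chinese_chat], [0.5, 0.5]))
# → {'layer.weight': [2.0, 4.0]}
```

With both weights at 0.5, as in this config, the merge is simply the midpoint of the two checkpoints; normalization only matters when the listed weights do not already sum to 1.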
 
+ ## Model Details
 
+ The Llama3-ChatQA-1.5 model excels in conversational question answering (QA) and retrieval-augmented generation (RAG). It is built on an improved training recipe from the [ChatQA paper](https://arxiv.org/pdf/2401.10225) and incorporates extensive conversational QA data to enhance its capabilities in tabular and arithmetic calculations. The model is designed to provide detailed and contextually relevant responses, making it suitable for a variety of applications.
 
+ On the other hand, Llama3-8B-Chinese-Chat is specifically fine-tuned for Chinese and English users, showcasing remarkable performance in roleplaying, function calling, and math capabilities. It has been trained on a mixed dataset of approximately 100K preference pairs, significantly improving its ability to handle bilingual interactions.
 
+ ## Use Cases
 
+ - **Conversational AI**: Engage users in natural dialogues, providing informative and context-aware responses.
+ - **Question Answering**: Answer user queries accurately, leveraging the strengths of both English and Chinese language processing.
+ - **Multilingual Support**: Cater to users who communicate in both English and Chinese, enhancing accessibility and user experience.
+ - **Educational Tools**: Assist in learning and understanding complex topics through interactive Q&A sessions.
 
+ ## Model Features
 
+ This merged model combines the robust generative capabilities of Llama3-ChatQA-1.5 with the refined tuning of Llama3-8B-Chinese-Chat. It offers:
+ - Enhanced context understanding for both English and Chinese queries.
+ - Improved performance in conversational QA tasks.
+ - Versatile text generation capabilities across different languages.
 
+ ## Evaluation Results
+
+ The evaluation results of the parent models indicate strong performance in various benchmarks. For instance, Llama3-ChatQA-1.5 achieved notable scores in the ChatRAG Bench, demonstrating its effectiveness in conversational QA tasks. Meanwhile, Llama3-8B-Chinese-Chat has shown superior performance in Chinese language tasks, surpassing ChatGPT and matching GPT-4 in certain evaluations.
 
+ | Benchmark | Llama3-ChatQA-1.5-8B | Llama3-8B-Chinese-Chat |
+ |-----------|----------------------|------------------------|
+ | Doc2Dial  | 41.26                | N/A                    |
+ | QuAC      | 38.82                | N/A                    |
+ | CoQA      | 78.44                | N/A                    |
+ | Average   | 58.25                | N/A                    |
 
+ ## Limitations
 
+ While the merged model benefits from the strengths of both parent models, it may also inherit some limitations. For instance, biases present in the training data of either model could affect the responses generated. Additionally, the model may struggle with highly specialized or niche topics that were not well-represented in the training datasets. Users should be aware of these potential biases and limitations when deploying the model in real-world applications.