zhs12 commited on
Commit
a0cf305
1 Parent(s): 4f8f0a1

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +86 -0
README.md ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - LargeWorldModel/ultrachat_qa_mix_512K
5
+ language:
6
+ - en
7
+ - zh
8
+ ---
9
+ # Model Card for llama3-8B-360Zhinao-360k-Instruct
10
+
11
+ llama3-8B-360Zhinao-360k-Instruct is 360Zhinao's extension of llama3-8B-Instruct to a 360k context window.
12
+
13
+ Within the 360k-token length,
14
+ llama3-8B-360Zhinao-360k-Instruct achieves:
15
+
16
+ - **100%** perfect recall on the "value retrieval" variant of NIAH (Needle-In-A-Haystack), which requires the model to retrieve the number in the inserted needle "The special magic {random city} number is {random 7-digit number}".
17
+
18
+ - **99.75%** near-perfect recall on the [original NIAH](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) and its corresponding Chinese counterpart, where the needle (e.g. The best thing to do in San Francisco is...) and haystack (e.g. Paul Graham's essays which inevitably talk about San Francisco) are more relevant, hence a more difficult task.
19
+ Other models with 100% recall on value retrieval could struggle with this NIAH version.
20
+
21
+ ## 360k-NIAH (Needle-In-A-Haystack) results
22
+
23
+ ### "value retrieval" variant of NIAH
24
+ <img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/llama3-8B-360Zhinao-360k-Instruct.value_score.png?raw=true" width="600" />
25
+
26
+ ### Original NIAH
27
+ <img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/llama3-8B-360Zhinao-360k-Instruct.en_score.png?raw=true" width="600" />
28
+
29
+ ### Chinese NIAH
30
+ <img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/llama3-8B-360Zhinao-360k-Instruct.zh_score.png?raw=true" width="600" />
31
+
32
+ ### Remarks
33
+
34
+ We found that [the "value retrieval" variant of NIAH](https://github.com/Arize-ai/LLMTest_NeedleInAHaystack) (widely used recently in e.g. Gemini, LWM and gradient.ai) is relatively easy.
35
+ 100% all-green results on value retrieval doesn't necessarily mean near-perfect results on more difficult NIAH tasks, as demonstrated by this [original-version NIAH](https://github.com/gkamradt/LLMTest_NeedleInAHaystack) result of one open-sourced llama3-8B-262k model:
36
+ <img src="https://github.com/Qihoo360/360zhinao/blob/main/assets/open-262k.en_score.png?raw=true" width="600" />
37
+
38
+ This model does achieve 100% all-green results on value retrieval but less than satisfactory results on the original version.
39
+
40
+
41
+ ## Usage
42
+
43
+ llama3-8B-360Zhinao-360k-Instruct could be launched with [vllm](https://github.com/vllm-project/vllm).
44
+ To perform inference on 360k-token inputs, we used a 8 x 80G machine (A800).
45
+
46
+ ```shell
47
+ model_path=${1}
48
+
49
+ export ENV_PORT=7083
50
+ export ENV_TP=8
51
+ export ENV_MODEL_PATH=$model_path
52
+ echo ${ENV_MODEL_PATH}
53
+ export ENV_MAX_MODEL_LEN=365000
54
+ export ENV_MAX_BATCH_TOKENS=365000
55
+ export ENV_GPU_MEMORY_UTIL=0.6
56
+
57
+ export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256
58
+ python -m vllm.entrypoints.openai.api_server \
59
+ --model "${ENV_MODEL_PATH:-/workspace/model}" \
60
+ --tensor-parallel-size "${ENV_TP:-2}" \
61
+ --trust-remote-code \
62
+ --port "${ENV_PORT:-8002}" \
63
+ --gpu-memory-utilization "${ENV_GPU_MEMORY_UTIL:-0.92}" \
64
+ --max-num-batched-tokens "${ENV_MAX_BATCH_TOKENS:-18000}" \
65
+ --max-model-len "${ENV_MAX_MODEL_LEN:-4096}" \
66
+ --max-num-seqs "${ENV_MAX_NUM_SEQS:-32}" \
67
+ --enforce-eager \
68
+ > log8.server 2>&1
69
+ ```
70
+
71
+ <!-- NIAH scripts -->
72
+
73
+
74
+ ## Methods
75
+
76
+ llama3-8B-360Zhinao-360k-Instruct is trained from [llama3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
77
+ Its original context-length is 8k with RoPE base 500,000.
78
+
79
+ We directly extended its context length to 360k. We changed RoPE base to 500,000,000 and trained on a combined SFT dataset of [LWM's open-sourced data](https://huggingface.co/LargeWorldModel) and internal long-context data in Chinese and English.
80
+ We implemented SFT on top of [EasyContext](https://github.com/jzhang38/EasyContext/) but later found that turning on pretraining loss produced much better results.
81
+
82
+ ## Contact & License
83
84
+
85
+ The source code of this repository follows the open-source license Apache 2.0.
86
+ This project is built on other open-source projects, including llama3, LWM and EasyContext, whose original licenses should also be followed by users.