Crystalcareai committed
Commit 0f236b2
1 Parent(s): 3147e3e

Create README.md

Files changed (1): README.md ADDED (+41 -0)

This is a Mixture of Experts (MoE) built from Llama-3-8B with 4 experts. It does not use semantic routing, since it utilizes the DeepSeek-MoE architecture. There is no router and no gate: all experts are active on every token.
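
To make "no gate" concrete, here is a minimal sketch of what a gateless MoE MLP layer could look like. This is an illustration, not the repo's actual modeling code: the class name, the merge by summation, and the layer sizes (Llama-3-8B's 4096/14336) are assumptions.

```python
import torch
import torch.nn as nn

class GatelessMoEMLP(nn.Module):
    """Every expert runs on every token; there is no router or top-k selection."""

    def __init__(self, hidden_size=4096, intermediate_size=14336, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size, bias=False),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states):
        # All experts process every token; merge by summing their outputs.
        # (Whether the real modeling code sums or averages is an assumption here.)
        return sum(expert(hidden_states) for expert in self.experts)
```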

Example usage:

```python
import torch
from transformers import AutoTokenizer, TextStreamer, AutoModelForCausalLM

model_path = "./content"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the custom DeepSeek-MoE-style modeling code ships with the repo
    attn_implementation="flash_attention_2",
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Format the prompt to match the Alpaca instruction template
prompt = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Sam is faster than Joe. Joe is faster than Jane. Is Sam faster than Jane? Explain your reasoning step by step.

### Input:

### Response:
"""

tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

generation_output = model.generate(
    tokens,
    streamer=streamer,
    max_new_tokens=512,
)
```
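
Note: `attn_implementation="flash_attention_2"` requires the flash-attn package and a supported GPU; if it is unavailable, drop that argument to fall back to the default attention implementation.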