twhoool02/Llama-2-7b-hf-AWQ


twhoool02 committed on Mar 3

Commit 5057a02 • 1 Parent(s): 35b5b0e

Upload README.md with huggingface_hub

Files changed (1)

README.md +1 -26


@@ -18,32 +18,7 @@ pipeline_tag: text-generation
 qunatized_by: twhoool02
 ---
-# Model Card for LlamaAWQForCausalLM(
-  (model): LlamaForCausalLM(
-    (model): LlamaLikeModel(
-      (embedding): Embedding(32000, 4096)
-      (blocks): ModuleList(
-        (0-31): 32 x LlamaLikeBlock(
-          (norm_1): FasterTransformerRMSNorm()
-          (attn): QuantAttentionFused(
-            (qkv_proj): WQLinear_GEMM(in_features=4096, out_features=12288, bias=False, w_bit=4, group_size=128)
-            (o_proj): WQLinear_GEMM(in_features=4096, out_features=4096, bias=False, w_bit=4, group_size=128)
-            (rope): RoPE()
-          )
-          (norm_2): FasterTransformerRMSNorm()
-          (mlp): LlamaMLP(
-            (gate_proj): WQLinear_GEMM(in_features=4096, out_features=11008, bias=False, w_bit=4, group_size=128)
-            (up_proj): WQLinear_GEMM(in_features=4096, out_features=11008, bias=False, w_bit=4, group_size=128)
-            (down_proj): WQLinear_GEMM(in_features=11008, out_features=4096, bias=False, w_bit=4, group_size=128)
-            (act_fn): SiLU()
-          )
-        )
-      )
-      (norm): LlamaRMSNorm()
-    )
-    (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
-  )
-)
+# Model Card for Llama-2-7b-hf-AWQ
 <!-- Provide a quick summary of what the model is/does. -->
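
The removed heading looks like the printed layer tree of a model object rather than a model name. The commit does not show how the README was generated, so the following is only a minimal sketch of one way such a heading could arise and how the new heading could be produced instead. `FakeQuantizedModel`, `card_title_from_object`, and `card_title_from_repo_id` are hypothetical names for illustration, not part of any library or of the author's script.

```python
class FakeQuantizedModel:
    """Hypothetical stand-in for a loaded model object (not a real library class)."""

    def __repr__(self) -> str:
        # Module-style objects typically print a multi-line layer tree.
        return "LlamaAWQForCausalLM(\n  (model): LlamaForCausalLM(...)\n)"


def card_title_from_object(model: object) -> str:
    # Interpolating the object falls back to its repr, so the layer tree
    # leaks into the heading.
    return f"# Model Card for {model}"


def card_title_from_repo_id(repo_id: str) -> str:
    # Use only the repository name, i.e. the part after the namespace.
    return f"# Model Card for {repo_id.split('/')[-1]}"


buggy = card_title_from_object(FakeQuantizedModel())
fixed = card_title_from_repo_id("twhoool02/Llama-2-7b-hf-AWQ")

print(buggy.splitlines()[0])  # first line of the multi-line heading
print(fixed)
```

Building the heading from the repository id keeps it on a single line regardless of what the model object prints.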