baichuan-inc/Baichuan2-13B-Chat-4bits

Text Generation, text-generation-inference, Inference Endpoints, 4-bit precision


s-JoL committed on Sep 6, 2023

Commit c16beac (1 parent: c359abf)

Update modeling_baichuan.py

Files changed (1)

modeling_baichuan.py +1 -1

modeling_baichuan.py CHANGED

@@ -177,7 +177,7 @@ class BaichuanAttention(torch.nn.Module):
             key_states = key_states.transpose(1, 2)
             value_states = value_states.transpose(1, 2)
             attn_output = xops.memory_efficient_attention(
-                query_states, key_states, value_states, attn_bias=xops.LowerTriangularMask()
+                query_states, key_states, value_states, attn_bias=attention_mask.unsqueeze(0).expand(bsz, -1, -1, -1)
             )
         else:
             attn_weights = torch.matmul(
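This change swaps the `attn_bias` passed to `xops.memory_efficient_attention`. `xops.LowerTriangularMask()` encodes only causality, so the xformers path ignored any additive bias carried in `attention_mask`. The new line passes `attention_mask` itself, broadcast across the batch with `unsqueeze(0).expand(bsz, -1, -1, -1)`. (xformers expects inputs laid out as `[batch, seq_len, heads, head_dim]`, which is why the hunk transposes `key_states` and `value_states` first.) Baichuan2-13B uses ALiBi position biases. The sketch below assumes that `attention_mask` at this point is an additive float tensor of shape `(num_heads, q_len, kv_len)` holding the causal mask plus the ALiBi term. It is a pure-Python reference of what the kernel computes for each (batch, head) slice, `softmax(q k^T / sqrt(d) + bias) v`, and uses no torch or xformers, so the old and new biases can be compared on toy inputs. All helper names (`causal_bias`, `alibi_slopes`, `alibi_causal_bias`, `expand_batch`, `attention_head`) are hypothetical. The slope formula follows the original ALiBi scheme and is simplified to power-of-two head counts, so it may not reproduce Baichuan's exact slopes.

```python
import math
from typing import List

# Hypothetical helpers for illustration; not part of modeling_baichuan.py or xformers.
Matrix = List[List[float]]
NEG_INF = float("-inf")


def causal_bias(seq_len: int) -> Matrix:
    """Additive causal mask, the effect of xops.LowerTriangularMask() when q_len == kv_len:
    0.0 where key j <= query i, -inf where j > i."""
    return [[0.0 if j <= i else NEG_INF for j in range(seq_len)] for i in range(seq_len)]


def alibi_slopes(n_heads: int) -> List[float]:
    """ALiBi slopes 2^(-8(h+1)/n) for h = 0..n-1 (power-of-two head counts only)."""
    if n_heads <= 0 or n_heads & (n_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)]


def alibi_causal_bias(n_heads: int, seq_len: int) -> List[Matrix]:
    """Assumed mask layout (n_heads, seq_len, seq_len): causal mask plus -m_h * (i - j)."""
    causal = causal_bias(seq_len)
    return [
        [[causal[i][j] - m * (i - j) for j in range(seq_len)] for i in range(seq_len)]
        for m in alibi_slopes(n_heads)
    ]


def expand_batch(bias: List[Matrix], bsz: int) -> List[List[Matrix]]:
    """Analogue of bias.unsqueeze(0).expand(bsz, -1, -1, -1): every batch entry
    references the same per-head bias object instead of a copy, as a view would."""
    return [bias] * bsz


def attention_head(q: Matrix, k: Matrix, v: Matrix, bias: Matrix) -> Matrix:
    """softmax(q k^T / sqrt(d) + bias) v for a single (batch, head) slice."""
    scale = 1.0 / math.sqrt(len(q[0]))  # same as xformers' default scale
    out = []
    for i, q_row in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) * scale + bias[i][j]
            for j, k_row in enumerate(k)
        ]
        top = max(scores)  # finite here: the diagonal is never masked
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        out.append(
            [sum(w * row[c] for w, row in zip(weights, v)) / total for c in range(len(v[0]))]
        )
    return out


if __name__ == "__main__":
    seq_len, n_heads, bsz = 3, 2, 2
    q = k = [[0.0, 0.0] for _ in range(seq_len)]  # equal scores isolate the bias
    v = [[0.0], [1.0], [2.0]]
    old = attention_head(q, k, v, causal_bias(seq_len))
    batched = expand_batch(alibi_causal_bias(n_heads, seq_len), bsz)
    new = attention_head(q, k, v, batched[0][0])  # batch 0, head 0
    print("causal only:", old[1][0], "causal + ALiBi:", new[1][0])
```

With all-zero queries and keys every raw score is equal. The causal-only bias therefore averages `v[0]` and `v[1]` for query 1, giving 0.5. The ALiBi bias tilts the weights toward the nearer key and yields a value above 0.5. Because `expand` returns a view, `expand_batch` likewise reuses one bias object for every batch entry instead of copying it.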