I am trying to use assisted_decoding with Qwen-7B as the main model and Qwen-1_8B as the assistant model. The code is as follows:
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
    StoppingCriteriaList,
    MaxLengthCriteria,
)

# Qwen checkpoints ship custom modeling code, so trust_remote_code=True is needed
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
assistant_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)

# set pad_token_id to eos_token_id because GPT2 does not have a PAD token
model.generation_config.pad_token_id = model.generation_config.eos_token_id
input_prompt = "It might be possible to"
input_ids = tokenizer(input_prompt, return_tensors="pt").input_ids

# instantiate logits processors
logits_processor = LogitsProcessorList(
    [
        MinLengthLogitsProcessor(10, eos_token_id=model.generation_config.eos_token_id),
    ]
)
# stop once the sequence reaches 20 tokens
stopping_criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=20)])
outputs = model.assisted_decoding(
    input_ids,
    assistant_model=assistant_model,
    logits_processor=logits_processor,
    stopping_criteria=stopping_criteria,
)
result = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(result)

requirements:
transformers==4.32.0
accelerate
tiktoken
einops
transformers_stream_generator==0.0.4
scipy
torch==1.12

Error:
/Qwen-1_8B/modeling_qwen.py, line 778, in forward
input_ids = input_ids.view(-1, input_shape[-1])
RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous
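
For context, the message indicates that the tensor reaching `view` was already empty: when the last dimension is 0, the `-1` dimension cannot be inferred. A minimal plain-Python sketch of that inference rule follows. The helper `infer_missing_dim` is hypothetical and only mimics the behavior; it is not PyTorch code.

```python
from math import prod

def infer_missing_dim(numel, shape):
    """Mimic how a single -1 entry in a target shape is resolved."""
    # Product of all explicitly given dimensions
    known = prod(d for d in shape if d != -1)
    if known == 0:
        if numel == 0:
            # 0 * anything == 0, so every value for -1 would fit
            raise RuntimeError("unspecified dimension -1 is ambiguous")
        raise RuntimeError("shape is invalid for input size")
    if numel % known != 0:
        raise RuntimeError("shape is invalid for input size")
    return numel // known

# 6 token ids viewed as (-1, 6): the missing dimension is 1
print(infer_missing_dim(6, (-1, 6)))

# 0 token ids viewed as (-1, 0): nothing to infer from
try:
    infer_missing_dim(0, (-1, 0))
except RuntimeError as err:
    print("error:", err)
```

In other words, the assistant model appears to have been called with an `input_ids` tensor whose sequence length is 0.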

Is there a problem with how the key and value tensors are transposed or permuted in modeling_qwen.py?
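
One hypothesis consistent with this question, stated as an assumption rather than a confirmed diagnosis: if the generation code reads the cached sequence length from `shape[-2]` of the key tensor (expecting a `(batch, heads, seq, head_dim)` layout) while the model stores its cache as `(batch, seq, heads, head_dim)`, the number of new tokens would be computed from the head count, and the slice of candidate tokens could come out empty. The sketch below uses plain tuples and lists. The function name, the layouts, and the sizes (6 cached tokens, 16 heads, head size 128) are illustrative assumptions, not values taken from modeling_qwen.py or transformers.

```python
def assistant_inputs(candidate_ids, cache_shape):
    """Return the tokens not yet covered by the cache,
    reading the cached sequence length from shape[-2]."""
    prev_seq_len = cache_shape[-2]
    new_token_len = len(candidate_ids) - prev_seq_len
    return candidate_ids[-new_token_len:]

# 6 tokens already cached plus 1 newly proposed token
candidate_ids = [11, 12, 13, 14, 15, 16, 17]

# Layout the slicing logic expects: (batch, heads, seq, head_dim)
expected = assistant_inputs(candidate_ids, (1, 16, 6, 128))
print(expected)   # only the new token is selected

# Layout assumed for the model cache: (batch, seq, heads, head_dim)
mismatched = assistant_inputs(candidate_ids, (1, 6, 16, 128))
print(mismatched)  # head count (16) is misread as sequence length
```

With the mismatched layout the computed number of new tokens is negative (7 - 16 = -9), so the slice starts past the end of the sequence and is empty, which would match a 0-element `input_ids`.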

Assisted decoding fails with Qwen-1_8B and Qwen-7B
