Converted nemo-megatron-mt5-3B to a FasterTransformer binary checkpoint successfully, but tritonserver fails to load it with a tensor shape mismatch
Hi, could you please give some advice on this issue?
I converted nemo-megatron-mt5-3B to a binary checkpoint successfully with:
```
python3 FasterTransformer/examples/pytorch/t5/utils/nemo_t5_ckpt_convert.py -i nemo-megatron-mt5-3B/nemo_megatron_mt5_3b_bf16_tp2.nemo -o ./models/nemo-megatron-mt5-3B/ -m mt5-3B -i_g 2
```
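To check what the converter actually wrote, a small helper like the one below lists every `.bin` file in the output directory together with its size in bytes. This is a hypothetical sketch (`list_bin_sizes` is not part of FasterTransformer); the `2-gpu` subdirectory is the path that appears in the Triton log in this issue.

```python
from pathlib import Path


def list_bin_sizes(model_dir: str) -> dict[str, int]:
    """Return {file name: size in bytes} for every .bin file directly in model_dir."""
    sizes = {}
    for path in sorted(Path(model_dir).glob("*.bin")):
        sizes[path.name] = path.stat().st_size
    return sizes


if __name__ == "__main__":
    # Directory the FasterTransformer backend reads from (taken from the log in this issue).
    for name, size in list_bin_sizes("./models/nemo-megatron-mt5-3B/2-gpu").items():
        print(f"{size:>12}  {name}")
```

Comparing these sizes against what the loader requests shows whether the mismatch is in the file contents or in the serving configuration.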
When I start tritonserver with
```
CUDA_VISIBLE_DEVICES="0,1" /opt/tritonserver/bin/tritonserver --model-store=fastertransformer_backend/all_models/nemo-megatron-mt5-3B/
```
it fails to load the model because the weight file sizes do not match what the loader requests:
```
I0414 15:43:13.619001 934 libfastertransformer.cc:438] Before Loading Weights:
after allocation : free: 14.14 GB, total: 44.56 GB, used: 30.43 GB
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//decoder.final_layer_norm.bias.bin only has 4096, but request 8192, loading model fails!
[FT][WARNING] file ./models/nemo-megatron-mt5-3B/2-gpu//shared.bias.bin only has 500224, but request 1000448, loading model fails!
I0414 15:43:21.362566 934 libfastertransformer.cc:448] After Loading Weights:
```
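In every warning the requested size is exactly twice the size of the file on disk. Assuming both numbers are byte counts (an assumption; the log does not state the unit), that pattern is consistent with weights stored as 2-byte values (fp16/bf16) being read by a loader configured for 4-byte fp32 values. The sketch below only checks that arithmetic; `WARNINGS` and `implied_elements` are hypothetical names, and the numbers are copied from the log.

```python
# (file name, size on disk, size requested) copied from the [FT][WARNING] lines above.
# Assumption: both sizes are in bytes.
WARNINGS = [
    ("decoder.final_layer_norm.bias.bin", 4096, 8192),
    ("shared.bias.bin", 500224, 1000448),
]


def implied_elements(num_bytes: int, bytes_per_element: int) -> int:
    """Number of elements a buffer of num_bytes holds at the given element width."""
    if num_bytes % bytes_per_element:
        raise ValueError(f"{num_bytes} is not a multiple of {bytes_per_element}")
    return num_bytes // bytes_per_element


if __name__ == "__main__":
    for name, on_disk, requested in WARNINGS:
        ratio = requested / on_disk
        # If the file holds 2-byte values and the loader asks for 4-byte values,
        # both sizes describe the same number of elements.
        n_file = implied_elements(on_disk, 2)
        n_request = implied_elements(requested, 4)
        print(f"{name}: ratio={ratio:g}, {n_file} elements at 2 B vs {n_request} at 4 B")
```

Both files give matching element counts under that reading (2048 and 250112), so the shapes themselves agree and only the element width differs. This is a hint about where to look, not a confirmed diagnosis.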
Solved: the fix is described in https://github.com/NVIDIA/FasterTransformer/issues/561.