How do I try this out?
I tried to deploy it using Gradio, but it loads forever and doesn't seem to work, and neither do the other Gradio endpoints that other people have made.
I would prefer to host it on a (Hugging Face) Inference API endpoint, which I managed to get working for other models, but I get an error when trying to run this one.
I think this is the most relevant part of the error:
tokenizer = LlamaTokenizerFast.from_pretrained(\n\n File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained\n return cls._from_pretrained(\n\n File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1886, in _from_pretrained\n slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(\n\n File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2073, in _from_pretrained\n raise ValueError(\n\nValueError: Non-consecutive added token '' found. Should have index 32000 but has index 0 in saved vocabulary.\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
2023/10/15 12:24:56 ~ Error: ShardCannotStart
It said "Non-consecutive added token '<unk>' found", but it seems like HTML escaping removed the token from the message above.
@henke443 This should give you some guidance: https://github.com/huggingface/text-generation-inference/issues/1132
Remove these lines from added_tokens.json:
"</s>": 2,
"<s>": 1,
"<unk>": 0,
The link above says to delete the file, but the file is important for the ChatML format, so remove only those lines instead of deleting it.
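If you would rather script the edit than do it by hand, here is a minimal sketch, assuming you have a local copy of the model repo and that added_tokens.json is a flat JSON object mapping tokens to indices. The helper name `strip_base_tokens` is made up for illustration; it only drops the three entries listed above and leaves every other entry (such as the ChatML tokens) untouched.

```python
import json
from pathlib import Path

# The three tokens named above; the error suggests they already
# exist in the base vocabulary at indices 0, 1 and 2.
BASE_TOKENS = ("<unk>", "<s>", "</s>")


def strip_base_tokens(path):
    """Hypothetical helper: drop the base-vocabulary entries from
    added_tokens.json and keep every other added token as-is."""
    path = Path(path)
    added = json.loads(path.read_text(encoding="utf-8"))
    kept = {tok: idx for tok, idx in added.items() if tok not in BASE_TOKENS}
    # Overwrites the file in place, so keep a backup if you are unsure.
    path.write_text(
        json.dumps(kept, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return kept
```

You would call it as `strip_base_tokens("added_tokens.json")` from inside the model folder, then push the changed file before redeploying. I have not verified that this alone fixes every deployment, only that it matches the manual edit described above.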