Unusable
The model constantly spits out "<0x0A>" instead of newlines (\n), making pretty much every generation unreadable. I'm really impressed that it managed to score so high on the Open LLM Leaderboard despite this major flaw! Makes me wonder how.
The following screenshot is an example of the problem I mentioned.
17:10:29-666622 INFO Starting Text generation web UI
17:10:29-687251 INFO Loading "zhengr_MixTAO-7Bx2-MoE-Instruct-v7.0"
17:10:29-777983 WARNING Auto-assiging --gpu-memory 14 for your GPU to try to
prevent out-of-memory errors. You can manually set
other values.
17:10:29-781085 INFO Using the following 4-bit params: {'load_in_4bit':
True, 'bnb_4bit_compute_dtype': torch.bfloat16,
'bnb_4bit_quant_type': 'nf4',
'bnb_4bit_use_double_quant': False}
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|██████ | 1/3 [00:41<01:23, 41.66s/it]
Loading checkpoint shards: 67%|████████████ | 2/3 [01:24<00:42, 42.41s/it]
Loading checkpoint shards: 100%|██████████████████| 3/3 [01:49<00:00, 34.52s/it]
Loading checkpoint shards: 100%|██████████████████| 3/3 [01:49<00:00, 36.58s/it]
17:12:24-060033 INFO LOADER: "Transformers"
17:12:24-061758 INFO TRUNCATION LENGTH: 32768
17:12:24-063167 INFO INSTRUCTION TEMPLATE: "Alpaca"
17:12:24-064414 INFO Loaded the model in 114.38 seconds.
Same problem for me - I imported the GGUF (from https://huggingface.co/MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF) into Ollama, and I am getting <0x0A> in answers. Other models don't do that.
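For anyone wanting to reproduce this, the import itself is just the standard Ollama GGUF import, roughly like the sketch below - the GGUF filename and model name are placeholders, not the exact ones I used:

# write a minimal Modelfile pointing at the downloaded quant
cat > Modelfile <<'EOF'
FROM ./MixTAO-7Bx2-MoE-Instruct-v7.0.Q6_K.gguf
EOF

# register it with Ollama and run it
ollama create mixtao-v7 -f Modelfile
ollama run mixtao-v7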
Might be caused by a wrong / incomplete tokenizer configuration used during the GGUF conversion - I've experienced similar problems with GGUFs for Nous-Hermes models recently (https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B/discussions/7, https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B/discussions/12).
@Trappu If you were using the same GGUF as me, you can take tokenizer.model from https://huggingface.co/zhengr/MixTAO-7Bx2-MoE-v8.1 and use it for the v7.0 conversion to an f16 GGUF via llama.cpp\convert-hf-to-gguf.py, then quantize via llama.cpp\quantize (I use Q6_K). That solved the <0x0A> issues for me when I created the GGUF this way.
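Roughly, the steps look like this - the local folder name, output filenames, and the huggingface-cli download step are just examples, and the exact script/binary names and flags can differ between llama.cpp versions:

# 1. Put tokenizer.model from the v8.1 repo into the local v7.0 model folder
#    (assuming ./MixTAO-7Bx2-MoE-Instruct-v7.0 already contains the downloaded v7.0 HF checkpoint)
huggingface-cli download zhengr/MixTAO-7Bx2-MoE-v8.1 tokenizer.model --local-dir ./MixTAO-7Bx2-MoE-Instruct-v7.0

# 2. Convert the HF checkpoint to an f16 GGUF
python llama.cpp/convert-hf-to-gguf.py ./MixTAO-7Bx2-MoE-Instruct-v7.0 --outtype f16 --outfile MixTAO-7Bx2-MoE-Instruct-v7.0.f16.gguf

# 3. Quantize to Q6_K
llama.cpp/quantize MixTAO-7Bx2-MoE-Instruct-v7.0.f16.gguf MixTAO-7Bx2-MoE-Instruct-v7.0.Q6_K.gguf Q6_K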