Update README.md
README.md CHANGED
@@ -13,7 +13,9 @@ There seems to have been some sort of problem with the training that I cannot id
 
 Typically, the model would respond with long responses when asked, be much more contextually intelligent, and answer in a thoughtful way. However, for whatever reason - likely something to do with not training with LLM-Foundry - the model does not like longer responses, and typically responds quite briefly.
 
-I don't believe this is a base model issue, as I compared this fine-tune with the MPT-7B Instruct model, and it had no problem at all producing extremely long responses, etc. If anyone has the time to investigate, please follow up with me in the community tab or on Twitter, @Teknium1!
+I don't believe this is purely a base model issue - or, if it is, it is one tied to how the base model interacts with the trainer - as I compared this fine-tune with the MPT-7B Instruct model, and it had no problem at all producing extremely long responses, etc. If anyone has the time to investigate, please follow up with me in the community tab or on Twitter, @Teknium1!
+
+I trained Replit 3b with the same trainer and the same settings, and its results were phenomenal. So I would love any hypothesis on what may have made this different.
 
 You should load the model and tokenizer like so:
 
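The loading snippet itself falls outside this hunk. As context only, here is a minimal sketch of how an MPT-7B-based fine-tune is typically loaded with Hugging Face transformers; the repo id and the prompt string are placeholders, not values taken from this card.

```python
# Minimal sketch (placeholder repo id, not the actual model id from this card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/your-mpt-7b-finetune"  # placeholder

# MPT checkpoints ship custom modeling code, so trust_remote_code=True is required.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quick generation check; use whatever prompt template the card actually specifies.
inputs = tokenizer("Write a short story about a robot.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```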