dolphin-2.9.2-codestral-22B

#2
by Apel-sin - opened

Thanks for your work!
Do you have any plans for codestral-22B? :)

Cognitive Computations org

Never say never - but that license is a bit too restrictive for us to devote GPU resources to currently.

Cognitive Computations org

Yeah, it's only possible to finetune it by buying a license.

Cognitive Computations org

Kind of sad that the license is like that...

Yes, but I would argue it's a bigger loss for them than for the community with that specific model. It's unimpressive and underwhelming for the most part, and I can't see it gaining traction when it's essentially available to use in only one place.

Cognitive Computations org

I'm no one to judge, but code_qwen_7b is really good compared to codestral.

If you look at the MMLU score, that model performs horribly... In real-world use it's nowhere near Codestral, I would say.

How relevant is the full MMLU suite for code completion models? Or for fill-in-the-middle tasks?

chat ≠ code completion.

Many people ask for feedback on their code, or for ideas based on it. Good reasoning capabilities are useful there: at the very least, the model needs to understand the task at hand, and understand it well. Plus, I think MMLU is a good indicator of how a model performs in general. Otherwise, it's not hard to make a model score well on HumanEval through dataset contamination.

You make a solid point, as I did rather hyperfixate on agentic workflows, but I don't think MMLU is a good code eval if you look at its datasets/questions. But otherwise you are spot on that most people don't use these models with agents without chat instruct training.
