John Cena
Abc7347
2 followers · 3 following
AI & ML interests
None yet
Recent Activity
liked a model 26 days ago: meta-llama/Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
Reacted to m-ric's post with ❤️ about 1 month ago:
Cohere releases Aya 8B & 32B: SOTA multilingual models for 23 languages! How did they manage to beat top contenders while also adding 23 languages?

Train on synthetic data:
• Synthetic data has been said to cause model collapse after too much training.
• Cohere introduced "data arbitrage" to prevent this, strategically sampling from a pool of several teacher models instead of one single teacher.
• They first train a model pool for each group of languages, then employ an internal reward model named "Arbiter" to evaluate and select the optimal generation; only the best generation is kept as the final completion for each prompt.
➡️ This process is particularly effective in multilingual settings, where no single teacher model performs best across all languages: here, "Multilingual Arbitrage" single-handedly improves the 8B model's win rate vs Gemma-2-9B by 10 points!

🧩 Use model merging: rather than struggling to find the right data mix for training a single multilingual model, just train language-specific models, then merge them!
• Maximize diversity between merged checkpoints by training each on a different language family.
• They experimented with fancier techniques (SLERP, TIES, DARE-TIES) but found weighted averaging to be the most consistent!
➡️ Merging gave 3x larger gains at the 35B scale than at the 8B scale, consistent with literature findings that merging is more effective at scale.

⚡️ Great performance, per automatic evaluations on the Arena-Hard-Auto dataset:
➡️ Aya Expanse 8B beats models in its weight class such as Gemma 2 9B, Llama 3.1 8B, and the recent Ministral 8B, with win rates ranging from 60.4% to 70.6%.
➡️ Aya Expanse 32B outperforms Gemma 2 27B, Mistral 8x22B, and Llama 3.1 70B (2x its size).
⚠️ But this performance eval comes from only one benchmark! Let's wait for Open LLM Leaderboard evals.
CC-BY-NC license.

Blog post here: https://huggingface.co/blog/aya-expanse
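The "data arbitrage" step the post describes boils down to best-of-n sampling across a pool of teachers, judged by a reward model (the post's "Arbiter"). A minimal sketch of that idea, where `teachers` and `reward_fn` are hypothetical stand-ins supplied by the caller; this illustrates the concept, not Cohere's actual pipeline:

```python
# Conceptual sketch of "multilingual arbitrage": generate one candidate
# completion per teacher model, score each with a reward model, and keep
# only the top-scoring completion as the final training pair for the prompt.
# `teachers` and `reward_fn` are hypothetical stand-ins, not Cohere's code.
from typing import Callable, Dict, Tuple

def arbitrate(
    prompt: str,
    teachers: Dict[str, Callable[[str], str]],  # name -> generate(prompt)
    reward_fn: Callable[[str, str], float],     # (prompt, completion) -> score
) -> Tuple[str, str]:
    """Return the (teacher_name, completion) pair with the highest reward."""
    candidates = {name: gen(prompt) for name, gen in teachers.items()}
    best = max(candidates, key=lambda name: reward_fn(prompt, candidates[name]))
    return best, candidates[best]
```

In a multilingual setting, a per-language-group teacher pool means the arbiter can pick a different winning teacher per prompt, which is why no single teacher has to be good at everything.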
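And the merge variant the post found most consistent, weighted averaging, reduces to a per-parameter weighted sum over checkpoints. A minimal PyTorch sketch, assuming checkpoints saved as state dicts with identical keys and shapes; the file names and weights below are illustrative:

```python
import torch

def weighted_average(state_dicts: list, weights: list) -> dict:
    """Merge checkpoints by taking a weighted average of each parameter."""
    assert abs(sum(weights) - 1.0) < 1e-6, "merge weights should sum to 1"
    return {
        key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Illustrative usage: one checkpoint per language family, equal weights.
# sds = [torch.load(p, map_location="cpu") for p in ("romance.pt", "germanic.pt")]
# merged = weighted_average(sds, weights=[0.5, 0.5])
# model.load_state_dict(merged)
```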
updated a model about 1 month ago: Abc7347/Llama-3.2-3B-Overthinker-Q8_0-GGUF
Organizations
None yet
models (5)
Abc7347/Llama-3.2-3B-Overthinker-Q8_0-GGUF • Text Generation • Updated Oct 18 • 7.15k downloads • 2 likes
Abc7347/Llama-3.1-8B-Instruct-Q8_0-GGUF • Text Generation • Updated Oct 14 • 14 downloads
Abc7347/Llama-3.2-3B-Instruct-uncensored-Q4_K_M-GGUF • Updated Oct 2 • 159 downloads • 1 like
Abc7347/BlackSheep-Llama3.2-3B-Q8_0-GGUF • Updated Oct 2 • 3 downloads
Abc7347/BlackSheep-Llama3.2-3B-Q4_K_M-GGUF • Updated Oct 1 • 45 downloads • 3 likes
datasets
None public yet