Wolf While

snoopynoob

AI & ML interests

None yet

Recent Activity

liked a model 6 days ago

nbeerbower/Lyra4-Gutenberg-12B

liked a model 27 days ago

VongolaChouko/Starcannon-Unleashed-12B-v1.0

liked a model about 1 month ago

mradermacher/UnslopNemo-12B-v4.1-GGUF

Organizations

None yet

snoopynoob's activity

liked a model 6 days ago

nbeerbower/Lyra4-Gutenberg-12B

Text Generation • Updated Sep 14 • 290 • 19

liked a model 27 days ago

VongolaChouko/Starcannon-Unleashed-12B-v1.0

Text Generation • Updated 26 days ago • 1.12k • 31

liked 2 models about 1 month ago

mradermacher/UnslopNemo-12B-v4.1-GGUF

Updated about 1 month ago • 705 • 2

mradermacher/UnslopNemo-12B-v4.1-i1-GGUF

Updated about 1 month ago • 3.32k • 3

New activity in mradermacher/model_requests about 1 month ago

TheDrummer/UnslopNemo-12B-v4.1

#401 opened about 1 month ago by snoopynoob

liked 4 models about 1 month ago

liked a model about 2 months ago

Lewdiculous/MN-BackyardAI-Party-12B-v1-GGUF-IQ-ARM-Imatrix

Updated Oct 3 • 876 • 14

liked 2 models 2 months ago

nbeerbower/mistral-nemo-gutenberg-12B-v4

Text Generation • Updated Sep 6 • 220 • 15

nbeerbower/Mistral-Nemo-Gutenberg-Doppel-12B

Text Generation • Updated Sep 26 • 16 • 3

Reacted to m-ric's post with 🔥 2 months ago

🔥 𝐐𝐰𝐞𝐧 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐭𝐡𝐞𝐢𝐫 𝟐.𝟓 𝐟𝐚𝐦𝐢𝐥𝐲 𝐨𝐟 𝐦𝐨𝐝𝐞𝐥𝐬: 𝐍𝐞𝐰 𝐒𝐎𝐓𝐀 𝐟𝐨𝐫 𝐚𝐥𝐥 𝐬𝐢𝐳𝐞𝐬 𝐮𝐩 𝐭𝐨 𝟕𝟐𝐁!

The Chinese LLM maker just dropped a flurry of different models, ensuring there will be a Qwen SOTA model for every application out there:
Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
Qwen2.5-Math: 1.5B, 7B, and 72B.

And they didn't sleep: the performance is top of the game for each weight category!

𝐊𝐞𝐲 𝐢𝐧𝐬𝐢𝐠𝐡𝐭𝐬:

🌐 All models have 𝟭𝟮𝟴𝗸 𝘁𝗼𝗸𝗲𝗻 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵

📚 Models pre-trained on 18T tokens, even more than the 15T of Llama-3

💪 The flagship 𝗤𝘄𝗲𝗻𝟮.𝟱-𝟳𝟮𝗕 𝗶𝘀 ~𝗰𝗼𝗺𝗽𝗲𝘁𝗶𝘁𝗶𝘃𝗲 𝘄𝗶𝘁𝗵 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟰𝟬𝟱𝗕, 𝗮𝗻𝗱 𝗵𝗮𝘀 𝗮 𝟯-𝟱% 𝗺𝗮𝗿𝗴𝗶𝗻 𝗼𝗻 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟳𝟬𝗕 𝗼𝗻 𝗺𝗼𝘀𝘁 𝗯𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀.

🇫🇷 On top of this, it 𝘁𝗮𝗸𝗲𝘀 𝘁𝗵𝗲 #𝟭 𝘀𝗽𝗼𝘁 𝗼𝗻 𝗺𝘂𝗹𝘁𝗶𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝘁𝗮𝘀𝗸𝘀 so it might become my standard for French

💻 Qwen2.5-Coder is only 7B but beats competing models up to 33B (DeepSeek-Coder 33B-Instruct). Let's wait for their 32B to come out!

🧮 Qwen2.5-Math sets a new high in the ratio of MATH benchmark score to # of parameters. They trained it by "aggregating more high-quality mathematical data, particularly in Chinese, from web sources, books, and codes across multiple recall cycles."

📄 Technical report to be released "very soon"

🔓 All models have the most permissive license, Apache 2.0, except the 72B models, which have a custom license mentioning "you can use it for free EXCEPT if your product has over 100M users"

🤗 All models are available on the HF Hub! ➡️ Qwen/qwen25-66e81a666513e518adb90d9e