abacusai
/

Smaug-Llama-3-70B-Instruct

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Smaug-Llama-3-70B-Instruct / README.md

ddh0's picture

Use correct license - llama3 instead of llama2

d0b3edc verified 6 months ago

|

3.41 kB

	---
	library_name: transformers
	license: llama3
	datasets:
	- aqua_rat
	- microsoft/orca-math-word-problems-200k
	- m-a-p/CodeFeedback-Filtered-Instruction
	---

	# Smaug-Llama-3-70B-Instruct

	### Built with Meta Llama 3


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/ZxYuHKmU_AtuEJbGtuEBC.png)

	This model was built using a new Smaug recipe for improving performance on real world multi-turn conversations applied to
	[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).

	The model outperforms Llama-3-70B-Instruct substantially, and is on par with GPT-4-Turbo, on MT-Bench (see below).

	EDIT: Smaug-Llama-3-70B-Instruct is the top open source model on Arena-Hard currently! It is also nearly on par with Claude Opus - see below.

	We are conducting additional benchmark evaluations and will add those when available.

	### Model Description

	- Developed by: [Abacus.AI](https://abacus.ai)
	- License: https://llama.meta.com/llama3/license/
	- Finetuned from model: [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).


	## Evaluation

	### Arena-Hard

	Score vs selected others (sourced from: (https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge))

	\| Model \| Score \| 95% Confidence Interval \| Average Tokens \|
	\| :---- \| ---------: \| ----------: \| ------: \|
	\| GPT-4-Turbo-2024-04-09 \| 82.6 \| (-1.8, 1.6) \| 662 \|
	\| Claude-3-Opus-20240229 \| 60.4 \| (-3.3, 2.4) \| 541 \|
	\| Smaug-Llama-3-70B-Instruct \| 56.7 \| (-2.2, 2.6) \| 661 \|
	\| GPT-4-0314 \| 50.0 \| (-0.0, 0.0) \| 423 \|
	\| Claude-3-Sonnet-20240229 \| 46.8 \| (-2.1, 2.2) \| 552 \|
	\| Llama-3-70B-Instruct \| 41.1 \| (-2.5, 2.4) \| 583 \|
	\| GPT-4-0613 \| 37.9 \| (-2.2, 2.0) \| 354 \|
	\| Mistral-Large-2402 \| 37.7 \| (-1.9, 2.6) \| 400 \|
	\| Mixtral-8x22B-Instruct-v0.1 \| 36.4 \| (-2.7, 2.9) \| 430 \|
	\| Qwen1.5-72B-Chat \| 36.1 \| (-2.5, 2.2) \| 474 \|
	\| Command-R-Plus \| 33.1 \| (-2.1, 2.2) \| 541 \|
	\| Mistral-Medium \| 31.9 \| (-2.3, 2.4) \| 485 \|
	\| GPT-3.5-Turbo-0613 \| 24.8 \| (-1.6, 2.0) \| 401 \|

	### MT-Bench

	```
	########## First turn ##########
	score
	model turn
	Smaug-Llama-3-70B-Instruct 1 9.40000
	GPT-4-Turbo 1 9.37500
	Meta-Llama-3-70B-Instruct 1 9.21250
	########## Second turn ##########
	score
	model turn
	Smaug-Llama-3-70B-Instruct 2 9.0125
	GPT-4-Turbo 2 9.0000
	Meta-Llama-3-70B-Instruct 2 8.8000
	########## Average ##########
	score
	model
	Smaug-Llama-3-70B-Instruct 9.206250
	GPT-4-Turbo 9.187500
	Meta-Llama-3-70B-Instruct 9.006250
	```

	\| Model \| First turn \| Second Turn \| Average \|
	\| :---- \| ---------: \| ----------: \| ------: \|
	\| Smaug-Llama-3-70B-Instruct \| 9.40 \| 9.01 \| 9.21 \|
	\| GPT-4-Turbo \| 9.38 \| 9.00 \| 9.19 \|
	\| Meta-Llama-3-70B-Instruct \| 9.21 \| 8.80 \| 9.01 \|

	This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.