	---
	license: other
	license_name: yi-34b
	license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
	model-index:
	- name: Tess-M-v1.3
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: ENEM Challenge (No Images)
	      type: eduagarcia/enem_challenge
	      split: train
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc
	      value: 72.36
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: BLUEX (No Images)
	      type: eduagarcia-temp/BLUEX_without_images
	      split: train
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc
	      value: 64.81
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: OAB Exams
	      type: eduagarcia/oab_exams
	      split: train
	      args:
	        num_few_shot: 3
	    metrics:
	    - type: acc
	      value: 55.58
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Assin2 RTE
	      type: assin2
	      split: test
	      args:
	        num_few_shot: 15
	    metrics:
	    - type: f1_macro
	      value: 91.46
	      name: f1-macro
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Assin2 STS
	      type: eduagarcia/portuguese_benchmark
	      split: test
	      args:
	        num_few_shot: 15
	    metrics:
	    - type: pearson
	      value: 78.33
	      name: pearson
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: FaQuAD NLI
	      type: ruanchaves/faquad-nli
	      split: test
	      args:
	        num_few_shot: 15
	    metrics:
	    - type: f1_macro
	      value: 80.55
	      name: f1-macro
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HateBR Binary
	      type: ruanchaves/hatebr
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: f1_macro
	      value: 73.97
	      name: f1-macro
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: PT Hate Speech Binary
	      type: hate_speech_portuguese
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: f1_macro
	      value: 66.63
	      name: f1-macro
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: tweetSentBR
	      type: eduagarcia/tweetsentbr_fewshot
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: f1_macro
	      value: 73.99
	      name: f1-macro
	    source:
	      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=migtissera/Tess-M-v1.3
	      name: Open Portuguese LLM Leaderboard
	---

	# Note:
	This version is the stable release. The issues that were present in versions 1.0, 1.1 and 1.2 have all been rectified. Thank you for your patience while R&D was conducted. Enjoy!

	This model has been tested at very long context lengths. It produced slight repetition, but it was very minor. I recommend testing the model on your use case and limiting the context length. Here's my conversation: https://migel.substack.com/p/testing-tess-m-v13

	As can be seen in that conversation, the model itself emitted "USER:" and "SYSTEM: Answer the question thoughtfully and intelligently. Always answer without hesitation." in the latter part of the conversation.
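
One way to guard against this on the client side is to cut the generated text at the first leaked role marker. The following is a minimal sketch, not something shipped with the model: the helper name `truncate_at_role_marker` is hypothetical, and it assumes the leaked markers are the literal strings "USER:" and "SYSTEM:" quoted above.

```python
# Hypothetical helper: trims a completion at the first leaked role marker.
# Assumption: leaked markers are the literal strings observed in the test conversation.
ROLE_MARKERS = ("USER:", "SYSTEM:")


def truncate_at_role_marker(text: str, markers=ROLE_MARKERS) -> str:
    """Return the text before the earliest role marker, without trailing whitespace."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()
```

Many inference libraries also accept stop strings at generation time, which would avoid generating the extra tokens in the first place; whether that is available depends on the runtime you use.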

	# Learnings:
	Here are my learnings going from Tess-v1.0 to Tess-v1.3: https://migel.substack.com/p/learnings-from-training-tess

	# Tess

	![Tess](https://huggingface.co/migtissera/Tess-M-v1.0/resolve/main/Tess.png)

	Tess, short for Tesoro (Treasure in Italian), is a general-purpose Large Language Model series. Tess-M-v1.3 was trained on the Yi-34B-200K base.


	# Prompt Format:

	```
	SYSTEM: <ANY SYSTEM CONTEXT>
	USER:
	ASSISTANT:
	```
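
The template above can be assembled programmatically. The sketch below is illustrative only: the function name `build_prompt` is hypothetical, and it assumes that the user message follows `USER:` on the same line and that the prompt ends with a bare `ASSISTANT:` for the model to complete. The template itself does not specify either detail.

```python
def build_prompt(system_context: str, user_message: str) -> str:
    """Assemble a SYSTEM/USER/ASSISTANT prompt in the format shown above.

    Assumption: each role's text sits on the same line as its label,
    and the final line is an empty ASSISTANT: turn.
    """
    return (
        f"SYSTEM: {system_context}\n"
        f"USER: {user_message}\n"
        "ASSISTANT:"
    )


# Example using the system context quoted earlier in this README.
prompt = build_prompt(
    "Answer the question thoughtfully and intelligently. Always answer without hesitation.",
    "What is the capital of Portugal?",
)
print(prompt)
```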


	# Open Portuguese LLM Leaderboard Evaluation Results

	Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/migtissera/Tess-M-v1.3) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).

	| Metric                    | Value |
	|---------------------------|-------|
	| Average                   | 73.08 |
	| ENEM Challenge (No Images)| 72.36 |
	| BLUEX (No Images)         | 64.81 |
	| OAB Exams                 | 55.58 |
	| Assin2 RTE                | 91.46 |
	| Assin2 STS                | 78.33 |
	| FaQuAD NLI                | 80.55 |
	| HateBR Binary             | 73.97 |
	| PT Hate Speech Binary     | 66.63 |
	| tweetSentBR               | 73.99 |
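
The Average row is consistent with an unweighted mean of the nine task scores. That reading is inferred from the numbers and is not stated by the leaderboard here. A quick check, using only the values from the table:

```python
# Task scores copied from the table above.
scores = {
    "ENEM Challenge (No Images)": 72.36,
    "BLUEX (No Images)": 64.81,
    "OAB Exams": 55.58,
    "Assin2 RTE": 91.46,
    "Assin2 STS": 78.33,
    "FaQuAD NLI": 80.55,
    "HateBR Binary": 73.97,
    "PT Hate Speech Binary": 66.63,
    "tweetSentBR": 73.99,
}

# Assumption: the leaderboard average is a plain (unweighted) mean.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 73.08
```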