---
language:
  - th
  - en
license: apache-2.0
library_name: transformers
tags:
  - openthaigpt
  - llama
datasets:
  - kobkrit/rd-taxqa
  - iapp_wiki_qa_squad
  - Thaweewat/alpaca-cleaned-52k-th
  - Thaweewat/instruction-wild-52k-th
  - Thaweewat/databricks-dolly-15k-th
  - Thaweewat/hc3-24k-th
  - Thaweewat/gpteacher-20k-th
  - Thaweewat/onet-m6-social
  - Thaweewat/alpaca-finance-43k-th
pipeline_tag: text-generation
model-index:
  - name: openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 44.97
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 70.19
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 36.22
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 49.99
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 69.38
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 1.36
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
          name: Open LLM Leaderboard
---
# 🇹🇭 OpenThaiGPT 1.0.0-beta

🇹🇭 OpenThaiGPT Version 1.0.0-beta is a Thai-language 7B-parameter LLaMA v2 Chat model, fine-tuned to follow Thai-translated instructions. Its tokenizer vocabulary has been extended with more than 24,500 of the most common Thai words, which substantially speeds up Thai text generation.
## Upgrade from OpenThaiGPT 1.0.0-alpha

- Added more than 24,500 of the most common Thai words to the tokenizer vocabulary and re-pretrained the embedding layers, making Thai text generation about 10 times faster than the previous version. (A quick way to verify this is sketched below.)
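A simple way to see the effect of the extended vocabulary is to tokenize a Thai sentence with the checkpoint's tokenizer. The following is a minimal sketch using the Hugging Face transformers library; the sample sentence is an arbitrary illustration, not from the training data:

```python
from transformers import AutoTokenizer

# Load the extended tokenizer shipped with this checkpoint.
tokenizer = AutoTokenizer.from_pretrained(
    "openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf"
)

# An arbitrary Thai sample sentence ("Hello, the weather is very nice today.").
text = "สวัสดีครับ วันนี้อากาศดีมาก"
tokens = tokenizer.tokenize(text)
print(len(tokens), tokens)

# Common Thai words should now map to single tokens rather than the
# byte-level fallback pieces produced by the original LLaMA v2 tokenizer,
# so generating a Thai sentence needs far fewer decoding steps.
```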
## Support
- Official website: https://openthaigpt.aieat.or.th
- Facebook page: https://web.facebook.com/groups/openthaigpt
- A Discord server for discussion and support here
- E-mail: [email protected]
## License

- **Source Code:** Apache Software License 2.0.
- **Weights:** Available for both research and commercial use.
## Code and Weights

- Colab Demo: https://colab.research.google.com/drive/1kDQidCtY9lDpk49i7P3JjLAcJM04lawu?usp=sharing
- Finetune Code: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta
- Inference Code: https://github.com/OpenThaiGPT/openthaigpt
- Weights (Hugging Face Checkpoint): https://huggingface.co/openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
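For reference, here is a minimal inference sketch with transformers. The bare-question prompt is an assumption for illustration; consult the Colab demo and inference repository above for the exact prompt template the model was trained with:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fits a 7B model on a ~16 GB GPU; use float32 on CPU
    device_map="auto",
)

# NOTE: a plain question is used here for illustration only; the official
# inference code linked above defines the actual instruction format.
prompt = "ประเทศไทยมีกี่จังหวัด"  # "How many provinces does Thailand have?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```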
## Sponsors

Pantip.com, ThaiSC
## Powered by

OpenThaiGPT Volunteers, Artificial Intelligence Entrepreneur Association of Thailand (AIEAT), and Artificial Intelligence Association of Thailand (AIAT)
## Authors
- Kobkrit Viriyayudhakorn ([email protected])
- Sumeth Yuenyong ([email protected])
- Thaweewat Rugsujarit ([email protected])
- Jillaphat Jaroenkantasima ([email protected])
- Norapat Buppodom ([email protected])
- Koravich Sangkaew ([email protected])
- Peerawat Rojratchadakorn ([email protected])
- Surapon Nonesung ([email protected])
- Chanon Utupon ([email protected])
- Sadhis Wongprayoon ([email protected])
- Nucharee Thongthungwong ([email protected])
- Chawakorn Phiantham ([email protected])
- Patteera Triamamornwooth ([email protected])
- Nattarika Juntarapaoraya ([email protected])
- Kriangkrai Saetan ([email protected])
- Pitikorn Khlaisamniang ([email protected])
*Disclaimer: The accuracy of the model's responses is not guaranteed.*
## Open LLM Leaderboard Evaluation Results

Detailed results can be found here.

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 45.35 |
| AI2 Reasoning Challenge (25-Shot) | 44.97 |
| HellaSwag (10-Shot)               | 70.19 |
| MMLU (5-Shot)                     | 36.22 |
| TruthfulQA (0-shot)               | 49.99 |
| Winogrande (5-shot)               | 69.38 |
| GSM8k (5-shot)                    |  1.36 |