metadata
language:
  - en
library_name: transformers
pipeline_tag: text-generation
datasets:
  - jondurbin/airoboros-2.2
  - Open-Orca/OpenOrca
  - garage-bAInd/Open-Platypus
  - WizardLM/WizardLM_evol_instruct_V2_196k
  - TokenBender/python_eval_instruct_51k
tags:
  - llama-2
  - code
license: llama2
model-index:
  - name: SpeechlessCoder
    results:
      - task:
          type: text-generation
        dataset:
          type: openai_humaneval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 52.439
            verified: false

speechless-coding-7b-16k-tora

This model fine-tunes llm_agents/tora-code-7b-v1.0 on the datasets listed below to improve its reasoning and planning abilities.

  • context window length: 16,384
  • prompt_type = "alpaca"
  • max_tokens > 128 && < 16384

Total: 177,333 samples, 316 MB

  • jondurbin/airoboros-2.2: Filter categories related to coding, reasoning and planning. 21,923 samples.
  • Open-Orca/OpenOrca: Filter the 'cot' category in 1M GPT4 dataset. 62,973 samples.
  • garage-bAInd/Open-Platypus: 100%, 22,760 samples.
  • WizardLM/WizardLM_evol_instruct_V2_196k: Coding conversation part. 30,081 samples.
  • TokenBender/python_eval_instruct_51k: Samples with “python” in the output. 39,596 samples.
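
The filters described above (a length window of more than 128 and fewer than 16384 tokens, and “python” appearing in the output) could be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the `instruction`/`output` field names, the whitespace token counter, and the choice to measure the combined length of both fields are all assumptions made for the example. In practice the model's own tokenizer would be passed in as `count_tokens`.

```python
from typing import Callable, Dict, Iterable, List

MIN_TOKENS = 128     # exclusive lower bound, from "max_tokens > 128"
MAX_TOKENS = 16384   # exclusive upper bound, from "< 16384"

Sample = Dict[str, str]


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption): counts whitespace-separated words."""
    return len(text.split())


def within_length(
    sample: Sample,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> bool:
    """Keep a sample only if its combined length lies strictly inside the window."""
    total = count_tokens(sample["instruction"]) + count_tokens(sample["output"])
    return MIN_TOKENS < total < MAX_TOKENS


def mentions_python(sample: Sample) -> bool:
    """Literal, case-sensitive check for "python" in the output (assumption)."""
    return "python" in sample["output"]


def filter_samples(
    samples: Iterable[Sample],
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[Sample]:
    """Apply the length window to every sample and return the survivors."""
    return [s for s in samples if within_length(s, count_tokens)]
```

Passing the token counter as a parameter keeps the sketch independent of any particular tokenizer library.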

Sampling settings: 50 samples, T=0.2, MaxTokens=512, Top_P=0.95
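
With 50 samples drawn per problem, a pass@1 score is typically computed with the unbiased pass@k estimator commonly used for HumanEval. The card does not state which estimator was used for the numbers reported here, so the sketch below is an assumption about the method. The function names are illustrative.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems, given the correct count for each problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

For k=1 the estimator reduces to the fraction of correct samples, c/n, averaged over all problems.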

Code: https://github.com/uukuguy/speechless

How to Prompt the Model

This model accepts the Alpaca instruction format.

For example:

You are an intelligent programming assistant.

### Instruction:
Implement a linked list in C++

### Response:
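
A small helper can assemble this prompt programmatically. The sketch below mirrors the example above. The function name, the default system line, and the trailing newline after `### Response:` are assumptions for illustration and are not part of any published API for this model.

```python
DEFAULT_SYSTEM = "You are an intelligent programming assistant."


def build_alpaca_prompt(instruction: str, system: str = DEFAULT_SYSTEM) -> str:
    """Format an instruction in the Alpaca layout shown in the example above."""
    return (
        f"{system}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


if __name__ == "__main__":
    print(build_alpaca_prompt("Implement a linked list in C++"))
```

The returned string can be passed to whichever generation API is in use. The model's completion is expected to follow the `### Response:` marker.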

HumanEval

Metric Value
humaneval-python 52.44

Big Code Models Leaderboard

CodeLlama-34B-Python: 53.29

CodeLlama-34B-Instruct: 50.79

CodeLlama-13B-Instruct: 50.6

CodeLlama-34B: 45.11

CodeLlama-13B-Python: 42.89

CodeLlama-13B: 35.07

MultiPL-E

Metric Value
python 55.96
java 37.84
javascript 46.93
cpp 37.48
rust 29.01
go 28.99
sh 12.11
julia 31.47
typescript 47.80

LMEval

Open LLM Leaderboard

Metric Value
ARC
HellaSwag
MMLU
TruthfulQA
Average

Parameters

lr 2e-4
lr_scheduler_type cosine
weight_decay 0.0
optim paged_adamw_8bit
flash_attention True
rerope False
max_new_tokens 16384
num_train_epochs 2
bits 4
lora_r 64
lora_alpha 256
lora_dropout 0.05
double_quant True
quant_type nf4
dataset_format sharegpt
mini_batch_size 2
gradient_accumulation_steps 32
bf16 True

A100-40G x 4
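
Two derived quantities follow from the parameters above and may help when reproducing the setup. The sketch assumes that mini_batch_size is a per-GPU value, that all four GPUs run data-parallel, and that the standard LoRA scaling factor alpha / r applies. None of these is stated explicitly in the card.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    """Samples contributing to one optimizer step under data parallelism (assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update: alpha / r."""
    return lora_alpha / lora_r


if __name__ == "__main__":
    # mini_batch_size=2, gradient accumulation=32, 4 GPUs
    print(effective_batch_size(2, 32, 4))  # 256
    # lora_alpha=256, lora_r=64
    print(lora_scaling(256, 64))  # 4.0
```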