metadata
base_model:
  - meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
  - wikimedia/wikipedia
language:
  - ja
  - en

Model Info

This model applies LLM2Vec to Swallow. Only the PEFT adapter is distributed. LLM2Vec fine-tunes a model in two stages, MNTP (masked next token prediction) and SimCSE; this repository contains the adapter obtained after the MNTP stage only.

Model Details

Model Description

  • Model type: PEFT
  • Language(s) (NLP): Japanese
  • License: Apache 2.0
  • Finetuned from model: meta-llama/Llama-2-7b-hf

Model Sources [optional]

Usage
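
The card gives no usage snippet, so the following is only a minimal sketch of how an MNTP adapter like this one is typically loaded with the llm2vec package. It assumes the package is installed, takes the base model ID from this card's metadata (if the adapter was in fact trained on a Swallow checkpoint, as the description suggests, substitute that ID), and uses a placeholder for the adapter repository ID, which is not stated here.

```python
# Assumption: base model ID copied from this card's metadata.
BASE_MODEL = "meta-llama/Llama-2-7b-hf"
# Placeholder: the ID of this adapter repository is not stated in the card.
ADAPTER_REPO = "<this-repository-id>"


def load_encoder(base_model=BASE_MODEL, adapter=ADAPTER_REPO):
    """Load the base model via llm2vec and attach the MNTP PEFT adapter."""
    # Heavy dependencies are imported lazily, only when loading is requested.
    import torch
    from llm2vec import LLM2Vec

    return LLM2Vec.from_pretrained(
        base_model,
        peft_model_name_or_path=adapter,
        device_map="cuda" if torch.cuda.is_available() else "cpu",
        torch_dtype=torch.bfloat16,  # matches torch_dtype in the training settings
    )


def embed_texts(encoder, texts):
    """Encode a list of strings into embeddings with the loaded encoder."""
    return encoder.encode(texts)
```

Calling `embed_texts(load_encoder(), ["東京は日本の首都です。", "Tokyo is the capital of Japan."])` would download the base weights, which requires access to the gated Llama 2 repository. Because only the MNTP stage has been applied, the resulting embeddings may be weaker than those of a model that also went through SimCSE training.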

Training Details

Training Data

  • wikimedia/wikipedia (the dataset listed in this card's metadata)

Training Hyperparameters

  • batch_size: 64
  • gradient_accumulation_steps: 1
  • max_seq_length: 512
  • mask_token_type: "blank"
  • mlm_probability: 0.2
  • lora_r: 16
  • torch_dtype: "bfloat16"
  • attn_implementation: "flash_attention_2"
  • bf16: true
  • gradient_checkpointing: true
  • bf16: true
  • gradient_checkpointing: true
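
To make the role of `mlm_probability` concrete, here is a simplified sketch of MNTP-style masking in plain Python. It is an illustration under stated assumptions, not the training code used for this adapter: each token is masked independently with the given probability, every selected token is replaced by a single mask ID (a hypothetical `mask_id` standing in for the "blank" mask token), and unmasked positions get the label -100 so that a cross-entropy loss would ignore them.

```python
import random

IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default


def mask_for_mntp(token_ids, mask_id, mlm_probability=0.2, seed=0):
    """Return (masked_inputs, labels) for one token sequence.

    Each position is masked independently with probability `mlm_probability`.
    Masked positions keep the original token as their label; all other
    positions are labelled IGNORE_INDEX.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mlm_probability:
            inputs.append(mask_id)   # hide the token from the model
            labels.append(tok)       # the model must recover it
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels
```

In MNTP as described for LLM2Vec, the prediction for a masked token at position i is read from the output at position i - 1, so the shift happens in the loss computation rather than in the masking step shown here.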

Accelerator Settings

  • deepspeed_config:
    • gradient_accumulation_steps: 1
    • gradient_clipping: 1.0
    • offload_optimizer_device: nvme
    • offload_optimizer_nvme_path: /nvme
    • zero3_save_16bit_model: true
    • zero_stage: 2
  • distributed_type: DEEPSPEED
  • downcast_bf16: 'no'
  • dynamo_config:
    • dynamo_backend: INDUCTOR
    • dynamo_mode: default
    • dynamo_use_dynamic: true
    • dynamo_use_fullgraph: true
  • enable_cpu_affinity: false
  • machine_rank: 0
  • main_training_function: main
  • mixed_precision: bf16
  • num_machines: 1
  • num_processes: 2
  • rdzv_backend: static
  • same_network: true
  • use_cpu: false
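
For readers who want to reuse these settings, the sketch below restates the list above as a nested Python dictionary and serializes it with the standard library. The values are copied from the list; the last key is written as `use_cpu`, which is assumed to be the intended spelling. How the original configuration file was stored or passed to Accelerate is not stated in this card.

```python
import json

# Accelerator settings as listed in this card; nesting follows the bullet indentation.
accelerate_config = {
    "deepspeed_config": {
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 1.0,
        "offload_optimizer_device": "nvme",
        "offload_optimizer_nvme_path": "/nvme",
        "zero3_save_16bit_model": True,
        "zero_stage": 2,
    },
    "distributed_type": "DEEPSPEED",
    "downcast_bf16": "no",
    "dynamo_config": {
        "dynamo_backend": "INDUCTOR",
        "dynamo_mode": "default",
        "dynamo_use_dynamic": True,
        "dynamo_use_fullgraph": True,
    },
    "enable_cpu_affinity": False,
    "machine_rank": 0,
    "main_training_function": "main",
    "mixed_precision": "bf16",
    "num_machines": 1,
    "num_processes": 2,
    "rdzv_backend": "static",
    "same_network": True,
    "use_cpu": False,  # assumed spelling of the last key in the list
}

# Serialize to text so the settings can be saved or compared.
config_text = json.dumps(accelerate_config, indent=2)
```

The settings describe a single machine running two processes with DeepSpeed ZeRO stage 2 and the optimizer state offloaded to NVMe storage.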

Framework versions

  • Python: 3.12.3
  • PEFT: 0.11.1
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.0
  • PyTorch: 2.3.0
  • Accelerate: 0.30.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
  • MTEB: 1.13.0