---
base_model:
- meta-llama/Llama-2-7b-hf
library_name: peft
license: apache-2.0
datasets:
- wikimedia/wikipedia
language:
- ja
- en
---
# Model Info

This model applies LLM2Vec to Swallow; only the PEFT adapter is distributed. LLM2Vec fine-tunes in two stages, MNTP and SimCSE, but this repository contains the result of the MNTP stage only.
## Model Details

### Model Description

- Model type: PEFT
- Language(s) (NLP): Japanese
- License: Apache 2.0
- Finetuned from model: meta-llama/Llama-2-7b-hf
### Model Sources

- Repository: https://github.com/McGill-NLP/llm2vec
- Paper: https://arxiv.org/abs/2404.05961
## Usage

- Please see the original [LLM2Vec repository](https://github.com/McGill-NLP/llm2vec) for full usage instructions; a minimal loading sketch follows below.
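The snippet below is a minimal sketch, assuming the adapter is loaded through the `llm2vec` library on top of the base model listed above. The adapter repository id is a placeholder, and `pooling_mode` is an illustrative choice, not a documented setting of this model.

```python
import torch
from llm2vec import LLM2Vec

# Load the Llama-2 base model and apply this MNTP PEFT adapter on top of it.
# "<this-adapter-repo-id>" is a placeholder for the Hub id of this repository.
l2v = LLM2Vec.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    peft_model_name_or_path="<this-adapter-repo-id>",
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    pooling_mode="mean",  # illustrative pooling choice
    max_length=512,
)

# Encode sentences into embeddings.
embeddings = l2v.encode(["こんにちは、世界", "Hello, world"])
print(embeddings.shape)
```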
## Training Details

### Training Data

- wikimedia/wikipedia (see the loading sketch below)
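As a sketch, the corpus can be loaded from the Hub as shown below; the snapshot date `20231101` is an assumption, since the exact snapshot used for training is not documented here.

```python
from datasets import load_dataset

# Japanese Wikipedia split of the wikimedia/wikipedia dataset.
# The "20231101" snapshot date is an assumption; substitute the snapshot actually used.
wiki_ja = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train")
print(wiki_ja[0]["text"][:200])
```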
### Training Hyperparameters
- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: blank
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: bfloat16
- attn_implementation: flash_attention_2
- bf16: true
- gradient_checkpointing: true
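For illustration, the sketch below shows roughly how these hyperparameters map onto a Transformers / PEFT setup. It is an assumption about the shape of the training code, not the exact script used (MNTP training is normally run via the scripts in the LLM2Vec repository); `lora_alpha` and `target_modules` are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model

# Base model loaded with the dtype / attention settings listed above
# (flash_attention_2 requires the flash-attn package).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
model.gradient_checkpointing_enable()

# LoRA adapter with lora_r = 16 (lora_alpha and target_modules are assumptions).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
)
model = get_peft_model(model, lora_config)

# MNTP-style masking: 20% of tokens are masked for prediction.
# mask_token_type "blank" corresponds to using "_" as the mask token,
# since Llama-2 has no dedicated [MASK] token.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.mask_token = "_"
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.2)
```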
### Accelerator Settings

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
### Framework versions
- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0