Severian/Jamba-Hercules

Text Generation

Inference Endpoints

4-bit precision


	---
	license: apache-2.0
	tags:
	- jamba
	datasets:
	- teknium/OpenHermes-2.5
	base_model: ai21labs/Jamba-v0.1
	pipeline_tag: text-generation
	---

	# Jamba-Open-Hermes

	<img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/Ph6ZvxwF7a0m_B5Su_EK7.webp" width="500" height="500">

	# This is highly experimental and should be viewed as purely a test for now. Jamba has been very hard to train, but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development, so all of the best working iterations, even if they are a bit wonky, will be pushed here.

	There's been limited testing, so there are no example outputs yet.

	---
	## Training


	### Open-Hermes-2.0 (Only first 1500 examples): [ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]

	```
	1483 5.986700
	1484 5.764100
	1485 5.887200
	1486 5.445200
	1487 6.086300
	1488 5.718300
	1489 5.670300
	1490 5.440900
	1491 4.945900
	1492 6.154700
	1493 5.624800
	1494 6.868100
	1495 5.627100
	1496 5.192700
	1497 5.826800
	1498 5.512200
	1499 5.869900
	1500 5.852300
	1501 5.574800
	1502 5.299200
	1503 5.631200
	1504 5.535600
	1505 5.626000
	1506 5.093300
	1507 5.278000
	1508 5.585400
	1509 5.318600
	1510 5.319200
	1511 5.513900
	1512 5.375400
	1513 5.460600
	1514 5.045300
	1515 6.013600
	1516 5.812300
	1517 5.707400
	1518 5.109800
	1519 5.212900
	1520 5.317200
	1521 5.935400
	1522 5.733900
	1523 5.866000
	1524 5.675400
	1525 5.580800
	1526 4.996900
	1527 5.666700
	1528 4.979900
	```
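To get a quick read on a log like the one above, here is a minimal sketch that parses it and summarizes the loss. It assumes the two columns are the step index and the training loss (the usual Trainer output with `logging_steps=1`); the helper names `parse_loss_log`, `summarize`, and `moving_average` are hypothetical and not part of the training script.

```python
from statistics import mean


def parse_loss_log(text):
    """Parse lines of the form '<step> <loss>' into (step, loss) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), float(parts[1])))
    return rows


def summarize(rows):
    """Return the step range plus mean, min, and max of the loss."""
    losses = [loss for _, loss in rows]
    return {
        "first_step": rows[0][0],
        "last_step": rows[-1][0],
        "mean": mean(losses),
        "min": min(losses),
        "max": max(losses),
    }


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# A few rows copied from the log above; paste the full log to summarize all of it.
sample_log = """
1483 5.986700
1484 5.764100
1485 5.887200
1486 5.445200
"""

rows = parse_loss_log(sample_log)
summary = summarize(rows)
print(summary)
print(moving_average([loss for _, loss in rows], window=2))
```

With a per-step loss this noisy, a moving average over a few dozen steps is likely a better indicator of the trend than any single value.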

	### Hyperparameters

	```py
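	import os

	# Configure the CUDA caching allocator before torch is imported, so the
	# setting is already in place when CUDA is first initialized.
	os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True"

	import torch
	from peft import LoraConfig
	from transformers import (
	    AutoModelForCausalLM,
	    AutoTokenizer,
	    BitsAndBytesConfig,
	    TrainingArguments,
	)
	from trl import SFTTrainer

	# Load the tokenizer from the base model.
	tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
	# The original set padding_side to 'right' and then immediately to 'left';
	# only the final assignment takes effect, so that is the one kept here.
	tokenizer.padding_side = 'left'

	# NOTE: `model` and `train_dataset` are not defined in this snippet.
	# Load the model with AutoModelForCausalLM.from_pretrained (BitsAndBytesConfig
	# is imported for quantized loading) and prepare a dataset with a "text"
	# column before building the trainer.

	max_seq_length = 4096

	# LoRA adapter settings: rank-8 adapters on the embedding and Mamba projections.
	lora_config = LoraConfig(
	    r=8,
	    lora_alpha=16,
	    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
	    lora_dropout=0.2,
	    task_type="CAUSAL_LM",
	    bias="none",
	)

	# These keyword arguments follow the TRL API as used when this card was written;
	# newer TRL releases move dataset_text_field and max_seq_length into SFTConfig.
	trainer = SFTTrainer(
	    model=model,
	    train_dataset=train_dataset,
	    peft_config=lora_config,  # the original defined lora_config but never passed it
	    dataset_text_field="text",
	    max_seq_length=max_seq_length,
	    tokenizer=tokenizer,
	    args=TrainingArguments(
	        num_train_epochs=1,
	        lr_scheduler_type='linear',
	        learning_rate=2e-5,
	        per_device_train_batch_size=1,
	        gradient_accumulation_steps=8,  # 8 samples per optimizer step per device
	        gradient_checkpointing=True,
	        warmup_steps=10,
	        weight_decay=0.2,
	        # Use bf16 where the GPU supports it, otherwise fall back to fp16.
	        fp16=not torch.cuda.is_bf16_supported(),
	        bf16=torch.cuda.is_bf16_supported(),
	        logging_steps=1,
	        save_steps=100,
	        output_dir="outputs",
	        optim="paged_adamw_8bit",
	        seed=42,
	    ),
	)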
	```
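The hyperparameters above can be tied back to the progress bar in the Training section with some quick arithmetic. This is a rough sketch, not part of the training script: `num_devices = 1` is an assumption (the card does not state the GPU count), and `linear_lr` is a hypothetical helper that mirrors the usual linear-warmup, linear-decay schedule selected by `lr_scheduler_type='linear'`.

```python
# Values taken from the TrainingArguments above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
learning_rate = 2e-5
warmup_steps = 10
num_devices = 1  # assumption: the card does not say how many GPUs were used

# Values read off the progress bar: [1530/125193 4:46:45 < 386:48:08, 0.09 it/s]
steps_done = 1530
total_steps = 125193
elapsed_seconds = 4 * 3600 + 46 * 60 + 45

# Samples consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

examples_seen = steps_done * effective_batch_size
examples_per_epoch = total_steps * effective_batch_size

# Throughput and a naive remaining-time estimate (tqdm smooths its own estimate,
# so the figure in the progress bar differs slightly).
seconds_per_step = elapsed_seconds / steps_done
remaining_hours = (total_steps - steps_done) * seconds_per_step / 3600


def linear_lr(step, base_lr=learning_rate, warmup=warmup_steps, total=total_steps):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0, total - step) / max(1, total - warmup)


print(f"effective batch size: {effective_batch_size}")
print(f"examples seen so far: {examples_seen}")
print(f"examples per epoch:   {examples_per_epoch}")
print(f"seconds per step:     {seconds_per_step:.2f}")
print(f"remaining hours:      {remaining_hours:.1f}")
print(f"lr at step {steps_done}:      {linear_lr(steps_done):.3e}")
```

Under the single-device assumption, each logged step covers 8 examples, so the 1530 steps shown correspond to about 12,000 examples, and the implied epoch size of roughly one million examples is in line with the full dataset being used for the step count.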