---
library_name: transformers
tags: []
---
# CapyLake-7B-v2-laser
This model is a fine-tune of cognitivecomputations/WestLake-7B-v2-Laser on the argilla/distilabel-capybara-dpo-7k-binarized preference dataset.
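If you want to inspect the preference pairs the model was tuned on, the dataset loads with the `datasets` library. A minimal sketch (the `"train"` split name is an assumption; adjust it to whatever the dataset card lists):

```python
from datasets import load_dataset

# Assumption: the dataset exposes a default "train" split
ds = load_dataset("argilla/distilabel-capybara-dpo-7k-binarized", split="train")
print(ds)  # Inspect the columns, e.g. the chosen/rejected preference pairs
```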
## Process
- Realigned the chat template to ChatML (see the prompt-formatting sketch after this list)
- Trained for 1 epoch
- Learning rate: 5e-05
- Training time: ~2 hours on 1x H100
- Cost: ~$8
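Because the chat template was realigned to ChatML, prompts should go through the tokenizer's chat template rather than being passed as raw text. A minimal sketch, assuming the tokenizer on the Hub ships the ChatML template described above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/CapyLake-7B-v2-laser")

# Assumption: the tokenizer's built-in chat template is the ChatML one described above
messages = [
    {"role": "user", "content": "Summarize the Capybara dataset in one sentence."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # Should render <|im_start|>/<|im_end|> turns ending with an assistant header
```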
## Code Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Create an idea for a TV show and write a short pilot script"
inputs = tokenizer(text, return_tensors="pt")

# Sampling hyperparameters for the generation call
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,     # Maximum number of new tokens to generate
    do_sample=True,          # Enable sampling so temperature/top_k/top_p take effect
    temperature=0.7,         # Adjust for creativity (lower is less random)
    top_k=50,                # Sample only from the k most likely tokens
    top_p=0.95,              # Nucleus sampling with this cumulative probability
    num_return_sequences=1,  # Number of sequences to generate
    no_repeat_ngram_size=2,  # Blocks repeated n-grams to reduce repetition
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
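For a 7B model you will usually want to load in half precision on a GPU. A sketch under the assumption that a CUDA device is available and `accelerate` is installed (required for `device_map="auto"`):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# bfloat16 halves memory vs. float32; device_map="auto" spreads weights across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```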
## Other Capy Models
SOLAR-10.7B-Capy-v1.0 is also on the way, and more could follow depending on performance!
## Evaluations
TODO