---
library_name: transformers
tags: []
---
# CapyLake-7B-v2-laser
CapyLake-7B-v2-laser is a fine-tune of [cognitivecomputations/WestLake-7B-v2-Laser](https://huggingface.co/cognitivecomputations/WestLake-7B-v2-laser) on the [argilla/distilabel-capybara-dpo-7k-binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset.
<div align="center">
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/6455cc8d679315e4ef16fbec/kx2uwS_kZ-rTAJiusSrAW.webp)
[<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-dark.png" alt="Built with Distilabel" width="200" height="32"/>](https://github.com/argilla-io/distilabel)
</div>
## Process
+ Realigned the chat template to ChatML (see the sketch after this list)
+ Trained for 1 epoch
+ Learning rate of 5e-05
+ Training time was about 2 hours on 1 H100
+ Cost was ~$8
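Because the chat template was realigned to ChatML, prompts should be built from chat messages rather than raw text. A minimal sketch, assuming the tokenizer ships with the ChatML template; the message content is only an illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("macadeliccc/CapyLake-7B-v2-laser")

# Build a ChatML-formatted prompt from a list of messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Create an idea for a TV show and write a short pilot script"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant header so the model starts its reply
)
print(prompt)  # <|im_start|>system ... <|im_end|> ... <|im_start|>assistant
```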
## Code Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Create an idea for a TV show and write a short pilot script"
inputs = tokenizer(text, return_tensors="pt")

# Sampling hyperparameters for the generation call
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,      # Maximum number of new tokens to generate
    do_sample=True,           # Enable sampling so temperature/top_k/top_p take effect
    temperature=0.7,          # Adjust for creativity (lower is less random)
    top_k=50,                 # Keep only the top k tokens for sampling
    top_p=0.95,               # Nucleus sampling with this cumulative probability
    num_return_sequences=1,   # Number of sequences to generate
    no_repeat_ngram_size=2,   # Prevent repeated n-grams to reduce repetition
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
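For faster inference on a GPU, the model can be loaded in half precision with the inputs moved to the same device. A minimal sketch, assuming a CUDA-capable GPU and that `accelerate` is installed for `device_map="auto"`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to roughly halve memory use
    device_map="auto",           # requires accelerate; places weights on available GPUs
)

inputs = tokenizer("Write a haiku about capybaras", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```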
## Other Capy Models
SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!
## Evaluations
TODO