---
license: llama3.1
tags:
- code
---
## Model Overview

- **Model Name:** Llama 3.1 180M Untrained
- **Model Size:** 180M parameters
- **Tensor Type:** F32
- **License:** MIT
- **Model Type:** Untrained Language Model
- **Framework:** PyTorch
## Model Description

The Llama 3.1 180M Untrained model is a lightweight, randomly initialized language model designed to serve as a starting point for research and experimentation in natural language processing. At 180 million parameters, it is small enough to fine-tune on modest hardware while remaining large enough to adapt to specific tasks or domains.
## Intended Use

This model is intended for research purposes and for fine-tuning on specific tasks such as text classification, sentiment analysis, or other NLP tasks. As the model is untrained, it requires fine-tuning on relevant datasets to achieve the desired performance.
## Fine-Tuning Requirements

- **GPU Requirements (full fine-tuning):** a GPU with at least 24 GB of VRAM at a sequence length of 4096.
- **Supported GPUs:** NVIDIA RTX 3090, A100, or equivalent.

## Training Data

This model has not been trained on any data. Users are encouraged to fine-tune the model on datasets appropriate for their specific use case.
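The 24 GB recommendation can be sanity-checked with a back-of-the-envelope calculation. The static footprint below covers only F32 weights, gradients, and Adam optimizer states; activation memory (which grows with batch size and the 4096-token sequence length) comes on top and typically dominates, which is why the headroom is needed.

```python
# Rough static memory footprint for full fine-tuning of a 180M-parameter
# F32 model with Adam. Activation memory is workload-dependent and
# deliberately excluded from this estimate.

PARAMS = 180_000_000
BYTES_F32 = 4

weights_gb = PARAMS * BYTES_F32 / 1e9        # model weights
grads_gb   = PARAMS * BYTES_F32 / 1e9        # gradients (one per weight)
adam_gb    = PARAMS * 2 * BYTES_F32 / 1e9    # Adam first/second moments
static_gb  = weights_gb + grads_gb + adam_gb

print(f"weights:      {weights_gb:.2f} GB")
print(f"gradients:    {grads_gb:.2f} GB")
print(f"optimizer:    {adam_gb:.2f} GB")
print(f"static total: {static_gb:.2f} GB")   # ~2.88 GB before activations
```

The remaining budget on a 24 GB card is consumed by activations, CUDA context, and fragmentation; mixed precision or gradient checkpointing can reduce it substantially.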
## Evaluation

As the model is untrained, it has not been evaluated on any benchmark datasets. Performance metrics should be determined after fine-tuning.
## Limitations

- **Untrained:** The model is untrained and will not perform well on any task until it has been fine-tuned.
- **Ethical Considerations:** Users should be mindful of the ethical implications of deploying fine-tuned models, especially in sensitive applications.
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (untrained weights until you fine-tune)
model = AutoModelForCausalLM.from_pretrained("oktrained/llama3.1_180M_untrained")
tokenizer = AutoTokenizer.from_pretrained("oktrained/llama3.1_180M_untrained")

# Sample input text
input_text = "Once upon a time"

# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt")

# Generate output (expect incoherent text from an untrained model)
output = model.generate(**inputs, max_length=50)

# Decode output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```