---
license: mit
datasets:
- yahma/alpaca-cleaned
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# armaGPT

**Model Description:** armaGPT is a fine-tuned version of Gemma 7B, a pre-trained language model developed by Google. It generates human-like text from the input it receives and is further fine-tuned with Direct Preference Optimization (DPO) to encourage fair and safe generation.

**Model Architecture:** armaGPT is based on the transformer architecture, which uses self-attention mechanisms to process input sequences.

**Model Size:** The model has approximately 7 billion parameters.
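For readers curious how a DPO stage like this is typically run, the sketch below uses the `trl` library with the listed `HuggingFaceH4/ultrafeedback_binarized` preference data. It is an illustrative assumption, not the exact armaGPT recipe: the base model id, hyperparameters, dataset split and column handling, and the precise `DPOTrainer`/`DPOConfig` argument names vary across `trl` versions.

```python
# Illustrative sketch of DPO fine-tuning with trl -- NOT the exact armaGPT recipe.
# Assumes: pip install trl transformers datasets accelerate
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model_id = "google/gemma-7b"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Preference pairs (prompt / chosen / rejected), as in the listed dataset.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

training_args = DPOConfig(
    output_dir="armaGPT-dpo",
    beta=0.1,                       # assumed DPO temperature, not the actual value
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-7,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                 # trl builds a frozen reference copy internally
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,     # named `tokenizer=` in older trl releases
)
trainer.train()
```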
### Context Length
The model is trained on a context length of 8192 tokens.
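As a small illustration of what that limit means in practice, inputs longer than the window can be truncated at tokenization time; the `max_length` value below simply mirrors the stated 8192-token context length.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sidharthsajith7/armaGPT")
long_text = "Machine learning " * 20_000  # stand-in for a very long document

# Truncate to the 8192-token context window described above.
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=8192)
print(inputs["input_ids"].shape)  # sequence length is capped at 8192
```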
#### Running the model on a CPU
```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sidharthsajith7/armaGPT")
model = AutoModelForCausalLM.from_pretrained("sidharthsajith7/armaGPT")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt")

# Uses the model's default generation settings.
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
#### Running the model on a single / multi GPU
```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sidharthsajith7/armaGPT")
# device_map="auto" places the model on the available GPU(s) automatically.
model = AutoModelForCausalLM.from_pretrained("sidharthsajith7/armaGPT", device_map="auto")

input_text = "Write me a poem about Machine Learning."
# Move the input tensors to the GPU so they are on the same device as the model.
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
```
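#### Adjusting generation settings

Calling `generate()` with no arguments typically produces a short, greedy continuation unless the model's generation config says otherwise. The snippet below is a sketch of passing explicit generation parameters; the values shown (`max_new_tokens`, `temperature`, `top_p`) are illustrative and were not tuned for armaGPT.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sidharthsajith7/armaGPT")
model = AutoModelForCausalLM.from_pretrained("sidharthsajith7/armaGPT", device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(
    **input_ids,
    max_new_tokens=256,  # cap on newly generated tokens
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,     # illustrative values, not tuned for this model
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```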