	---
	base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
	language:
	- en
	license: apache-2.0
	datasets:
	- Salesforce/xlam-function-calling-60k
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- llama
	- trl
	---


	# Meta-Llama-3.1-8B-Unsloth-2x-faster-finetuning-GGUF-by-skk
	## Model Description
	This model is a fine-tuned version of Meta-Llama-3.1-8B, trained on the Salesforce/xlam-function-calling-60k dataset. Fine-tuning used Unsloth with Low-Rank Adaptation (LoRA) on the 4-bit quantized base model (unsloth/meta-llama-3.1-8b-bnb-4bit). Given a list of tool definitions and a user query, the model is intended to generate the matching function calls, and it can be loaded with Unsloth for faster inference.


	Developed by: Shailesh Kumar Khanchandani

	Shared by: Shailesh Kumar Khanchandani

	Model type: Causal Language Model

	Language(s) (NLP): English

	Finetuned from model: Meta-Llama-3.1-8B



	This repository contains the Meta-Llama-3.1-8B-Unsloth-2x-faster-finetuning-GGUF model, optimized for faster inference.

	## Getting Started

	Use the following Python code to get started with the model:

	```python
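	%%capture
	# Notebook cell (Colab/Jupyter): %%capture and !pip are IPython syntax, not plain Python.
	# Run the install commands in their own cell, because %%capture hides all output of the cell it heads.
	# Installs Unsloth, xformers, and the other required packages.
	!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
	!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes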

	from unsloth import FastLanguageModel
	import torch

	# Define the dtype you want to use
	dtype = torch.float16 # Example: using float16 for lower memory usage

	# Set load_in_4bit to True or False depending on your requirements
	load_in_4bit = True # Or False if you don't want to load in 4-bit

	# Verify that the model name is correct and exists on the Hugging Face Hub
	model_name = "skkjodhpur/Meta-Llama-3.1-8B-Unsloth-2x-faster-finetuning-GGUF-by-skk"
	# Optional check (notebook shell command; requires curl and jq): print the model's config.json.
	# If this fails, adjust the model name.
	!curl -s https://huggingface.co/{model_name}/resolve/main/config.json | jq .

	model, tokenizer = FastLanguageModel.from_pretrained(
	    model_name = model_name,
	    max_seq_length = 2048,
	    dtype = dtype,
	    load_in_4bit = load_in_4bit,
	)
	FastLanguageModel.for_inference(model) # Enable native 2x faster inference

	# The prompt template must match the one used for fine-tuning, so keep its wording unchanged.

	prompt = """Below is an tools that describes a task, paired with an query that provides further context. Write a answers that appropriately completes the request.

	### tools:
	{}

	### query:
	{}

	### answers:
	{}"""

	inputs = tokenizer(
	    [
	        prompt.format(
	            '[{"name": "live_giveaways_by_type", "description": "Retrieve live giveaways from the GamerPower API based on the specified type.", "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).", "type": "str", "default": "game"}}}]', # tools
	            "Where can I find live giveaways for beta access and games?", # query
	            "", # answers - leave this blank for generation!
	        )
	    ], return_tensors = "pt").to("cuda")

	from transformers import TextStreamer
	text_streamer = TextStreamer(tokenizer)
	_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)
	```
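
The tool list in the code above is passed as a hand-written JSON string, and the streamed output has to be read by eye. A minimal sketch of building the prompt from Python objects and extracting the generated calls is shown below. The helper names `build_prompt` and `parse_answers` are hypothetical and not part of Unsloth or this repository. The sample completion is made up for illustration, and the sketch assumes the model emits a JSON value after the `### answers:` marker, as the answers in the training dataset do.

```python
import json

# Prompt template copied verbatim from the Getting Started code above.
PROMPT = """Below is an tools that describes a task, paired with an query that provides further context. Write a answers that appropriately completes the request.

### tools:
{}

### query:
{}

### answers:
{}"""


def build_prompt(tools, query):
    """Serialize the tool definitions to JSON and fill the template, leaving answers blank."""
    return PROMPT.format(json.dumps(tools), query, "")


def parse_answers(decoded):
    """Return the first JSON value after the last '### answers:' marker, or None."""
    _, found, tail = decoded.rpartition("### answers:")
    if not found:
        return None
    try:
        # raw_decode stops at the end of the first JSON value, so a trailing
        # end-of-sequence token or extra text after it is ignored.
        value, _ = json.JSONDecoder().raw_decode(tail.lstrip())
    except json.JSONDecodeError:
        return None
    return value


tools = [{
    "name": "live_giveaways_by_type",
    "description": "Retrieve live giveaways from the GamerPower API based on the specified type.",
    "parameters": {"type": {"description": "The type of giveaways to retrieve (e.g., game, loot, beta).",
                            "type": "str", "default": "game"}},
}]
prompt_text = build_prompt(tools, "Where can I find live giveaways for beta access and games?")

# Hypothetical decoded output, for illustration only (not real model output).
sample_output = prompt_text + '[{"name": "live_giveaways_by_type", "arguments": {"type": "beta"}}]<|end_of_text|>'
calls = parse_answers(sample_output)
```

With the real model, `prompt_text` would be passed to the tokenizer in place of `prompt.format(...)`, and `parse_answers` would be applied to the decoded result of `model.generate`. Returning `None` on unparsable output lets the caller decide whether to retry or fall back.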

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

	## Usage
	To use the model, follow the steps in the code above: install the required packages, load the model and tokenizer, enable inference mode, and generate from a formatted prompt.

	For any issues or questions, please open an issue in the repository.

	This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.